I.R.I.S. Working Group

Interoperable Multimedia Retrieval in Distributed Systems

MPEG Query Format (MPQF) - Part 12 of MPEG-7


The MPEG Query Format (MPQF) is an XML-based query language that defines the format of the queries and replies exchanged between clients and servers in a distributed multimedia search and retrieval system. Standardizing such a language has two main benefits: 1) interoperability between parties in a distributed scenario (e.g. content providers, aggregators and clients) and 2) platform independence (which also benefits non-distributed scenarios). As a result, developers can build applications that issue multimedia queries independently of the multimedia service used, which fosters software reusability and maintainability. As a technical specification from MPEG (Moving Picture Experts Group, http://mpeg.chiariglione.org), MPQF is an international, open standard targeting all application domains. One of its key features is that it allows the expression of multimedia queries that combine the expressive styles of Information Retrieval and XML Data Retrieval systems. Thus, MPQF combines, for example, keywords and query-by-example with XQuery, which allows a broad range of users’ multimedia information requirements to be fulfilled.

Mpqf Diagram

MPQF defines a request-reply XML-based interface between a requester and a responder. In the simplest scenario (see the picture above), the requester is a client and the responder is a multimedia retrieval system. However, MPQF has been specifically designed for more complex scenarios (like A.I.R. -> https://www.dimis.fim.uni-passau.de/iris/index.php?view=air), in which clients interact, for instance, with a content aggregator. The content aggregator acts both as a responder (from the point of view of the client) and as a requester to a number of underlying content providers to which the client query is forwarded.

Reference Software of MPQF

Reference software implemented by our group can be obtained at the following link: Reference Software. This tool comes with a parser and a validator for the MPQF standard.
Reference software for the novel semantic enhancement can be obtained at the following link: Semantic Enhancement. A scientific description of the translating process and the use of the software is given here

Parts of MPQF

A. Input Query Format

The part of MPQF describing the contents of the Input element is named the Input Query Format (IQF, see Figure 4). The IQF defines the syntax of messages sent by the client to the multimedia service, which specify the client’s search criteria. The two main components of the IQF allow specification of a filter condition tree (by using the QueryCondition element) and definition of the structure and desired content of the service output (by the OutputDescription element) respectively. The IQF also allows declaration of reusable resource definitions and metadata paths (fields) within the QFDeclaration element, and the set of services where the query should be evaluated (the ServiceSelection element) in the case of communication with an aggregation service.

1) QFDeclaration: The optional QFDeclaration element allows declaration of reusable definitions of data paths and resources that can be referenced multiple times within a query. A data path is declared by a DeclaredField element which contains a relative or absolute XPath expression pointing to an item of the service’s metadata. A Resource element operates as a container for one of the following components:

  • MediaResource describes a resource by containing or pointing to the raw multimedia material
  • DescriptionResource specifies the container for any description based on a specific schema specified by the namespace declaration within the description
The introduction of the QFDeclaration element allows the size of the query to be significantly reduced if the same elements appear multiple times in one query.

2) Output Description: The OutputDescription element describes the structure and content of the individual result items within the result set. Furthermore, it allows (by two optional attributes) the definition of an overall item count and the maximum number of items per output page. The occurrence of the second attribute indicates that paging of the result set is desired.
In the following, the main features of the OutputDescription element are introduced:
  • ReqField describes the data path to the element, which a client asks to be returned. Paths are either specified using absolute XPath expressions, which refer to the root of the description or by using relative XPath expressions referring to a given schema’s complex type.
  • ReqAggregateID describes the unique identifier of the aggregate operation the client asks to be returned. When one or more ReqAggregateIDs are used, the aggregate ID should be grouped.
  • GroupBy describes the grouping operation the user wants to apply to the result set. For this purpose, the GroupBy element allows the specification of several GroupByField elements, which describe the key for the grouping process, and the corresponding aggregation operations.
  • SortBy describes the sort operation the client wants to apply on a result set. The result set can be sorted in increasing or decreasing order according to a given Field element or an aggregate expression. The Field element contains an absolute or relative XPath expression.
3) Query Condition: The query condition lets a client express filter criteria for a retrieval. The main entry point for formulating filter criteria is the QueryCondition element (see Figure 4), which features an optional EvaluationPath element, an unbounded number of TargetMediaType elements and a choice between a Join and a Condition element. The EvaluationPath element (an XPath expression) specifies the granularity of the retrieval and therefore the node of the metadata fragment related to the evaluation item. For instance, in the case of MPEG-7, the EvaluationPath //Video would focus the retrieval on whole video objects, whereas the EvaluationPath //VideoSegment would lead to the more specific video segment objects. The TargetMediaType element contains MIME type descriptions of media formats that are the targets for retrieval. For instance, the MIME type audio/mp3 would restrict the results to audio files in the MP3 format.
Further diversity in filter criteria is provided by the Condition element. The Condition element is a placeholder for a boolean expression type (see Figure 5 for its type hierarchy) and may result in an n-ary filter tree. As outlined in Figure 5, the filter tree can be built from three main constructs, namely query types, comparison expressions and boolean operators. In general, instances of query types always represent leaf nodes within the filter tree. Nodes of type comparison expression can occur as inner nodes and as leaf nodes, and nodes that represent boolean operators are always inner nodes.
In order to indicate the importance of individual parts during the retrieval process, one can assign a preference value to every boolean expression element, and hence to each node within the filter tree.
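
The shape of such a filter tree can be pictured with a small Python sketch. The class names (Leaf, And, Or, Not), the item dictionaries and the sample data are illustrative assumptions and not part of the MPQF schema; the sketch only shows leaves standing in for query types or comparison expressions, boolean operators as inner nodes, and a preference value attached to every node.

```python
from typing import Any, Callable, Dict

Item = Dict[str, Any]


class Node:
    """One boolean expression in the filter tree (hypothetical model)."""

    def __init__(self, preference: float = 1.0):
        # Carried along to mirror the preference value; not interpreted here.
        self.preference = preference

    def matches(self, item: Item) -> bool:
        raise NotImplementedError


class Leaf(Node):
    """Leaf node: stands in for a query type or a comparison expression."""

    def __init__(self, test: Callable[[Item], bool], preference: float = 1.0):
        super().__init__(preference)
        self.test = test

    def matches(self, item: Item) -> bool:
        return self.test(item)


class And(Node):
    def __init__(self, *children: Node, preference: float = 1.0):
        super().__init__(preference)
        self.children = children

    def matches(self, item: Item) -> bool:
        return all(child.matches(item) for child in self.children)


class Or(Node):
    def __init__(self, *children: Node, preference: float = 1.0):
        super().__init__(preference)
        self.children = children

    def matches(self, item: Item) -> bool:
        return any(child.matches(item) for child in self.children)


class Not(Node):
    def __init__(self, child: Node, preference: float = 1.0):
        super().__init__(preference)
        self.child = child

    def matches(self, item: Item) -> bool:
        return not self.child.matches(item)


# Made-up metadata records used only to exercise the tree.
items = [
    {"id": 1, "text": "Hong Kong skyline", "width": 1600, "format": "image/jpeg"},
    {"id": 2, "text": "Hong Kong harbour", "width": 640, "format": "image/gif"},
    {"id": 3, "text": "Paris", "width": 2000, "format": "image/jpeg"},
    {"id": 4, "text": "hong kong night", "width": 800, "format": "image/png"},
]

tree = And(
    Leaf(lambda it: "hong kong" in it["text"].lower(), preference=0.8),
    Or(
        Leaf(lambda it: it["width"] > 1000),
        Not(Leaf(lambda it: it["format"] == "image/gif")),
    ),
)

hits = [it["id"] for it in items if tree.matches(it)]
```

How a service weighs the preference values during ranking is left open in this sketch.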

a) Comparison Expression: In MPQF, a comparison expression has the usual form A op B, where A, B ∈ OperandClass and op ∈ {<, >, =, ≤, ≥, ≠, contains}. Here, contains denotes a contains operation for strings. An OperandClass denotes a representation of a specific data type, such as Boolean, String, Arithmetic, DateTime or Duration. Note that both operands (A, B) must belong to the same OperandClass within a comparison expression. Every element of an OperandClass can be described by (1) a value of the data type, (2) an XPath expression pointing to a value of the specific data type or (3) a corresponding expression (e.g., String expression, Arithmetic expression, etc.) resulting in a value of the specific data type.
String, Arithmetic and Boolean expressions are defined similarly to comparison expressions but restrict their operands A, B to the corresponding data type (e.g., String operands for String expressions, Arithmetic operands for Arithmetic expressions, etc.).
The following short example (see Code 1) demonstrates the use of a comparison and an arithmetic expression for MPEG-7 documents. In terms of the definition A op B given above, the example instantiates op as an Equal comparison operation, operand A as an arithmetic expression and operand B as an arithmetic value (value 0). The arithmetic expression in turn is composed of the operation (op) Modulus, an operand A that is a relative XPath expression pointing to an arithmetic value (the totalNumOfSamples attribute of the AudioLLDVectorType type) and an operand B that is again an arithmetic value (value 2). When evaluated, this filter condition returns the documents that have an even number in the totalNumOfSamples attribute of the AudioLLDVectorType type.
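
The nesting described above, Equal(Modulus(totalNumOfSamples, 2), 0), can be traced with a short Python sketch. This is not MPQF syntax: the XML fragments, the Vector element name and the helper functions are assumptions made up for illustration, and only the totalNumOfSamples attribute name is taken from the text.

```python
import xml.etree.ElementTree as ET

# Made-up documents; the layout only loosely imitates an MPEG-7 description.
DOCS = {
    "a.xml": '<Audio><Vector totalNumOfSamples="128"/></Audio>',
    "b.xml": '<Audio><Vector totalNumOfSamples="127"/></Audio>',
}


def arithmetic_field(root, path, attribute):
    """Operand given as a path that points to an arithmetic value."""
    node = root.find(path)
    return int(node.get(attribute))


def modulus(a, b):
    """Arithmetic expression: both operands are arithmetic values."""
    return a % b


def equal(a, b):
    """Comparison expression: both operands belong to the same operand class."""
    return a == b


def condition(root):
    # Equal( Modulus( totalNumOfSamples, 2 ), 0 )
    samples = arithmetic_field(root, ".//Vector", "totalNumOfSamples")
    return equal(modulus(samples, 2), 0)


matching = [name for name, xml in DOCS.items() if condition(ET.fromstring(xml))]
```

Only the document with an even sample count satisfies the condition.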


b) Query Types: The current MPQF standard provides the following set of query types:
  • QueryByMedia specifies a similarity or exact-match query by example retrieval where the example media can be an image, video, audio or text.
  • QueryByDescription specifies a similarity or exact-match query by example retrieval where the example media is presented by any XML based multimedia description format (e.g., MPEG-7).
  • QueryByFreeText specifies a free text retrieval where, optionally, the fields to focus on or to ignore can be declared.
  • QueryByFeatureRange specifies a range retrieval over, e.g., low-level features like color.
  • SpatialQuery specifies the retrieval of spatial elements within media objects (e.g., a tree in an image) which may be connected by a specific spatial relation.
  • TemporalQuery specifies the retrieval of temporal elements within media objects (e.g., a scene in a video) which may be connected by a specific temporal relation.
  • QueryByXQuery specifies a container for limited XQuery expressions.
  • QueryByRelevanceFeedback specifies a retrieval that takes into consideration result items of a previous search as bad and/or good examples.
  • QueryByROI specifies a retrieval on (spatial/temporal) region of interest in media resources.

B. Output Query Format

The Output Query Format (OQF) deals with the specification of a standardized format for multimedia query responses (see Output element in Figure 1). The two main components cover paging functionality (see subsection IV-B1) and the definition of individual result items (see subsection IV-B2). In addition, the OQF provides a means for communicating global comments (by the GlobalComment element) and status or error information (by the SystemMessage element). Using a global comment, the responder can send general messages that are valid for the whole result set, such as a service subscription expiration notice or a message from a sponsor. When a proper result set cannot be composed, or when a special message regarding the system behavior should be communicated to the client, the multimedia service can use the SystemMessage element. This element provides three different levels for signaling problems, namely Status, Warning, and Exception. The codes and descriptions for the individual elements are defined in annex A of the standard specification. Finally, the validity period of a result set is indicated by the expirationDate attribute.

1) Paging functionality: A client’s desire to retrieve a result set divided into individual pages is expressed by the use of the maxPageEntries attribute in the input query. If this attribute is set, the multimedia service is responsible for dividing the complete result set into a series of individual MPQF instance documents. For this purpose, the OQF provides two attributes, namely currPage and totalPages, to identify the individual pages.
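
The division of a result set into pages can be sketched in a few lines of Python. The function name paginate, the dictionary layout, the page numbering starting at 1 and the handling of an empty result set are assumptions for illustration; only the attribute names maxPageEntries, currPage and totalPages come from the description above.

```python
import math
from typing import Any, Dict, List


def paginate(results: List[str], max_page_entries: int) -> List[Dict[str, Any]]:
    """Split a complete result set into pages of at most max_page_entries items."""
    if max_page_entries < 1:
        raise ValueError("max_page_entries must be at least 1")
    # Assumption: an empty result set still yields one (empty) page.
    total_pages = max(1, math.ceil(len(results) / max_page_entries))
    pages = []
    for curr_page in range(1, total_pages + 1):
        start = (curr_page - 1) * max_page_entries
        pages.append({
            "currPage": curr_page,
            "totalPages": total_pages,
            "items": results[start:start + max_page_entries],
        })
    return pages


pages = paginate([f"item{i}" for i in range(1, 8)], max_page_entries=3)
```

With seven items and three entries per page, the sketch produces three pages, the last of which holds a single item.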

2) Individual Result Items: The ResultItem element of the OQF holds a single record of a query result. In the MPQF schema, the element is based on an abstract type which is targeted at future extensibility and allows more concrete instantiations. Figure 6 illustrates the standardized version of such an extension.
The ResultItem has four attributes and six elements. The four attributes are recordNumber, rank, confidence, and originID. The recordNumber is a positive integer and the only required attribute. It ensures the distinct identification of each record among the set of records returned for the given query. It can also be used in relevance feedback retrieval to refer to the relevant records. The rank is an optional attribute that indicates the relative similarity of the record to the submitted query. The confidence is an optional attribute that expresses the subjective correctness of the query result. The originID is also an optional attribute and indicates the URI from which the specific record came. For example, when multiple service providers are involved in answering a given query, the originID can be used to identify the service provider from which the result item was received. In the following, the available elements are introduced:

  • Similar to the GlobalComment element, the Comment is a placeholder for a text message to be transmitted to the client. Note that the contained information should relate to one specific result item.
  • The TextResult element holds the retrieved result item as text type.
  • The Thumbnail element carries the URL of a thumbnail image of a specific result item.
  • The MediaResource element contains the URL pointing to the location of the media resource of the retrieved result item. For example, a URL to the video or audio file.
  • The Description element is a container for any kind of metadata response based on an XML-Schema. For example, if the multimedia service is composing the result set based on the MPEG-7 standard, the Description element holds an instance document of MPEG-7 and if the service is composing the result set based on the TV-Anytime standard, the Description element can hold an instance document using TV-Anytime metadata.
  • The AggregationResult element allows for schema-valid instantiation of results of an aggregation operation (e.g., SUM). The main difficulty in expressing an aggregation operation is its missing description element within the service’s XML schema. Therefore, an aggregated element is identified through the attribute in the OutputDescription element of the corresponding input query.

All these elements, except the AggregationResult, can have an optional fromREF attribute and can occur a maximum of two times within one result item. This attribute indicates the origin result set in case of a Join operation.

C. Query Management Tools

The management part of MPQF copes with the task of searching for and choosing desired multimedia services for retrieval. This includes service discovery, querying for service capabilities and service capability descriptions. Figure 7 depicts the element hierarchy of the management tools in MPQF. As described previously, the management part of the query format consists of either the Input or the Output element, depending on the direction of the communication (request or response).
A management request can be used to find suitable services (e.g., by interacting with a registry service) which support the intended queries or to scan individual services for their capabilities. The capability of every service is described by its service capability description, which determines the supported query format, metadata, media formats, query types, expressions and usage conditions. The service capabilities are further explained in subsection IV-C1.
A management response contains the results of a service discovery request initiated by the requester. A service, aggregated service provider or registry service returns either a list of available service capability descriptions or a system message in case of an error. If no service is available or there is no match to the requested capabilities, an empty Output element is returned.

1) Service Capability: A service capability description determines what kind of retrieval functionality the respective service supports. The following elements are supported:

  • SupportedQFProfile: Describes the supported query format profile of the service. A QFProfile may define a subset of available query types and expressions that a service is capable of processing.
  • SupportedMetadata: Describes the metadata that can be processed by a certain service using a list of addresses (URIs).
  • SupportedExampleMediaTypes and SupportedResultMediaTypes: They indicate the media formats that are supported by a multimedia service for processing and responses. The formats are specified by MIME types.
  • SupportedQueryTypes and SupportedExpressions: These describe the query types and expressions supported by a search condition. Annex A of the standard specifies the allowed information, types and expressions based on classification scheme terms.
  • UsageConditions: They describe the usage conditions (e.g., payment is needed, authentication is required, etc.) of a certain service. Similar to query types, usage conditions are listed in a classification scheme. The assignment and validity of a usage condition can be specified at a fine granularity (e.g., for a specific query type) or in general for the whole service.

The following code (Code 2) shows an example of a service capability description. The supported profile in this example is full. The supported metadata is based on the MPEG-7 schema, and the supported media types are mp3 for example media and AAC for the result set. Moreover, the description lists the following supported functions: query types ( stands for QueryByMedia), expressions (100.3.1 means that all Boolean expression types are supported) and usage conditions (200.1 indicates that the service, or more precisely the query type, requires Authentication). Note that, for simplicity and space reasons, the full classification scheme URN (urn:mpeg:mpqf:cs:ServiceCapabilityCS:2008:X.Y.Z) is not presented.


2) Service Discovery: MPQF allows the expression of a variety of requests, such as query by image example or query by free text, which might need to be analyzed by an NLP engine. It is very likely that a multimedia service provider does not support all functions specified in MPQF. In such an environment, one of the important tasks for an MPQF client is to find the services which provide the desired query functions and contain the desired media format. It is the role of the MPQF management tools to help the client identify the desired services. For this purpose, four conceivable service discovery scenarios are specified, and these are now detailed in turn.
a) Tell me everything you know: This scenario assumes that a user knows the location of an MPQF service or a registry service and wants to know all functions the service provides or all services that are registered. In this case the client issues an empty management input request to the service. If the service is an aggregate service, all information on the services beneath it is collected and returned.
b) Tell me the services supporting the desired capability: This is the second scenario in which a client tries to find the service providing the client’s desired functions. It is expressed by the DesiredCapability element. This element states the requested capabilities (e.g., metadata model, query types, etc.) the multimedia service must provide.
c) Tell me the capabilities of the specified services: This scenario provides a means to ask for information on all capabilities that a set of services (identified by the ServiceID element) provide.
d) Tell me if the specified services support the specified capability: The last scenario in the service discovery process is a combination of the elements used in paragraph IV-C2b and IV-C2c. In this case, a client is asking whether a specific set of services supports a given capability description.
e) Common return to a service discovery request: For every kind of management request, an instance of the same format is returned. The result is embedded in the Output element of the management part and contains several AvailableCapability elements, as presented in Code 2, describing all matching services.
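
The four discovery scenarios and their common return can be summarized in a Python sketch. The registry contents, the service identifiers and the function discover are hypothetical, and capabilities are reduced to plain string labels rather than classification scheme terms; an empty dictionary plays the role of an empty Output element.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical registry: service ID -> set of capability labels.
REGISTRY: Dict[str, Set[str]] = {
    "svc-images": {"QueryByMedia", "QueryByFreeText", "image/jpeg"},
    "svc-audio": {"QueryByMedia", "audio/mp3"},
    "svc-text": {"QueryByFreeText"},
}


def discover(
    desired: Optional[Set[str]] = None,
    service_ids: Optional[Iterable[str]] = None,
) -> Dict[str, Set[str]]:
    """Return the capability descriptions of all matching services.

    no arguments      -> a) tell me everything you know
    desired only      -> b) services supporting the desired capability
    service_ids only  -> c) capabilities of the specified services
    both              -> d) do the specified services support the capability
    """
    if service_ids is None:
        candidates = dict(REGISTRY)
    else:
        candidates = {s: REGISTRY[s] for s in service_ids if s in REGISTRY}
    if desired is None:
        return candidates
    # A service matches when it offers every desired capability.
    return {s: caps for s, caps in candidates.items() if desired <= caps}


everything = discover()
by_capability = discover(desired={"QueryByMedia"})
by_service = discover(service_ids=["svc-text"])
combined = discover({"QueryByFreeText"}, ["svc-audio", "svc-text"])
no_match = discover({"QueryByXQuery"})
```

All four calls return the same kind of structure, which mirrors the single response format used for every management request.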

3) Service Selection and Aggregation: There are two different mechanisms for service selection; either the client connects directly to a service and performs the multimedia queries or the client uses a service provider, which knows the other services and forwards the queries to them. Connecting directly to one service has the limitation that only queries corresponding to the service capabilities can be requested. A service provider on the other hand has the opportunity to distribute parts of the query to different services which may be specialized in handling this type of query.
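
The second mechanism, in which a service provider distributes parts of a query to specialized services, might look as follows in a simplified Python sketch. The provider names, the (query type, argument) representation of query parts and the aggregate function are assumptions for illustration; the originID key follows the result item attribute described earlier but holds a plain identifier here instead of a URI.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical responder: (supported query types, function answering one query part).
Responder = Tuple[Set[str], Callable[[str], List[str]]]

PROVIDERS: Dict[str, Responder] = {
    "image-search": ({"QueryByMedia"}, lambda arg: [f"image similar to {arg}"]),
    "text-search": ({"QueryByFreeText"}, lambda arg: [f"document about {arg}"]),
}


def aggregate(query_parts: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Act as responder towards the client and as requester towards providers."""
    results = []
    for query_type, argument in query_parts:
        for provider_id, (supported, answer) in PROVIDERS.items():
            # Forward a query part only to providers that can handle its type.
            if query_type in supported:
                for text in answer(argument):
                    results.append({"originID": provider_id, "result": text})
    return results


merged = aggregate([
    ("QueryByFreeText", "Hong Kong"),
    ("QueryByMedia", "sample.jpg"),
])
```

A client connected directly to "text-search" could only issue the free text part, which is the limitation noted above.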

Example Scenarios

As mentioned previously, MPQF provides many noteworthy concepts and filter criteria for the expression of multimedia requests. In this section, some important characteristic features of MPQF are highlighted and explained in the context of usage scenarios.

A. Simple scenario. Combination of free text and conditions over the XML metadata

Keywords are the most common way of searching, and they capture user information requirements satisfactorily in most situations. However, in multimedia content searches, the coexistence of many different media types (possibly in many different formats) often results in limited keyword-based searches. Typically, these contain explicit conditions regarding the features of the digital objects to be retrieved (e.g. file format, file size, resolution, language). These searches, though simple, are very common, and MPQF allows them to be expressed in a simple way. Consider, for example, a situation in which a professional user wishes to buy large images of Hong Kong in order to illustrate a publication. After capturing the user criteria through a form or some other kind of user-friendly interface, these criteria could be formalized using MPQF and submitted to one or more service providers. The example query in Code 3 shows how QueryByFreeText and conditions over the XML metadata can be combined to express the user’s image requirements. The query requests images (in any format) which are related to the keywords “Hong Kong” and which have a width greater than 1000 pixels. In this case, the query is expressed in terms of MPEG-7 metadata, but other formats would be possible if the service provider required alternatives.


B. Query-by-example

Query-by-example (see Code 4) offers an alternative approach whereby a search can be expressed using one or more example digital objects (e.g. a number of image files). Though the usage of low-level feature descriptions instead of the example object bit stream is also considered query-by-example, in MPQF these two situations are differentiated: query-by-media applies to queries including digital media objects, while query-by-description uses the metadata feature descriptions. Consider an example scenario from the medical domain. The Lister Hill National Center for Biomedical Communications, from the National Library of Medicine (NLM) in the United States, maintains a digital archive of 17,000 cervical and lumbar spine images. This collection of images is catalogued in a limited way due to the prohibitive costs of having the images analyzed and annotated with metadata by radiologists. Now imagine a doctor involved in an epidemiology or clinical study who is trying to find similar reference material among previously stored images. The doctor would probably query on the basis of one or more example images and would retrieve a list of images ranked by similarity. This case would be achieved in MPQF through the usage of QueryByMedia, which uses a media sample (such as an image or video) as the key for search. Code 4 shows how this query type would be used in combination with other conditions over the XML metadata to fulfill the described use case. The query requests JPEG images which are similar to the given sample image and have an attached Dublin Core metadata descriptor date later than 2002-01-15.

C. Query-by-Media

QueryByMedia and QueryByDescription are the fundamental operations of MPQF and represent the query-by-example paradigm. The difference between them lies in the sample data provided. The QueryByMedia query type uses a media sample such as an image or video as the key for search, whereas QueryByDescription allows querying on the basis of an XML-based description key. The example query in Code 5 shows how an example description can be included in a query to form a condition. The query requests JPEG images which possess descriptors exactly matching the attached MPEG-7 descriptors.


D. SpatialQuery

The sample in Code 6 illustrates the usage of a SpatialQuery which allows a certain spatial relation to be used as a condition. Let us assume that a multimedia service provides a large set of images showing different animals. A user might be interested in finding all images that show a cat and a dog and in which the cat is on the dog’s left side as demonstrated in Figure 8. In this case, the query presented in Code 6 formulates this request by using the SpatialQuery condition and a respective spatial relation. Note that this example assumes that the multimedia service is capable of extracting the necessary information (object recognition, low- and/or high level features, etc.) in order to detect the animals. However, the query can also be modified using further metadata descriptions as input parameters.

The query in Code 6 specifies that the output should use media resources by asserting as "True" the attribute mediaResourceUse in the OutputDescription element. Thus, a possible result set satisfying the request might be of the form shown in Code 7. Each result item is expressed by a MediaResource element and contains recordNumber and rank information.
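
How a service might evaluate such a spatial condition can be sketched in Python, assuming that object detection has already produced labelled bounding boxes. The box format, the left_of test (entirely to the left, compared on x coordinates) and the sample images are assumptions for illustration and are not prescribed by MPQF.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def left_of(a: Box, b: Box) -> bool:
    """True if box a lies entirely to the left of box b."""
    return a[2] <= b[0]


# Made-up detection output: image URI -> label -> bounding box.
IMAGES: Dict[str, Dict[str, Box]] = {
    "img1.jpg": {"cat": (10, 40, 110, 140), "dog": (150, 30, 300, 160)},
    "img2.jpg": {"cat": (200, 40, 300, 140), "dog": (20, 30, 170, 160)},
    "img3.jpg": {"cat": (10, 40, 110, 140)},
}


def spatial_query(
    images: Dict[str, Dict[str, Box]],
    source: str,
    target: str,
    relation: Callable[[Box, Box], bool],
) -> List[str]:
    """Return the images in which source and target satisfy the relation."""
    hits = []
    for uri, objects in images.items():
        if source in objects and target in objects:
            if relation(objects[source], objects[target]):
                hits.append(uri)
    return hits


# Each hit becomes a result item with a recordNumber and a media resource.
result_set = [
    {"recordNumber": number, "MediaResource": uri}
    for number, uri in enumerate(spatial_query(IMAGES, "cat", "dog", left_of), start=1)
]
```

Only the first image qualifies: the second shows the cat on the right, and the third shows no dog.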


E. QueryByRelevanceFeedback

A relevance feedback approach is a very common and widely used multimedia retrieval paradigm which employs client interaction to improve retrieval efficiency. It allows users to tune ongoing retrieval by indicating good and/or bad examples of previous result sets. If the result sample of Code 7 is considered as a starting point, there are three result items. As an example, the first and third result items are considered to be good results while the second is a poor match. The example shown in Code 8 expresses a relevance feedback condition by indicating the two positive examples (recordNumber is used) and one negative example by prefixing a NOT condition. The previous result set is identified by its mpqfID attribute value (mm1221).
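
The feedback step described above can be illustrated with a Python sketch. The tag sets, the candidate images and the scoring rule (tag overlap with good examples minus overlap with bad examples) are stand-ins chosen for brevity and are not how MPQF or any particular service computes relevance; only the use of recordNumber values and the mpqfID value mm1221 come from the text.

```python
from typing import Any, Dict, List, Set

# Hypothetical previous result set: recordNumber -> tags of the result item.
PREVIOUS: Dict[str, Any] = {
    "mpqfID": "mm1221",
    "items": {1: {"cat", "dog"}, 2: {"car"}, 3: {"cat", "garden"}},
}


def feedback_condition(previous: Dict[str, Any], good: List[int], bad: List[int]) -> Dict[str, Any]:
    """Refer to good and bad examples of a previous result set by recordNumber."""
    unknown = [n for n in good + bad if n not in previous["items"]]
    if unknown:
        raise KeyError(f"unknown recordNumber(s): {unknown}")
    return {"previous": previous["mpqfID"], "positive": list(good), "negative": list(bad)}


def rerank(candidates: Dict[str, Set[str]], previous: Dict[str, Any], condition: Dict[str, Any]) -> List[str]:
    """Order candidates by a toy score derived from the feedback."""
    items = previous["items"]
    liked = set().union(*(items[n] for n in condition["positive"]))
    disliked = set().union(*(items[n] for n in condition["negative"]))

    def score(tags: Set[str]) -> int:
        return len(tags & liked) - len(tags & disliked)

    return sorted(candidates, key=lambda uri: score(candidates[uri]), reverse=True)


condition = feedback_condition(PREVIOUS, good=[1, 3], bad=[2])
ranked = rerank(
    {"d.jpg": {"cat", "sofa"}, "e.jpg": {"car", "road"}, "f.jpg": {"dog", "garden"}},
    PREVIOUS,
    condition,
)
```

Candidates that share features with the first and third result items move up, while the one resembling the second item drops to the end.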