I.R.I.S. Working Group

Interoperable Multimedia Retrieval in Distributed Systems


The VAnalyzer is a Java-based application that produces valid MPEG-7 metadata descriptions from video data. It automatically extracts visual MPEG-7 low-level features (e.g., Dominant Color) and performs object (e.g., face) recognition [1] and tracing [2].


The figure above shows the conceptual design of the VAnalyzer, which is divided into three main units. Continuous lines show direct exchange of information; dashed lines indicate dependencies.

The UI is built on the NetBeans Platform, a generic framework for Swing applications, and is the graphical representation of the VAnalyzer's functionality. A user can load a video file, use well-known playback controls (e.g., play or fast forward) and select any number of the available processing algorithms. During playback, the UI also displays the results of the annotation algorithms (e.g., face detection or object tracing).

The core components act as a mediator between the UI and the core processing units: they receive data from the UI and forward it to the appropriate unit, and they pass the unit's results back to the UI.
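The mediator role described above can be sketched in a few lines of Java. This is only an illustration of the pattern, not the VAnalyzer's actual code: the names `ProcessingUnit`, `ResultListener` and `CoreMediator`, as well as the use of plain strings as payload, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical contract for a processing unit; not the VAnalyzer's real API. */
interface ProcessingUnit {
    String handle(String request);
}

/** Hypothetical callback through which the UI receives results. */
interface ResultListener {
    void onResult(String unitName, String result);
}

/** Mediator: the UI and the units only talk to this class, never to each other. */
final class CoreMediator {
    private final Map<String, ProcessingUnit> units = new HashMap<>();
    private final List<ResultListener> listeners = new ArrayList<>();

    void registerUnit(String name, ProcessingUnit unit) {
        units.put(name, unit);
    }

    void addListener(ResultListener listener) {
        listeners.add(listener);
    }

    /** Forwards a UI request to the named unit and hands the answer back to all listeners. */
    void dispatch(String unitName, String request) {
        ProcessingUnit unit = units.get(unitName);
        if (unit == null) {
            throw new IllegalArgumentException("Unknown unit: " + unitName);
        }
        String result = unit.handle(request);
        for (ResultListener listener : listeners) {
            listener.onResult(unitName, result);
        }
    }
}
```

In such a design the UI would register itself as a `ResultListener` and call `dispatch("video", ...)`, so that neither side needs a direct reference to the other.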

As mentioned above, the conceptual design of the VAnalyzer is divided into three main units, which are described next.

The video unit provides access to the video data. For this purpose, the application uses the Java Media Framework (JMF). Besides JMF, several other Java-based media processing frameworks exist, for example JVLC, FMJ or QTJava. In contrast to JMF, these frameworks either offer insufficient processing functionality (e.g., frame access) or lack the necessary documentation (e.g., of the API). At its core, JMF can process only a certain set of media formats. Plugins for JMF extend this set, either by making available the media formats supported by the underlying operating system or by integrating external libraries such as FFmpeg. The VAnalyzer uses the Fobs4JMF project, which builds on FFmpeg, to ensure broad support of media formats.

The core objective of the VAnalyzer is the creation of valid MPEG-7 descriptions (XML instances) of video data. This functionality is encapsulated in the MPEG-7 creation unit, which uses JAXB for XML processing. JAXB is integrated into the Java Runtime Environment (JRE) as of version 1.6. It generates a Java class hierarchy from an XML Schema and automatically performs the mapping between these classes and an XML instance. Writing the information held in the classes to an XML instance is called marshalling; reading an XML instance back into the classes is called unmarshalling.

The metadata creation unit is a container that centralizes the implementations of the extraction algorithms and the external libraries they need. The input and output of these algorithms is the raw data of a video frame. If an algorithm recognizes or traces an object, the object is marked graphically by a Region of Interest (a rectangle) in the output frame. This output is then visualized in the UI. The VAnalyzer can extract the following MPEG-7 low-level features: Color Layout, Color Structure, Dominant Color and Edge Histogram. In addition to these low-level features, the VAnalyzer offers two solutions for object recognition and tracing. The first solution uses the OpenCV project, which detects objects (e.g., faces) using Haar cascades. The second solution follows the approach proposed by Marko Heikkilä and Matti Pietikäinen in [2], who detect objects by background subtraction based on background models.
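To make the idea of background subtraction and the resulting Region of Interest more concrete, the following sketch uses a simple per-pixel running-average background model on greyscale frames. It is deliberately not the texture-based method of [2], and it is not taken from the VAnalyzer; the class name `BackgroundSubtractor`, the `int[][]` frame representation, and the learning-rate and threshold parameters are assumptions chosen for illustration.

```java
import java.awt.Rectangle;

/** Simplified running-average background model; NOT the texture-based method of [2]. */
final class BackgroundSubtractor {
    private final double[][] background; // per-pixel estimate of the background grey level
    private final double learningRate;   // how quickly the model adapts (assumed parameter)
    private final int threshold;         // minimum grey-level difference counted as foreground

    BackgroundSubtractor(int[][] firstFrame, double learningRate, int threshold) {
        this.learningRate = learningRate;
        this.threshold = threshold;
        this.background = new double[firstFrame.length][];
        for (int y = 0; y < firstFrame.length; y++) {
            background[y] = new double[firstFrame[y].length];
            for (int x = 0; x < firstFrame[y].length; x++) {
                background[y][x] = firstFrame[y][x];
            }
        }
    }

    /**
     * Compares the frame against the background model, updates the model, and
     * returns the bounding rectangle of all foreground pixels (null if none).
     */
    Rectangle process(int[][] frame) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < frame.length; y++) {
            for (int x = 0; x < frame[y].length; x++) {
                if (Math.abs(frame[y][x] - background[y][x]) > threshold) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
                // Blend the current pixel into the model so slow changes are absorbed.
                background[y][x] = (1 - learningRate) * background[y][x]
                        + learningRate * frame[y][x];
            }
        }
        if (maxX < 0) {
            return null; // no pixel differed enough from the background
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

The returned rectangle plays the role of the Region of Interest that is drawn into the output frame. A real detector would additionally separate several objects and suppress noise, which this sketch does not attempt.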

[1] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 1st ed., October 2008.
[2] M. Heikkilä and M. Pietikäinen, "A Texture-Based Method for Modeling the Background and Detecting Moving Objects," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 657-662, April 2006.