The Evolution of MPEG-7

The turn of the millennium brought many flavors to the managing, producing, searching and offering of digital multimedia. Of the available flavors, MPEG has risen to strategic value for audiovisual representation; and MPEG development continues toward a broader, more encompassing field of multimedia management and distribution.

First standardized in 1992, MPEG-1 and later MPEG-2 in 1995 have enabled the production of CD-interactive, DVD, digital audio broadcasting (DAB), digital television, and many of the video-on-demand trials and commercial services. Work did not end there, however.

MPEG-4 became the next extension in the MPEG family of digital compression standards. MPEG-4, which codes content in the form of objects, is regarded as the first real multimedia standard. Version 1, completed in 1998, and Version 2, in 1999, allow interactivity, a combination of synthetic and natural material, and the integration of production, distribution and content access across a wide variety of interactive and mobile multimedia, including graphics and enhanced digital television.

MPEG has not stood still since readying MPEG-4 for prime time. Recently, the International Organization for Standardization (ISO) finalized the first version of the International MPEG-7 Standard for Content Description, to be published within the next few months.


MPEG-7 is the next in the series of multimedia standards developed by the ISO/IEC standards committee. MPEG-7 does not replace MPEG-4, but rather complements it. Formally called the "Multimedia Content Description Interface," MPEG-7 is not about the compression of images, and it doesn't follow the same types of technological developments we are familiar with in the previously standardized MPEG-1, -2 and -4 toolsets. Instead, MPEG-7 is about a rich set of tools that will be used to describe multimedia content.

Finally, another ISO standard, MPEG-21, is now on the horizon; and with each subsequent suite of MPEG tools, a truly interoperable framework for multimedia is taking form.

Today it is hard to find an application that does not make use of multimedia. Broadened applications, such as multimedia digital libraries, broadcast media selection, multimedia editing, home entertainment devices, etc., are all candidates for the toolsets being standardized in MPEG-7. The entire range of applications for these MPEG tools will provide avenues for both users and automatic systems to process audiovisual information.


Publicly accessible multimedia catalogs, large content archives and purchasable content need means for identification that permit uniform and standardized methods of description. MPEG-7 aims to extend the text search capabilities of the Web to that of multimedia content. Functionally, agents will utilize content retrieval information to select and filter material that can be broadcast or "pushed" for such applications as personalized advertising. MPEG-7 descriptions will further enable semiautomatic editing and the presentation of multimedia material, both quickly and cost-effectively through the usage of underlying metadata, just one part of the dynamics found in the MPEG-7 toolsets.

MPEG-7 development isn't new. The first work item began in 1996 and, now in its sixth year, has followed a series of stages that involved dozens of individuals and companies around the world. That sequence of international development followed a prescribed process, outlined in Table 1.

The logical progression of digital multimedia needs and benefits points squarely in the direction of MPEG-7. Quality access to content is the aim of this advanced suite of tools, which will be used to create descriptive audiovisual metadata. Goals of MPEG-7 include the establishment of a comprehensive set of statements that describe good storage solutions, high-performance content identifiers, the assignment of proprietary descriptive information, as well as a set of accurate and personalized filters, search engines and retrieval descriptors.

Media takes many forms, including still pictures, graphics, 3D models, audio, speech and video. Media is already being accessed by automated devices in ways that were previously considered "human-only." For example, automated image processing, as used in surveillance applications, relies on smart cameras and intelligent vision devices that convert the media for use by computerized monitors. Speech-to-text, picture-to-speech and speech-to-picture conversions are tasks that were once reserved for humans, but are now more generally implemented by automated systems using high-speed processors, servers for storage and retrieval, and context/image sensitive control devices for alarm and filing operations.

To make the most efficient use of multimedia content, MPEG-7 will provide the means of guiding the filtering of a stream of audiovisual content descriptions so that the user receives only those multimedia data items that he or she prefers.
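As a rough illustration of that filtering idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Description` record, its fields and the `filter_stream` function are hypothetical stand-ins, not structures defined by MPEG-7.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Description:
    """Hypothetical, simplified stand-in for a content description."""
    item_id: str
    genre: str
    language: str


def filter_stream(descriptions: Iterable[Description],
                  preferred_genres: Set[str],
                  preferred_languages: Set[str]) -> Iterator[Description]:
    """Yield only the descriptions that match the user's stated preferences."""
    for d in descriptions:
        if d.genre in preferred_genres and d.language in preferred_languages:
            yield d


# A made-up stream of descriptions arriving ahead of the content itself.
stream = [
    Description("clip-1", "news", "en"),
    Description("clip-2", "sports", "en"),
    Description("clip-3", "news", "fr"),
]

selected = list(filter_stream(stream, {"news"}, {"en"}))
print([d.item_id for d in selected])
```

The point of the sketch is that the decision is made on the description alone, so the receiver never has to decode the audiovisual material it ends up discarding.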

The MPEG-7 standard is segmented into eight major functionalities (see Table 2). Since MPEG-7 is designed as a means for descriptions of multimedia, details about coding, motion processing and decoding will not be found in the standard. The methods of the preceding MPEG standards still hold, with MPEG-7 augmenting them where necessary to facilitate descriptive functionality. Most of what is contained in MPEG-7 will therefore be languages, descriptors and schemes for "describing" the terminal architectures and normative interfaces needed for multimedia content manipulation, categorizing, searching and retrieval. A brief description of the eight sections that make up MPEG-7 is listed in Table 2.

MPEG-7 Systems includes a set of binary format encoding tools that prepare MPEG-7 descriptions for transport and storage, plus the terminal architecture and normative interfaces needed for that transport and storage.

In MPEG-7, a 'language' called the Description Definition Language (DDL) allows the creation of new Description Schemes (and possibly descriptors). The DDL is based on the XML Schema Language, which was not designed specifically for audiovisual content description. MPEG-7 therefore needed additional extensions, and the DDL breaks down into a set of logical normative components: the structural and data-type language components of XML Schema, plus the MPEG-7-specific extensions.
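To give a feel for what an XML-based description and a scheme that constrains it might look like, the following Python sketch builds a small XML document and checks it against a toy scheme. This is only an illustration: the element names (`ContentDescription`, `Video`, `Title`, `Genre`, `DurationSeconds`) and the `SCHEME` dictionary are invented placeholders, not the normative MPEG-7 schema, and the `validate` helper is far simpler than real XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Toy "scheme": required child elements and the Python type each must parse as.
# These names are illustrative placeholders, NOT taken from the MPEG-7 schema.
SCHEME = {"Title": str, "Genre": str, "DurationSeconds": int}


def build_description(clip_id: str, title: str, genre: str, seconds: int) -> str:
    """Serialize a small, made-up content description as XML text."""
    root = ET.Element("ContentDescription")
    video = ET.SubElement(root, "Video", {"id": clip_id})
    ET.SubElement(video, "Title").text = title
    ET.SubElement(video, "Genre").text = genre
    ET.SubElement(video, "DurationSeconds").text = str(seconds)
    return ET.tostring(root, encoding="unicode")


def validate(video: ET.Element, scheme: dict) -> bool:
    """Return True if every required child exists and parses as its declared type."""
    for name, expected_type in scheme.items():
        text = video.findtext(name)
        if text is None:
            return False
        try:
            expected_type(text)
        except ValueError:
            return False
    return True


xml_text = build_description("clip-42", "Evening News", "news", 1800)
print(xml_text)

# A consumer parses the text and reads typed values back out.
parsed = ET.fromstring(xml_text)
video = parsed.find("Video")
print(validate(video, SCHEME), video.findtext("Title"))
```

The division of labor is the same one the DDL formalizes: one artifact defines which elements and data types are allowed, and every description is an instance that can be checked against it.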

For visual needs, MPEG-7 Visual Description Tools set out basic structures and descriptors that address basic visual features. For each category, both elementary and sophisticated descriptors, which describe such features as color, texture, shape, motion, localization and face recognition, are designated.
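A color feature is perhaps the easiest of these to picture. The Python sketch below computes a coarse color histogram and compares two images by the distance between their histograms. It is an assumption-laden illustration of the general idea of a color descriptor, not an implementation of any MPEG-7 Visual descriptor; the function names, the bin count and the tiny pixel lists are all made up.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 2) -> List[float]:
    """Return a normalized histogram over a coarse RGB grid."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel to its bin, clamping to the last bin for safety.
        rb = min(int(r / step), bins_per_channel - 1)
        gb = min(int(g / step), bins_per_channel - 1)
        bb = min(int(b / step), bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences; smaller means more similar color content."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Three tiny made-up "images", each just a list of pixels.
red_image = [(250, 10, 10)] * 4
mostly_red = [(200, 30, 20)] * 3 + [(10, 10, 250)]
blue_image = [(10, 10, 250)] * 4

query = color_histogram(red_image)
print(l1_distance(query, color_histogram(mostly_red)))
print(l1_distance(query, color_histogram(blue_image)))
```

Once the histogram is stored as part of the description, a search engine can rank images by color similarity without touching the pixels again.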

MPEG-7 Audio provides structures for describing audio content. These structures, in conjunction with the Multimedia Description Schemes part of the standard, utilize a set of low-level descriptors to deal with audio features that cross many spectral, parametric and temporal features of an audio signal. In addition, High-level Description Tools, those more specific to a set of applications, encompass general sound recognition and indexing Description Tools.

As an example of higher-level tools, MPEG-7 Audio can include descriptions of instrumental timbre, spoken content and other schemes - such as audio signature and melodic Description Tools - necessary to facilitate a 'query-by-humming' search.
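A query-by-humming search can be sketched in Python as follows. This is a simplified stand-in, not the MPEG-7 melody description tools: it assumes the hummed input has already been converted to a list of pitch numbers, reduces each tune to an up/down/same contour, and ranks catalog entries by edit distance. The catalog, the function names and the pitch values for the hummed query are hypothetical.

```python
from typing import Dict, List


def contour(pitches: List[int]) -> str:
    """Reduce a pitch sequence to U (up), D (down) or S (same) per step."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(steps)


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def best_match(hummed: List[int], catalog: Dict[str, List[int]]) -> str:
    """Return the catalog title whose contour is closest to the hummed one."""
    query = contour(hummed)
    return min(catalog, key=lambda title: edit_distance(query, contour(catalog[title])))


# Opening notes as MIDI pitch numbers; the catalog itself is illustrative.
catalog = {
    "Twinkle, Twinkle, Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# A hummed query in a different key with one wrong note.
hummed = [55, 55, 62, 63, 64, 64, 62]
print(best_match(hummed, catalog))
```

Working from contour rather than absolute pitch is the design choice that matters here: it lets an off-key or transposed hum still land on the right tune.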

To deal with generic features and multimedia entity descriptions, the MPEG-7 Multimedia Description Schemes (MDS) comprise a set of descriptors and description schemes. Generic features are those that apply to all media, audio and visual alike, and are covered by tools such as vector, time and textual description tools and controlled vocabularies. For applications that are not generic, a more complex set of Description Tools is grouped into five categories: content description, content management, content organization, navigation and access, and user interaction.

MPEG-7 Reference Software is the implementation of the relevant parts of the standard, and it carries normative status. The schemes and descriptors of MPEG-7 use an 'eXperimentation Model' (XM) for the data structures and procedural code that form the applications. XM software is the simulation platform, and its applications are divided into server (extraction) applications and client (search, filtering and/or transcoding) applications.

MPEG-7 Conformance is the portion of the standard used for developing the guidelines and procedures in testing conformance of MPEG-7 implementations.

The MPEG-7 Extraction and use of description is a technical report with informative material regarding the extraction, and the uses for some of the description tools. "Use of description" refers to narrative and additional material with insight into MPEG-7 Reference Software and includes alternative approaches to implementation.

The depth and details of each of the eight sections are beyond the scope of this writing. Yet, those wondering where all this development is headed should realize MPEG is now dealing with a very real set of both technical and human concerns, and it is aimed at the requirements for a powerful set of tools in the multimedia industry. With myriad devices coming into the marketplace, MPEG now addresses coupling the audiovisual standards of MPEG-2 and/or MPEG-4 with the metadata standard of MPEG-7.

Looking forward, current development work on MPEG-21 will include a Rights Data Dictionary and Rights Expression Language (RDD/REL) for content protection and ownership. Important technology for authors and owners of content, for the service providers and for the consumers of MPEG streams is well under development.

Most recently, during the March 11-15, 2002 meetings in Jeju Island, South Korea, MPEG issued a Call for Proposals on MPEG-7 Systems extensions, notably to address additional coding efficiency for MPEG-7 descriptions. For MPEG-21, a fourth Call for Proposals (CfP) was issued for the next round of the standards suite, as well as for MPEG-21 Digital Item Declarations. Expect to see much more from the MPEG standards and toolsets in the coming years.

We should further recognize that MPEG's efforts are about interoperability for the consumer. To be successful, consumers need to feel confident that they will be able to use content without the burden of incompatible formats. The harmonization of a common ground for codecs, metadata, etc., is what industry has preached for decades, and with ISO MPEG's work, hope continues.

Karl Paulsen

Karl Paulsen is the CTO for Diversified, the global leader in media-related technologies, innovations and systems integration. Karl provides subject matter expertise and innovative visionary futures related to advanced networking and IP-technologies, workflow design and assessment, media asset management, and storage technologies. Karl is a SMPTE Life Fellow, a SBE Life Member & Certified Professional Broadcast Engineer, and the author of hundreds of articles focused on industry advances in cloud, storage, workflow, and media technologies. For over 25 years he has continually featured topics in TV Tech magazine, penning the magazine’s Storage and Media Technologies and its Cloudspotter’s Journal columns.