Metadata looms large at IBC

Metadata has never really got the juices going at IBC or anywhere else, but it is now at least being talked about much more and taken seriously by all participants in the content value chain. This will be reflected at IBC2011, where the fast-expanding role of metadata in search and recommendation will be in evidence both in the conference and on the show floor. Even so, metadata will not immediately leap out at delegates, since many of the relevant products and discussion topics will fall under the heading of media asset management (MAM).

A tour around the show floor and attendance at conference sessions would lead to the same conclusion: there are plenty of metadata products out there and some important standards knocking about, but nothing that bridges the whole content lifecycle from production through contribution to final search, discovery and consumption. There is still no universal standard, which is not surprising given that metadata has very different requirements at, say, the production stage than during final distribution. A standard covering all the bases would be too heavy and unwieldy to implement.

Similarly, no single metadata product or technology can really cover the whole spectrum. This too will be evident at IBC, with some products focusing on editing and production, where the requirement is for information relating to ingest, asset management, workflow and automated distribution across multiple platforms. During final distribution, or within a home network, information about the video structure and resolution is needed so that the end device or platform can decode and display the content correctly. Within an on-demand service platform, the need is for information about the content itself, which can pertain to both audio and video and penetrate individual scenes, facilitating playback of desired sequences within a whole movie or other program.

Structural information required for display includes the length and format of the video, as well as the size and shape of the encoded picture. In many cases such attributes are embedded in the video itself, just as the metadata of a still image is often encoded in the Exchangeable Image File Format (Exif).
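As a small illustration of the embedding idea, the sketch below reads the Exif metadata from a still image using the Python Pillow library; the file name is a placeholder, and the same principle applies when structural attributes are carried inside a video container.

```python
# Minimal sketch: reading embedded Exif metadata from a still image
# using the Pillow library (pip install Pillow). "frame.jpg" is a
# placeholder file name for illustration.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("frame.jpg")
exif = img.getexif()  # returns an empty container if no Exif data

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, tag_id)  # map numeric tag IDs to readable names
    print(f"{name}: {value}")

# Structural attributes, by contrast, are available directly:
print(f"size: {img.size}, format: {img.format}")
```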

When it comes to describing the content itself, relevant information includes titles, lists of actors, genre and ratings, as well as different forms of summary information for playlists and program guides. Such information has been provided for movies for some time. For example, the Internet Movie Database (IMDb), the world's most popular source of information about TV and celebrity content, has been going for around 20 years. It is a searchable database comprising information about more than 1.8 million movies, TV and entertainment programs, including over 4 million cast and crew members. IMDb provides consumers with information such as show times, trailers, reviews, photo galleries, quotes and box-office data, and relies utterly on metadata to work.

Over time this metadata has expanded in scope, and some of the developments unveiled at IBC this year will be relevant for IMDb. One of the more significant is the application of metadata not just to a whole piece of content but to individual chunks within it, such as a movie scene or song. This is relevant both for production and for search and discovery. Indeed, the power of search, whether for a TV channel or for a database such as IMDb, would be greatly enhanced by the ability to retrieve selected pieces of content from within a long item such as a movie. The problem, though, lies in creating appropriate metadata in a standard way that can be widely used by search or discovery engines.

In fact, a standard providing the basic hooks does exist, and has done for some time: MPEG-7. This is not to be confused with MPEG-2 and MPEG-4, which deal with compression of AV data. First released in 2001, MPEG-7 was designed specifically to describe the guts of the content in a persistent way that can be exploited throughout its lifecycle to facilitate search and navigation. MPEG-7 evolved from ideas developed in the IT world of object-oriented programming, where one objective was to describe different types of data including images and sound. Video did not really figure then, so MPEG-7 picked up some of the object-oriented ideas and took them further by adding a temporal dimension for the moving image. This led to the creation of nine subclasses to categorise content within an AV sequence. Spatial and temporal segments are categorised separately and then combined as necessary. For example, an audio segment might be a song or chorus, or just a line spoken by an actor. A spatial visual segment might be just a frame in a video sequence. This frame can be divided into a still region, such as a background that does not change for a while, and moving regions, in so-called spatio-temporal MPEG-7 segments. It is then possible to combine video and audio elements to yield complete AV segments representing a searchable object within a whole item of content, with the ability to add information relating to usage and creation, for example.
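To make the segment idea concrete, the sketch below shows a heavily simplified MPEG-7-style description of a video decomposed into temporal segments, parsed with Python's standard library. The element names follow MPEG-7 Multimedia Description Schemes conventions, but the fragment is abridged and illustrative rather than schema-valid.

```python
# Illustrative sketch only: a simplified MPEG-7-style description of a
# video split into temporal segments. Element names follow MPEG-7
# Multimedia Description Schemes conventions, but this fragment is
# abridged and not schema-valid as written.
import xml.etree.ElementTree as ET

MPEG7_FRAGMENT = """
<Mpeg7>
  <Description>
    <MultimediaContent>
      <Video>
        <TemporalDecomposition>
          <VideoSegment id="scene-1">
            <TextAnnotation><FreeTextAnnotation>Opening chase scene</FreeTextAnnotation></TextAnnotation>
            <MediaTime>
              <MediaTimePoint>T00:00:00</MediaTimePoint>
              <MediaDuration>PT2M30S</MediaDuration>
            </MediaTime>
          </VideoSegment>
          <VideoSegment id="scene-2">
            <TextAnnotation><FreeTextAnnotation>Title song</FreeTextAnnotation></TextAnnotation>
            <MediaTime>
              <MediaTimePoint>T00:02:30</MediaTimePoint>
              <MediaDuration>PT3M10S</MediaDuration>
            </MediaTime>
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>
"""

root = ET.fromstring(MPEG7_FRAGMENT)
# List every annotated segment with its start point and duration, the
# kind of query a scene-level search engine would run.
for seg in root.iter("VideoSegment"):
    label = seg.findtext(".//FreeTextAnnotation")
    start = seg.findtext(".//MediaTimePoint")
    length = seg.findtext(".//MediaDuration")
    print(f"{seg.get('id')}: {label} (start {start}, duration {length})")
```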

As this brief description indicates, MPEG-7 provides raw tools but still leaves a lot of work to create a usable implementation, which is why it has barely been exploited so far, although some broadcasters such as the BBC have kept the flame alight by taking it as the basis for ongoing research and development in metadata. There are now signs of renewed interest, if not in MPEG-7 itself, then at least in some of the ideas it is based on, notably that of making metadata reusable right across the video lifecycle. This matters because, in the absence of effective tools for generating it automatically, metadata creation is an expensive process. Once information has been generated at the production stage, it makes absolute sense to have it available to operators and consumers for content discovery.

On the other hand, it is inefficient to carry irrelevant metadata down to the consumer. Some of the production information relates to raw footage, known as "rushes", and can be discarded before the contribution and final distribution stages.

But other information inserted at the production stage has great potential for search and for increasing consumer engagement. For example, many production systems use textual metadata specifying the duration of the video, the producer's name, the main title, subtitles, the copyright owner, keywords, and production and broadcast dates. Much of this information has potential search value, as the sketch below illustrates.
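A minimal sketch of how such a production record might be modelled follows; the field names and example values are illustrative assumptions, not drawn from any particular production system.

```python
# A minimal sketch of the kind of textual production metadata the
# paragraph above lists, modelled as a plain record. Field names and
# values are illustrative, not taken from any real production system.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProductionMetadata:
    main_title: str
    producer: str
    copyright_owner: str
    subtitles: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    duration_seconds: int = 0
    production_date: date | None = None
    broadcast_date: date | None = None

record = ProductionMetadata(
    main_title="Example Documentary",
    producer="A. Producer",
    copyright_owner="Example Broadcasting Corp.",
    subtitles=["Part 1"],
    keywords=["wildlife", "documentary"],
    duration_seconds=3540,
    production_date=date(2011, 3, 1),
    broadcast_date=date(2011, 9, 9),
)
print(record.keywords)  # fields like these carry direct search value
```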

Apart from the information itself, there are questions to settle over how best to structure the metadata and where to store it: in particular, whether it should be embedded in the AV data itself or held as a separate file in an independent database. The embedding approach has the advantage that the metadata travels with the content, but it has to survive encoding and compression. For this reason MPEG-7 is integrated with MPEG-4, the combination sometimes being referred to as MPEG-47.

The embedded approach is also enshrined in the Material Exchange Format (MXF), which specifies a common wrapper for the metadata and content that is independent of the underlying platform. This enables AV material to be moved readily both within and between supply chains, but it does not work so well for discovery because of the computational cost of running search keys against the whole content. It is far more efficient to search against specific metadata databases linked to the content, and one solution is to extract the metadata from the MXF file to create a database that can be queried inside a MAM system.
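The extract-and-index approach might look something like the sketch below; extract_mxf_metadata() is a hypothetical stand-in for a real MXF parser, and the file names and field values are invented for illustration.

```python
# Sketch of the "extract and index" approach: instead of scanning whole
# MXF files, descriptive metadata pulled from the MXF wrapper is loaded
# into a small database that a MAM can query directly.
import sqlite3

def extract_mxf_metadata(path: str) -> dict:
    # Hypothetical placeholder: a real implementation would parse the
    # MXF header partition and its descriptive metadata sets.
    return {"path": path, "title": "Evening News 2011-09-09",
            "keywords": "news bulletin studio"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (path TEXT, title TEXT, keywords TEXT)")

for mxf_file in ["news_0909.mxf"]:  # illustrative file name
    meta = extract_mxf_metadata(mxf_file)
    conn.execute("INSERT INTO assets VALUES (:path, :title, :keywords)", meta)

# Searching the index is now a cheap database query rather than a scan
# of the essence itself.
hits = conn.execute(
    "SELECT path, title FROM assets WHERE keywords LIKE ?", ("%news%",))
for path, title in hits:
    print(path, "->", title)
```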

There will be plenty of MAMs on view at IBC, exhibiting some of the developments discussed here. One will be the MAM platform from Dalet, incorporating new tools designed to streamline news and sports production workflows, with a focus on end-to-end media management.

MAM is also attracting start-ups, such as the German company Reelway, which will be exhibiting at IBC a new approach to metadata, delivering it via a cloud model as a hybrid software-as-a-service (SaaS) system. This is aimed at small broadcasters, post-production companies and content owners that want to avoid the complexity of deploying their own MAM and metadata systems. It also highlights a growing trend: provision of metadata management itself as an external service by specialist companies with the expertise to grapple with the complexities involved.

For this reason, some of the big industry players are stepping up their research and development in metadata, sometimes by acquiring young specialist companies in the field. Cisco, for example, acquired content preparation specialist Inlet Technologies in March 2011, and is also investing in metadata technology vendor Digitalsmiths.

For Cisco, the underlying motive behind these moves is to stimulate video traffic across IP networks, since it supplies a good half of the world's IP routers and switches. Cisco believes video traffic will grow if people can search inside content, going beyond the simplistic tag-based retrieval currently used by YouTube, for example. Most YouTube clips are fairly short at present, so sophisticated intra-content search matters little there, but search activity around longer-form content over the Internet will only increase.

Metadata, then, is an expanding field, and one that will be at least partly reflected in the conference sessions and on the show floor at IBC.