Metadata Is Paramount to Content Files

Al Kovalick

As a kid living in San Francisco, I fondly remember combing through the Main Library card catalog in search of books on electronics and telephone systems. Of course, “analog cards” are not computer-friendly, so starting in 1995, library associations initiated the mother of all metadata standards to replace the basic card.

This standard is now called the Dublin Core Metadata Set (ISO 15836) and defines only 15 elements. These are: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights.

Metadata has two broad definitions. One is “structured data about data” or, as some say, “the bits about the bits.” The other is more literal and comes from the Greek meaning of meta, “alongside”: metadata sits alongside the thing being described.

From these basic 15 elements bibliographical information can describe any written publication.
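To make the 15-element set concrete, here is a minimal Python sketch that stores the element names and flags any field in a record that falls outside them. The helper name `unknown_elements` and the sample record (including its values and the extra `shelf` field) are made up for illustration; they are not part of the Dublin Core standard.

```python
# The 15 Dublin Core elements, as listed in this column.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record):
    """Return the keys in `record` that are not Dublin Core elements."""
    return sorted(set(record) - set(DUBLIN_CORE_ELEMENTS))


# Hypothetical catalog record; the values are invented for illustration.
book = {
    "title": "Basic Electronics",
    "creator": "J. Doe",
    "date": "1965",
    "language": "en",
    "shelf": "621.3",  # deliberately not a Dublin Core element
}

print(unknown_elements(book))
```

A check like this is one simple way to spot where an in-house record has drifted away from the core set.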

The Dublin Core is not fine-tuned to describe media, so during the last 10 years, SMPTE, the European Broadcasting Union (EBU), PBS, BBC and others have broadened (or reduced) these basic elements in several important ways for the production and distribution of media. Beyond Dublin, media-friendly metadata supports frames, time code, A+V sequences, mark_in/mark_out positions and more.

METADATA DICTIONARY
SMPTE publishes RP210, a metadata dictionary. This document (www.smptera.org/mdd) defines 10 different classes of metadata relevant to describing media. Of the ten, these four are common: descriptive, structural, spatial-temporal (of capture source) and rights.

Ideally, this dictionary can be referenced by others to build facility metadata schemas. In reality, RP210 is one of several dictionaries in use today. For legacy reasons, there are literally thousands of in-house schemas that don’t point to a formal standard.

Of the elements, title and length are fundamental. Other basic elements are descriptive and structural (or technical). Descriptive is a form of narrative about the essence: who, when, where and what, with frame-accurate annotation in principle. Structural is more about the media format: MPEG-4, 5 Mbps, 4 PCM audio channels and so on.
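The split between descriptive and structural metadata can be sketched as a small data structure. This is only an illustration under my own assumptions: the class and field names (`MediaRecord`, `Structural`, `Annotation`, `notes_at`) and the sample values are hypothetical and do not come from RP210 or any other dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Descriptive metadata: a note tied to a frame range."""
    frame_in: int
    frame_out: int
    note: str


@dataclass
class Structural:
    """Structural (technical) metadata: the media format."""
    video_codec: str
    bit_rate_mbps: float
    pcm_audio_channels: int


@dataclass
class MediaRecord:
    title: str
    length_frames: int
    structural: Structural
    annotations: list = field(default_factory=list)

    def notes_at(self, frame):
        """Return the descriptive notes that cover the given frame."""
        return [a.note for a in self.annotations
                if a.frame_in <= frame <= a.frame_out]


# Hypothetical clip; all values are invented for illustration.
clip = MediaRecord(
    title="Evening News",
    length_frames=51300,
    structural=Structural("MPEG-4", 5.0, 4),
    annotations=[
        Annotation(0, 299, "Anchor intro"),
        Annotation(300, 1200, "Weather report"),
    ],
)

print(clip.notes_at(450))
```

Keeping the two kinds of metadata in separate structures mirrors how they are used: annotations answer "what is in this frame?", while the structural block answers "can my system play this file?"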

Both the EBU and PBS have developed their own “media core” metadata sets based on the Dublin Core. The EBU version is called EBUCore and for PBS, PBCore. Both of these have been widely accepted. The Framework for Interoperable Media Services (FIMS) refers to EBUCore to describe metadata (wiki.amwa.tv/ebu).

One special kind of metadata is the object ID. These are short, registered, searchable codes used to identify media files. Examples of these are Ad-ID (spot ID, tracking), EIDR (movies, TV programs) and ISAN (general AV). At registration time, essential metadata is linked to the ID code in a Web-accessible database.

Another special type is compositional metadata. It is used to describe the details of an A/V sequence, including all edit-related data. The old-fashioned EDL and the newer AAF and Apple XML formats are typical.

LEVERAGING METADATA
With files ruling the facility, metadata is paramount for searching and locating content, implementing workflows, limiting access rights, facilitating reuse, and describing the contents of files.

Without metadata, content loses much of its value, if it can be located at all in a sea of files. In what movie (file) did Clint Eastwood say “Go ahead, make my day”? Metadata is essential to answer this question, or ask Ken Jennings of “Jeopardy” fame.

Fig. 1: Metadata integration operations

In Fig. 1, the lower half of the diagram shows the flow of media and metadata in a typical production workflow.

One of the most common production operations is a keyword search and subsequent AV file play. The functions of a media asset manager (MAM) are designed for this. Metadata is a MAM’s heart and soul. MAMs span a wide range of functionality. A mini-MAM supports search, browse and creating clip lists. A maxi-MAM covers many aspects of the lifecycle of a piece of media.

Metadata is linked to its media in three basic ways:

Media and metadata are embedded in the same file wrapper. For example, the file movie123.mxf contains both media and select metadata.

Media file1 and metadata file2 are located in the same directory or linked using identical file names or similar. For example, movie123.mpg and movie123.xml share the same name and so are linked. Some call this metadata a sidecar since it rides alongside the media as a separate file.

Metadata, residing in a database or other repository, is linked to its media by reference. The database can be local or in a distant cloud. The linking process can rely on the SMPTE Unique Material Identifier (SMPTE ST 330) or similar.
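The second of these three methods, the sidecar, is easy to sketch in a few lines of Python. The function below pairs each media file with the .xml file that shares its base name. The function name `pair_sidecars`, the set of media suffixes and the sample listing are my assumptions for illustration; a real facility would use its own naming rules.

```python
from pathlib import PurePath

# Assumed set of media file extensions; adjust for your facility.
MEDIA_SUFFIXES = {".mpg", ".mxf", ".mov"}


def pair_sidecars(file_names):
    """Map each media file name to its .xml sidecar name, or None if absent."""
    paths = [PurePath(name) for name in file_names]
    # Index sidecars by base name, e.g. "movie123" -> "movie123.xml".
    sidecars = {p.stem: p.name for p in paths if p.suffix.lower() == ".xml"}
    return {
        p.name: sidecars.get(p.stem)
        for p in paths
        if p.suffix.lower() in MEDIA_SUFFIXES
    }


# Hypothetical directory listing.
listing = ["movie123.mpg", "movie123.xml", "promo7.mpg"]
print(pair_sidecars(listing))
```

A media file that maps to None has lost its sidecar, which is exactly the kind of orphan that makes content hard to find later.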

In 2012, most media facilities have defined an in-house schema for their metadata. When exporting or importing metadata from partners, often there is a format mismatch. This is a pain point. There is very little metadata interop except within a facility.

One bright point is the work from the Advanced Media Workflow Association (AMWA). They have developed several MXF-based formats (Application Specifications) that include defined metadata for file interchange.

A good example of this is the new AS-12 used for commercial delivery. It contains a “digital slate” with consistent metadata fields.

METADATA INTEGRATION USING ETL
With the mining of program-related Twitter, Facebook and other sources of Web information, metadata is becoming richer in form and content. Importing and exporting rich metadata often requires specialized tools. True, most MAM products offer some form of format conversion and mapping. However, there are times when the metadata needs to be processed and managed independently of the MAM system. This is especially true when a facility supports multiple schemas.

The problem of manipulating data records is an old one. The basic need is to import a record(s), transform it, and export it. This functionality could be called ITE, but it’s not. It’s called ETL—extract, transform and load. ETL operations are widely used with databases (see the upper portion of Fig. 1).

Let’s look at a test case. You are given 1,000 media files to import into your program library. There is a short XML metadata file associated with each media file. The metadata format is completely different from your house standard. Use a commercial ETL product to transform the metadata ready to import into your MAM.
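To show what the three ETL steps amount to for one of those metadata files, here is a minimal Python sketch using only the standard library. It is not a stand-in for a commercial ETL product. The partner tag names, the house field names, the `FIELD_MAP` table and the sample XML are all hypothetical, and the sketch assumes a flat XML file with one level of child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a partner's tag names to house field names.
FIELD_MAP = {"ProgramName": "title", "RunTime": "length", "Owner": "rights"}


def extract(xml_text):
    """Extract: read a flat XML record into a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def transform(partner_record):
    """Transform: rename mapped fields and drop anything unmapped."""
    return {FIELD_MAP[k]: v for k, v in partner_record.items() if k in FIELD_MAP}


def load(house_record):
    """Load: serialize the house record as XML ready for import."""
    root = ET.Element("asset")
    for name, value in house_record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical partner metadata file.
partner_xml = """<clip>
  <ProgramName>Evening News</ProgramName>
  <RunTime>00:28:30:00</RunTime>
  <Owner>Station A</Owner>
  <Internal>ignore me</Internal>
</clip>"""

house_xml = load(transform(extract(partner_xml)))
print(house_xml)
```

For 1,000 files the same three functions would simply run in a loop. The hard part in practice is the mapping table, which is where graphical ETL tools earn their keep.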

There is a wide selection of ETL programs to choose from (see Fig. 1). Most can leverage the cloud to scale for data-heavy projects. If you are not familiar with these, download one for a trial run. Most have intuitive, graphical UIs with built-in templates to do just about any data transform you can imagine. I am impressed with these tools and see a long life ahead for the category. In a future column I will discuss the world of “Big Data in the cloud” and how this will impact workflows of media and metadata.

Al Kovalick is the founder of Media Systems Consulting in Silicon Valley. He is the author of “Video Systems in an IT Environment (2nd ed).” He is a frequent speaker at industry events and a SMPTE Fellow. For a complete bio and contact info, visit www.theAVITbook.com.