During the time Phil Livingston, formerly of Panasonic, and I worked together at AZCAR, we often exchanged e-mail messages, some of them silly, about technology topics. Sadly, Phil passed away in 2006, but one of the messages we shared still amuses me:

“Metadata is bits about the essence, which is the stuff we are watching, and thus is stuff about the real stuff. Metadata is also stuff about the stuff which can be stuffed into the stuff either as part of the stuff, or when not present, can be stuffed with bits that contain no stuff. ... Unfortunately, when no stuff is present, it may still be necessary to send stuff to tell the receiver that the bits are not stuff, though they may be a bit stuffed for syntactical purposes.”

Defining metadata

Well, you get the point. Metadata is not a new concept, but in our industry the first real definition of how content and metadata relate dates from the SMPTE/EBU Task Force for Harmonized Standards for the Exchange of Program Material as Bitstreams, which completed its work in 1998. In the documents the task force created, essence and metadata are carefully defined.

Two types of metadata are in widespread use in our industry. Structural metadata is that which is necessary to decode and use the content. Identification of the type of compression, number of audio channels, coding parameters, colorimetry, resolution (temporal and spatial), and other technical data about a file, or stream, would be structural because without them the content might not be usable. It is possible that additional structural metadata might be added to a file during processing, especially if technical parameters are changed, or perhaps additional tracks added.

Descriptive metadata can be an enormous set of information, and in general, it grows throughout the production process. The time of capture from a camera’s time stamps, geo-location information derived from GPS, scene and take numbering, and possibly script information might be saved with the essence. Over time, additional information might be added to make the content easier to search. A simple example is a “good take flag,” which might help winnow down the amount of content ingested during post production.
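The distinction between the two types can be sketched in code. The Python fragment below is a minimal illustration only. The class and field names (StructuralMetadata, DescriptiveMetadata, good_take and so on) are assumptions chosen for this example and are not taken from any SMPTE or EBU document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class StructuralMetadata:
    """Technical facts a decoder needs (hypothetical field names)."""
    compression: str          # e.g. "MPEG-2" or "DV"
    audio_channels: int
    width: int                # spatial resolution, in pixels
    height: int
    frame_rate: float         # temporal resolution, frames per second


@dataclass
class DescriptiveMetadata:
    """Information about the content that tends to grow over time."""
    capture_time: Optional[str] = None
    gps_position: Optional[tuple] = None
    scene: Optional[str] = None
    take: Optional[int] = None
    good_take: bool = False
    notes: list = field(default_factory=list)


structural = StructuralMetadata("MPEG-2", 2, 1920, 1080, 29.97)
descriptive = DescriptiveMetadata(scene="12A", take=3)

# Descriptive metadata is added to as production proceeds.
descriptive.good_take = True
descriptive.notes.append("Director's preferred take")
```

In this sketch the structural record is frozen, because the technical parameters describe the essence as it was encoded, while the descriptive record stays open to later additions.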

Media asset management systems are often a source of a steady increase in the amount of metadata captured. A good example of such a system would be a sports logging system. Teams need to be able to search and categorize plays and thus add metadata that can grow during logging and use. For instance, a football team may want to add categories for offense and defense, plays in the “red zone,” passes, runs, quarterback scrambles, sacks, penalties, key players (by name or number), and time on the clock.

It is important to be able to add metadata along the way. For instance, the first logger might capture only a few categories (which team has the ball, clock, down, field position). A later analyst might add who ran and the result of the play. A coach might add the play that was called, and so on. In entertainment production, other metadata might be added, such as scene continuity notes and color correction instructions. The ability to add metadata can thus be crucial to making the best use of the essence.

As a result, a critical step in designing a complete metadata and essence workflow is to plan for later changes and additions to metadata. Most obviously, it should be possible to update the metadata repository without changing the essence and without losing the link between the metadata and the essence to which it refers.

Clearly, it would not be a good idea to embed the metadata in the essence file if every time a change is made to metadata the entire file has to be stored again, and version numbers updated to reflect changed file content. This could result in explosive growth in storage needs in some cases.

Our football example begs to have the descriptive metadata stored in a separate repository, perhaps a database, with a reference to the essence file to which it relates. This allows small changes to be made with no impact on storage of the essence. When the metadata is kept in a separate file alongside the essence, that file is sometimes referred to as a sidecar file.
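As a rough sketch of that arrangement, the Python fragment below keeps descriptive metadata in a small database keyed by an essence identifier, so that a logger, an analyst and a coach can each add entries without the essence file being rewritten. The table layout, the function names (add_metadata, find_essence), the clip identifiers and the sample values are all hypothetical, and an in-memory SQLite database stands in for a real repository.

```python
import sqlite3

# In-memory database standing in for a metadata repository (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE play_metadata ("
    " essence_id TEXT NOT NULL,"   # identifier shared with the essence file
    " key TEXT NOT NULL,"
    " value TEXT NOT NULL,"
    " added_by TEXT NOT NULL)"
)


def add_metadata(essence_id, added_by, **items):
    """Attach new key/value pairs without touching the essence file."""
    rows = [(essence_id, k, str(v), added_by) for k, v in items.items()]
    db.executemany("INSERT INTO play_metadata VALUES (?, ?, ?, ?)", rows)
    db.commit()


def find_essence(key, value):
    """Return the identifiers of all essence tagged with key=value."""
    cur = db.execute(
        "SELECT DISTINCT essence_id FROM play_metadata"
        " WHERE key = ? AND value = ? ORDER BY essence_id",
        (key, str(value)),
    )
    return [row[0] for row in cur]


# First pass by the logger, then later additions by an analyst and a coach.
add_metadata("clip-0001", "logger", possession="home", down=3, clock="02:15")
add_metadata("clip-0001", "analyst", result="sack", red_zone=False)
add_metadata("clip-0002", "logger", possession="away", down=1, clock="01:58")
add_metadata("clip-0001", "coach", play_called="shotgun pass")

sacks = find_essence("result", "sack")
```

Each addition is a handful of small rows. The essence file itself is never opened, copied or re-versioned.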

Conversely, it makes no sense to put the structural metadata in a sidecar, which might make decoding the essence more complicated. In fact, one critical piece of structural metadata is the unique identifier linking essence and metadata together, which of course must be stored with both.

Locating content

One content identifier often used is the ubiquitous SMPTE Unique Material Identifier (UMID). Although UMIDs are not registered in a global database of all content ever created (which would be difficult, if not impossible), they are easy to generate and manage. Because they are generated locally, a camera can create an identifier on a mountaintop, with no connection to any registry. This allows metadata to be referenced to that UMID-identified content later.
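The idea of a locally generated identifier can be illustrated in a few lines of Python. This is only a sketch: a random UUID is used as a stand-in, and it does not follow the actual SMPTE UMID byte layout. The function name new_material_id and the dictionary keys are assumptions made for the example.

```python
import uuid


def new_material_id():
    """Create an identifier locally, with no registry or network access.

    A random UUID is a stand-in here; it is NOT the SMPTE UMID byte layout.
    """
    return uuid.uuid4().hex


# The camera assigns the identifier at capture time ...
clip_id = new_material_id()
essence_header = {"material_id": clip_id}

# ... and metadata created later refers back to the same identifier.
later_notes = {"material_id": clip_id, "location": "mountaintop"}
```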

I have become convinced in the last few years that metadata may actually be more critical than some of the essence to which it refers. One case is finding content. The structural metadata that identifies where essence is located must not be corrupted. If it is, it may be at best difficult, and at worst impossible, to find the essence itself. A media asset management (MAM) database that allows content to be searched and retrieved must therefore be carefully protected. If the location data is lost, the essence itself becomes, at best, opaque.

The SMPTE/EBU Task Force introduced an additional concept, which is sometimes confusing to people in our industry: that of the “wrapper.” The wrapper concept is based on IT techniques and quite literally is a layer of structural metadata that surrounds the bits of essence and metadata. It allows the decoding and transport of content (defined by the Task Force as essence plus metadata) in a way that can be parsed by any application with knowledge of the wrapper syntax.

A standard wrapper does not guarantee the ability to decode the essence (the receiving decoder might understand only DV compression and not, for instance, MPEG-2), but it will allow the structural metadata and essence to be retrieved from the file in a standardized way. The most obvious examples in common use today are Material Exchange Format (MXF) and QuickTime, though dozens of others exist.
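A toy example may make the wrapper idea more concrete. The Python sketch below uses an invented layout (a four-byte tag, a four-byte length, then the payload). It is not the MXF or QuickTime syntax, and the tags CODC and ESSN are made up for illustration. It shows a receiver that knows the wrapper syntax pulling out both the structural metadata and the essence, even though it cannot decode the essence.

```python
import struct


def wrap(chunks):
    """Serialize (tag, payload) pairs as tag(4) + length(4) + payload."""
    out = b""
    for tag, payload in chunks:
        out += struct.pack(">4sI", tag, len(payload)) + payload
    return out


def unwrap(data):
    """Split a wrapped byte string back into (tag, payload) pairs."""
    chunks, pos = [], 0
    while pos < len(data):
        tag, length = struct.unpack_from(">4sI", data, pos)
        pos += 8
        chunks.append((tag, data[pos:pos + length]))
        pos += length
    return chunks


wrapped = wrap([
    (b"CODC", b"MPEG-2"),            # structural metadata: codec name
    (b"ESSN", b"\x00\x01\x02\x03"),  # essence bits, opaque to the parser
])

SUPPORTED_CODECS = {b"DV"}           # this receiver only understands DV

parsed = dict(unwrap(wrapped))
can_decode = parsed[b"CODC"] in SUPPORTED_CODECS
```

Here the parser recovers every chunk intact. Whether the essence can then be decoded depends entirely on the codec named in the structural metadata.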

To recap, metadata is truly “bits about bits,” but it is much more, and critically important in our industry today.

John Luff is a television technology consultant.