As a kid living in
San Francisco I
fondly remember
combing through
the Main Library card
catalog in search of
books on electronics
and telephone systems. Of
course, “analog cards” are
not computer friendly, so
starting in 1995 library associations initiated
the mother of all metadata standards
to replace the basic card.
This standard is now called the Dublin
Core Metadata Element Set (ISO 15836) and
defines only 15 elements. These are:
title, creator, subject, description, publisher,
contributor, date, type, format, identifier,
source, language, relation, coverage, rights.
Metadata has two broad definitions. One
is “structured data about data” or, as some
say, “the bits about the bits.” Another definition
is more literal and comes from the
Greek meaning of meta, “alongside”: metadata
sits alongside the thing being described.
With these 15 basic elements, bibliographic
information can describe any
written publication.
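To make the 15 elements concrete, here is a minimal Python sketch of a Dublin Core style record. The element names come from the list above; the make_record helper and the sample values are hypothetical, invented for illustration, and the standard itself does not prescribe any particular data structure.

```python
# The 15 Dublin Core element names, as listed in the column.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a record, rejecting any key that is not a Dublin Core element."""
    unknown = set(fields) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    # Every element is optional here; unset elements are simply left out.
    return dict(fields)


# Hypothetical example values, for illustration only.
record = make_record(
    title="Basic Electronics",
    creator="A. N. Author",
    subject="electronics",
    language="en",
    type="Text",
)
print(record["title"])
```

Note that a field such as length is rejected by this sketch, which hints at why the basic set needs extending before it can describe media.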
The Dublin Core is not fine-tuned to describe
media, so during the last 10 years,
SMPTE, the European Broadcast Union
(EBU), PBS, BBC and others have broadened
(or reduced) these basic elements in
several important ways for the production
and distribution of media. Beyond Dublin,
media-friendly metadata supports frames,
time code, A+V sequences, mark_in/mark_out
positions and more.
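As a rough illustration of frame-based metadata, the sketch below turns mark_in and mark_out time code into a clip duration in frames. The 25 fps non-drop-frame rate, the HH:MM:SS:FF layout, the clip dictionary and the function name are all assumptions made for the example, not part of any standard named in this column.

```python
FPS = 25  # assumed frame rate (non-drop-frame); not from the column


def timecode_to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert non-drop-frame 'HH:MM:SS:FF' time code to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range at {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Hypothetical clip annotation with mark-in and mark-out positions.
clip = {"mark_in": "01:00:10:00", "mark_out": "01:00:12:12"}

duration = timecode_to_frames(clip["mark_out"]) - timecode_to_frames(clip["mark_in"])
print(duration)  # 62 frames: 2 seconds at 25 fps plus 12 frames
```

Drop-frame time code, as used at 29.97 fps, needs extra handling that this sketch leaves out.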
METADATA DICTIONARY
SMPTE publishes RP210, a metadata
dictionary. This document (www.smptera.org/mdd) defines 10 different classes
of metadata relevant to describing media.
Of the 10, these four are common: descriptive,
structural, spatial-temporal (of the
capture source) and rights.
Ideally, this dictionary can be referenced
by others to build facility metadata
schemas. In reality, RP210 is one of several
dictionaries in use today. Because of legacy
practice, there are literally thousands of in-house
schemas that don’t have a pointer to a formal
standard.
Of the elements, title and length are
fundamental. Other basic elements are
descriptive and structural (or technical).
Descriptive metadata is a form of narrative about
the essence: who, when, where and what, with
frame-accurate annotation in principle.
Structural metadata is more about the media format:
MPEG-4, 5 Mbps, four PCM audio channels
and so on.
Both the EBU and PBS have developed
their own “media core” metadata sets
based on the Dublin Core. The EBU version
is called EBUCore and for PBS, PBCore.
Both of these have been widely accepted.
The Framework for Interoperable
Media Services (FIMS) refers to EBUCore
to describe metadata (wiki.amwa.tv/ebu).
One special kind of metadata is the
object ID. These are short, registered,
searchable codes used to identify media
files. Examples are Ad-ID (spot
ID, tracking), EIDR (movies, TV programs) and
ISAN (general AV). At registration time, essential
metadata is linked to the ID code
in a Web-accessible database.
Another special type is compositional
metadata. It is used to describe the details
of an A/V sequence, including all edit-related
data. The old-fashioned EDL and the newer
AAF and Apple XML formats are typical.
LEVERAGING METADATA
With files ruling the facility, metadata is
paramount for searching and locating content,
implementing workflows, limiting access
rights, facilitating reuse, and describing
the contents of files.
Without metadata, content loses much
of its value, if it can be located at all in a
sea of files. In what movie (file) did Clint
Eastwood say “Go ahead, make my day”?
Metadata is essential to answer this question,
or you could ask Ken Jennings of “Jeopardy”
fame.
Fig. 1: Metadata integration operations
In Fig. 1, the lower half of the diagram
shows the flow of media and metadata in
a typical production workflow.
One of the most common production operations
is a keyword search followed by
playback of the AV file. The functions of a media asset
manager (MAM) are designed for this.
Metadata is a MAM’s heart and soul. MAMs
span a wide range of functionality. A mini-MAM
supports search, browse and the creation of
clip lists. A maxi-MAM covers many
aspects of the lifecycle of a piece of media.
Metadata is linked to its media in three
basic ways:
1. Media and metadata are embedded in
the same file wrapper. For example, the
file movie123.mxf contains both media
and select metadata.
2. The media file and the metadata file are located
in the same directory or linked by
identical or similar file names. For example,
movie123.mpg and movie123.xml
share the same name and so are linked. Some call
this metadata a sidecar since it rides alongside
the media as a separate file.
3. Metadata, residing in a database or other
repository, is linked to its media by reference.
The database can be local or in a
distant cloud. The linking process can rely
on the SMPTE Unique Material Identifier
(SMPTE ST 330) or similar.
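The sidecar approach is the easiest of the three to show in code. The minimal Python sketch below looks for a metadata file that shares the media file's base name in the same directory. The find_sidecar function name and the default .xml extension are assumptions for illustration; a real facility may follow a different naming rule.

```python
from pathlib import Path
from typing import Optional


def find_sidecar(media_path: Path, suffix: str = ".xml") -> Optional[Path]:
    """Return the sidecar metadata file that shares the media file's base name."""
    # movie123.mpg -> movie123.xml, in the same directory as the media file.
    candidate = media_path.with_suffix(suffix)
    return candidate if candidate.is_file() else None
```

Calling find_sidecar(Path("movie123.mpg")) returns the path to movie123.xml when that file exists alongside the media, and None otherwise. Linking by reference would replace the file name match with a lookup on an identifier such as a UMID.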
As of 2012, most media facilities have defined
an in-house schema for their metadata.
When exchanging metadata
with partners, there is often a format
mismatch. This is a pain point. There is
very little metadata interop except within
a facility.
One bright point is the work from the
Advanced Media Workflow Association
(AMWA). It has developed several
MXF-based formats (Application Specifications)
that include defined metadata for file
interchange.
A good example of this is the new AS-12
used for commercial delivery. It contains
a “digital slate” with consistent metadata
fields.
METADATA INTEGRATION USING ETL
With the mining of program-related
Twitter, Facebook and other sources of
Web information, metadata is becoming
richer in form and content. Importing and
exporting rich metadata often requires
specialized tools. True, most MAM products
offer some form of format conversion
and mapping. However, there are times
when the metadata needs to be processed
and managed independently of the MAM
system. This is especially true when a facility
supports multiple schemas.
The problem of manipulating data records
is an old one. The basic need is to
import one or more records, transform them, and export
them. This functionality could be called
ITE, but it’s not. It’s called ETL: extract,
transform and load. ETL operations are
widely used with databases (see the upper
portion of Fig. 1).
Let’s look at a test case. You are given
1,000 media files to import into your program
library. There is a short XML metadata
file associated with each media file. The
metadata format is completely different
from your house standard. The task: use a commercial
ETL product to transform the metadata so that it is
ready to import into your MAM.
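A commercial tool hides the mechanics, but the three ETL steps in this test case are simple enough to sketch by hand. The Python below is a minimal illustration only: the partner element names (ProgramName, RunTime, Synopsis), the house field names and the list standing in for the MAM import are all hypothetical, and a real product would add validation, logging and batch handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a partner's element names to house field names.
FIELD_MAP = {"ProgramName": "title", "RunTime": "length", "Synopsis": "description"}


def extract(xml_text: str) -> dict:
    """Extract: read the child elements of the XML root into a flat dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def transform(source: dict) -> dict:
    """Transform: rename mapped fields and drop anything unmapped."""
    return {FIELD_MAP[key]: value for key, value in source.items() if key in FIELD_MAP}


def load(record: dict, library: list) -> None:
    """Load: a stand-in for the MAM import step; it only appends to a list."""
    library.append(record)


# Hypothetical partner metadata for one media file.
partner_xml = """
<Asset>
  <ProgramName>Example Show</ProgramName>
  <RunTime>00:28:30:00</RunTime>
  <Synopsis>Pilot episode.</Synopsis>
  <InternalCode>X-17</InternalCode>
</Asset>
"""

library = []
load(transform(extract(partner_xml)), library)
print(library[0])
```

For the 1,000-file case, the same three calls would run once per sidecar file. The field map is where the real effort goes, since every partner schema needs its own.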
There is a wide selection of ETL programs
to choose from (see Fig. 1). Most
can leverage the cloud to scale for data-heavy
projects. If you are not familiar with
these, download one for a trial run. Most
have intuitive, graphical UIs with built-in
templates to do just about any data transform
you can imagine. I am impressed with
these tools and see a long life ahead for the
category. In a future column I will discuss
the world of “Big Data in the cloud” and
how this will impact workflows of media
and metadata.
Al Kovalick is the founder of Media
Systems Consulting in Silicon Valley. He
is the author of “Video Systems in an IT
Environment (2nd ed).” He is a frequent
speaker at industry events and a SMPTE
Fellow. For a complete bio and contact
info, visit www.theAVITbook.com.