Metadata

Metadata is a critical component of an effective and efficient broadcast workflow because this core information about your content is what allows you to organize, locate, process and manage that content according to business rules.

Metadata is essentially data about data. In the case of broadcast workflows, it is data about video content.

Types of metadata

Multiple types of metadata come from different sources and can be used in different ways, each contributing to the efficiency of your overall workflow.

Structural metadata refers to the physical properties of the content: format, bit rate, resolution, file size, creation date, etc. This information may be obtained “for free,” as it is contained within the video or wrapper itself without user intervention or input. One analogy is the EXIF metadata in photographs — date taken, picture size, file size, GPS data (if available), camera make and model used — automatically recorded by cameras within images as they are captured. All of these details can be useful in categorizing and organizing recorded media.
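
By way of a sketch, the ffprobe utility from the FFmpeg project can report this structural information as JSON; the Python wrapper and the file name below are purely illustrative:

    import json
    import subprocess

    def read_structural_metadata(path):
        """Harvest structural metadata from a media file via ffprobe."""
        result = subprocess.run(
            ["ffprobe", "-v", "quiet", "-print_format", "json",
             "-show_format", "-show_streams", path],
            capture_output=True, text=True, check=True)
        info = json.loads(result.stdout)
        video = next(s for s in info["streams"] if s["codec_type"] == "video")
        return {
            "format": info["format"]["format_name"],           # wrapper format
            "file_size": int(info["format"]["size"]),          # bytes
            "bit_rate": int(info["format"].get("bit_rate", 0)),
            "resolution": (video["width"], video["height"]),
        }

    # Hypothetical file name, for illustration only:
    # read_structural_metadata("news_package.mxf")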

You can use structural metadata to segment your library of content and organize it into logical groups. You could, for example, filter the view based on format (SD or HD), time since creation or aspect ratio. The ability to filter content and create virtual groups is essential for organizations managing large volumes of content, and it is also valuable in filtering content views for different functional groups so that they see only the content that pertains to their work.
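
As a minimal illustration, such a virtual group can be as simple as a filter over the harvested records (the field names here are assumptions):

    # Hypothetical structural-metadata records harvested at ingest.
    clips = [
        {"id": "A001", "format": "HD", "aspect": "16:9"},
        {"id": "A002", "format": "SD", "aspect": "4:3"},
        {"id": "A003", "format": "HD", "aspect": "16:9"},
    ]

    # A "virtual group": all HD 16:9 content, without moving a single file.
    hd_wide = [c for c in clips if c["format"] == "HD" and c["aspect"] == "16:9"]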

Descriptive metadata is data entered into a system by operators or external applications. Unlike structural metadata, it typically cannot be generated automatically or derived from the properties of the content itself. Descriptive metadata is generated by one or more processes and maintained in some form of enterprise-wide database. It often stores content-specific information such as shot logs, descriptions, actors, locations, directors and scene-specific information. It can also store business-specific information such as content usage rights, availability windows, planning and distribution information, and workflow process stage.

When descriptive metadata exists in multiple systems throughout your enterprise, it becomes difficult to get a federated view of all that data. Automation systems, traffic and planning systems, archive systems, production systems, and media asset management (MAM) systems frequently contain differing subsets of data that are relevant to their particular responsibility for managing and processing that content. By employing a MAM system, you can harvest data from these sources and view it in a consistent, federated model.
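
One way to picture that harvesting step is a per-source field mapping into a single canonical model; the system names and field names below are illustrative assumptions:

    # Each source system names the same concept differently; map each
    # native field onto one canonical (federated) field set.
    FIELD_MAP = {
        "traffic": {"title": "prog_title", "rights_end": "expiry"},
        "archive": {"title": "name", "rights_end": "rights_expiry_date"},
    }

    def federate(source, record):
        """Normalize a source-system record into the canonical model."""
        return {canonical: record.get(native)
                for canonical, native in FIELD_MAP[source].items()}

    federate("archive", {"name": "Evening News",
                         "rights_expiry_date": "2012-01-01"})
    # -> {"title": "Evening News", "rights_end": "2012-01-01"}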

Storing metadata

As descriptive data is largely freeform, creating and storing it “as is” results in large collections of unstructured data that are impossible to use and manage efficiently. To ensure that stored metadata meets the requirements of all dependent users and third-party systems, you can use a data-modeling exercise to define the type of metadata you want to store, the structure of that data and clear taxonomies (language, synonyms) for specific data fields. Specific data fields can apply to entire assets, to clips or subclips (scenes), and even to specific frames of video.
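
For illustration, each entry in such a data model might pair a field with its scope (asset, clip, subclip or frame) and, where appropriate, a controlled vocabulary; all names here are assumed:

    # Field name -> (scope, allowed values; None means free text).
    DATA_MODEL = {
        "genre": ("asset", {"news", "sport", "documentary", "drama"}),
        "location": ("subclip", None),
        "keyframe_note": ("frame", None),
    }

    def validate(field_name, value):
        """Reject values that fall outside a field's controlled vocabulary."""
        scope, vocabulary = DATA_MODEL[field_name]
        if vocabulary is not None and value not in vocabulary:
            raise ValueError(f"'{value}' is not a valid {field_name}")
        return scope, value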

This modeling exercise typically involves surveying all potential users of the system to create a master set of required data fields and valid values. The master set also can include a federated and normalized list of data fields taken from legacy systems.

By standardizing how content is described and how that content data is stored, you can facilitate effective content searches further down the chain and, in turn, access data and make business decisions about content in a consistent and productive manner. Ideally, this process is performed up front, but it's important that the system allow for changes to the data model as organizational processes and requirements change over time.

You can store both structural and descriptive metadata hierarchically in order to best represent the model of the content, as well as the desired use of the metadata by your organization. For example, some metadata may be applicable for the entire clip, and some might only be applicable to a particular scene (subclip). Some metadata might be applicable to a specific point in time, such as a particular video frame. Consequently, your data model should differentiate between metadata about clips, subclips, frames, clip sequences and even collections of clips.
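
A sketch of that hierarchy, with the structure assumed purely for illustration:

    from dataclasses import dataclass, field

    @dataclass
    class FrameMarker:            # metadata pinned to one video frame
        frame: int
        note: str

    @dataclass
    class Subclip:                # a scene within a clip
        in_frame: int
        out_frame: int
        metadata: dict = field(default_factory=dict)
        markers: list = field(default_factory=list)   # FrameMarker items

    @dataclass
    class Clip:                   # whole-clip metadata plus its subclips
        clip_id: str
        metadata: dict = field(default_factory=dict)
        subclips: list = field(default_factory=list)  # Subclip items

    @dataclass
    class Collection:             # a sequence or collection of clips
        name: str
        clips: list = field(default_factory=list)     # Clip items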

You can also employ context-specific metadata structures that are related to other higher-level categorizations. For example, different types of content often are described in different ways in order to maximize the organizational capabilities of the system. Documentaries are described differently from sports or news content. Context-specific metadata allows these different metadata models to exist within the same system.
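
One way to express that, with field names assumed, is a per-genre field set layered on a common core:

    CORE_FIELDS = {"title", "duration", "rights_end"}

    # Extra descriptive fields keyed by the higher-level content type.
    CONTEXT_FIELDS = {
        "documentary": {"narrator", "subject", "era"},
        "sport": {"competition", "teams", "fixture_date"},
        "news": {"story_slug", "reporter", "embargo"},
    }

    def schema_for(content_type):
        return CORE_FIELDS | CONTEXT_FIELDS.get(content_type, set())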

Metadata is valuable not only for organizing content, but also for storing it and for tracking processes throughout the content life cycle. Information about the content life cycle includes the status of the completed processes through which a piece of content has passed. This usage information, arguably a subset of descriptive metadata, is potentially the most powerful tool you have for improving workflow efficiency.

In its entirety, the content life cycle defines the processes that content must pass through for its intended purpose, such as distribution to broadcast, Web and/or mobile. The life cycle may dictate digitization of content, followed by a quality check (QC), light editing for each distribution platform, editorial approval, format conversion and another QC before content is moved to the distribution platform.
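
Expressed as data, such a life cycle is simply an ordered list of processes that the workflow system walks content through; the step names below mirror this example:

    LIFECYCLE = [
        "digitize",
        "qc_initial",
        "edit_for_platform",
        "editorial_approval",
        "format_conversion",
        "qc_final",
        "move_to_distribution",
    ]

    def next_step(completed):
        """Return the next process due, given steps already logged complete."""
        remaining = [s for s in LIFECYCLE if s not in completed]
        return remaining[0] if remaining else None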

Workflow

Figure 1 demonstrates a workflow taking a piece of content from acquisition through to distribution. While some of these steps involve human intervention, metadata informs and guides all of them. Figure 2 identifies the increasing volume and variety of metadata that drives the workflow forward. Each step in this workflow creates additional metadata that, in turn, allows personnel across the organization to see where the content is in its life cycle. Equally important, this metadata enables content to be filtered and further conditionally processed based on the outcome of the steps through which it already has passed.

While some of these processes can be automated and some require manual intervention, the harvesting of metadata about each of them will enable your workflow control system to automatically route content to the next process in the chain, and potentially trigger the next step in the life cycle. When manual intervention is required, metadata can be used to notify the appropriate users that the content is ready for them.

Consider this example. Ingested content is automatically registered and all of its structural metadata harvested. Depending on its intended purpose and corresponding life cycle (or workflow), it can be automatically routed for an automated file-based QC, with the results of that QC passed back to the metadata management system. If the content fails the QC, it can be moved to a quarantine folder while an operator is notified that the content requires attention. If it passes, it can be moved automatically to the production system and the editor notified that the content is available for manual processing.
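
A minimal sketch of that routing branch, in which the mam object and its methods stand in for whatever metadata-management API is actually in use:

    def route_after_qc(clip_id, qc_passed, mam):
        """Route content and notify people from a file-based QC result."""
        # Record the outcome as usage metadata (hypothetical API calls).
        mam.record_event(clip_id, "qc_file_based", passed=qc_passed)
        if qc_passed:
            mam.move(clip_id, "production_storage")
            mam.notify("editor", f"{clip_id} ready for manual processing")
        else:
            mam.move(clip_id, "quarantine")
            mam.notify("qc_operator", f"{clip_id} failed QC; needs attention")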

Throughout this process, the workflow system can intelligently route and notify based on the growing amount of usage metadata it collects. As this metadata profile grows, you can use it to identify and resolve bottlenecks in your workflow. For example, if you have a manual QC step in your defined workflow, you could identify how long content waits to be checked prior to moving to the next step. If it takes too long, your organization can redeploy personnel from newly automated tasks in order to speed that manual process.
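
Measuring that wait is straightforward once every step is timestamped in the usage metadata; the event names here are assumptions:

    from datetime import datetime

    def qc_wait_hours(events):
        """Hours content sat in the manual-QC queue, per its event log."""
        queued = datetime.fromisoformat(events["qc_manual_queued"])
        started = datetime.fromisoformat(events["qc_manual_started"])
        return (started - queued).total_seconds() / 3600

    qc_wait_hours({"qc_manual_queued": "2011-05-02T09:00:00",
                   "qc_manual_started": "2011-05-02T15:30:00"})   # 6.5 hours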

Metadata can be stored in an external database that references the content. It also can be stored within the content itself, usually within the wrapper but sometimes within the essence, as is often the case with timecode.

Modern content wrapper formats such as QuickTime (QT) or MXF reserve space in the clip header to store custom data about the content that is important to your business. Thus, when content is passed to another system, the data is passed along with it. One example would be content and data from a file-based video camera. Points of interest, or even subclips, marked by the operator during shooting are included when content is transferred from the camera. Including this logging information with the content eliminates the need for it to be shot-listed again.

Metadata embedded within the clip has the advantage of traveling with the content wherever it goes. As a result, this information is available to other systems throughout the workflow. It also is readily available with content shared with or sold to external users or companies. Unfortunately, reliance on embedded metadata also means that any metadata management system must be able to parse the clip in order to access the metadata, and this approach has inherent performance and scalability limitations.

Alternatively, a media management application with a database can be used to define and store metadata about the content. The external database maintains a reference, often a file path or unique identifier, to the file as it is stored on disk or data tape. The advantage of this approach is the ability to search quickly through metadata about the content.
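
A toy version of such a reference database, built on SQLite with an assumed schema:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE assets (
        asset_id TEXT PRIMARY KEY,   -- unique identifier
        location TEXT,               -- file path on disk or data tape
        title TEXT,
        genre TEXT)""")
    db.execute("INSERT INTO assets VALUES "
               "('A001', '/media/a001.mxf', 'Harbor Story', 'documentary')")

    # Fast metadata search without touching the media files themselves.
    rows = db.execute("SELECT asset_id, location FROM assets "
                      "WHERE genre = ?", ("documentary",)).fetchall()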

Storing metadata in an external database enables scalability, but because metadata doesn't explicitly travel with the content to external systems, cross-system integration can become more complex. Keep these factors in mind as you design your metadata model.

Your workflow can be enhanced enormously through the effective use of metadata, which increases the visibility of the content within your system, provides life cycle status information, and enables automatic routing and content processing based on business rules. That said, your workflow efficiency can only be as good as the metadata you collect and your diligence in ensuring consistency as your system grows.

Simon Eldridge is senior product manager for Omneon, now part of Harmonic.