Database design for broadcasters

In the world of nonlinear editing, PC-based DVEs and CGs, and server-based play-to-air, the VPs of engineering at broadcast stations are now responsible for managing data and metadata.

Databases should be backed up daily onto a tape-based digital storage archive during periods of non-peak use. The archive may also be used for any other digital content storage. Photo courtesy of Telemundo.

All engineers should recognize the terms “essence,” “metadata” and “package” and adopt them as part of their vernacular. Most engineers are already familiar with the terms “essence” (a video file, information on a videotape or a graphics image) and “metadata” (which refers to a wealth of information associated with the essence). For instance, metadata can include information about closed captioning, shot markers, rights management, talent and production staffs, and details about associated audio tracks.

The newest term, “package,” is the sum of the essence file and the associated metadata. It refers not only to the content but to the detailed information about the content. Perhaps most importantly, in a world in which audio-visual and information-technology networks increasingly share the same infrastructure, the “package” concept ensures that metadata is seamlessly integrated into structured databases. This greatly expands the ability of video professionals to find, locate and manage content throughout their organizations.

Every step of the production process — acquisition, production, distribution and transmission — depends on the accessibility of metadata to easily and accurately track essence.

The current library management system employed at most stations around the country can serve as the basis for a new type of asset management structure — one that expands beyond videotapes with bar codes to encompass entire packages, including essence and metadata. This new, sophisticated architecture, which VPs of engineering should now recognize and acknowledge, will be built around a relational database.

Figure 1. The Microsoft SQL Server Enterprise Manager is used to manipulate an SQL database. The interface shows graphical representations of tables, stored procedures and specific data types.

There are many types of databases, but only two are widely used in broadcast applications. The less common of the two is called a flat-file database, and is mainly intended for small quantities of information. This model is based on using one or several files, usually with comma- or tab-delimited entries. Generally, the flat-file design is not scalable, nor is it extensible enough for complex data types.

The relational database

The second and more common type of database found in broadcast environments — and the focus of this article — is the relational database. Most major databases used for tape library management software and asset management systems are relational. Several companies, such as IBM, Microsoft and Oracle, produce relational databases. In spite of this variety, most of these databases have the same basic functionality; they differ only in how they process entries and searches. The high-level term that describes the overall operating environment is “relational-database management system” or RDBMS. An example of such a system, the Microsoft SQL Database Enterprise Manager, is shown in Figure 1.

During the early '80s, big, proprietary databases on mainframe computers were the accepted norm. At that time, the IT user community was highly skeptical that the relational database would ever amount to anything. But the relational database has become the world standard, for several reasons. By design, it has a fast and efficient architecture that allows for specific searches on targeted groups of data. It has a set of common cross-vendor rules that affords developers a very quick learning curve. The structured-query language, or SQL, is a simple, natural-language, data-manipulation tool that all major relational databases support. Lastly, and perhaps most importantly, early efforts by a group of nascent software vendors to develop open and cheap (or free) database software instead of expensive proprietary systems spawned widespread development support for the relational model. In fact, for years, Oracle has allowed free downloads of some of its relational-database products — a tradition that continues to this day.

Modern broadcast facilities incorporate databases for a variety of applications. For example, any modern automation system relies on a database. Databases can also be found in the tape library management software, the traffic system, the graphics library tracking software, the news production system, the rights management engine, and perhaps a facility scheduling application. Now add metadata stored in the networked nonlinear editing system, or within graphics files. The combination of these databases supports a wide variety of programming types such as sports, news, weather, graphics, audio, commercials, promos and all long-form content.

Content management

A big challenge in modern American broadcasting is content management. The industry is striving to curtail its dependence on the physical movement of tape by introducing file-based storage with advanced searching and possibly desktop browsing. So far, between the tape library management system and automation, about 60 percent of the solution is in place. Some type of digital asset management system will inevitably become a part of the infrastructure. The architecture will center on a hybrid management of physical (shelf-based) and logical (file-based) assets. Initially, this system will rely on existing data from other databases that currently perform only specialized tasks.

To institute any type of digital asset management (DAM) architecture, you must create a “data model.” Once a data model has been put into production, it's commonly called a database “schema.” If a facility has an existing tape library management system, this process of creating a schema has been performed to a limited degree already, probably by the software vendor. But you must consider a wider variety of assets (including tape). To deal with these new content types, you need to provide a database that understands the necessary defining attributes. You also need to institute a framework based on the existence of two top-level asset types. “Content assets” are those with an essence file, such as clips on a video server, and “metadata-only assets,” such as a performance, with pointers to the related video, audio, graphics and contracts. The data model must reflect the relationship between the content asset and the metadata-only asset. A database schema will define the relationships between all the assets. For example, a search on President Bush (metadata-only asset) may include pointers to video clips (content assets) of Air Force One.

There are two main approaches to defining a data model for an environment. The first is to find an existing data model provided by a standards body or DAM vendor and modify it to fit your individual needs. The second is to create a data model from scratch and force a software vendor to support it. Most vendors obviously prefer the former. However, for the purpose of understanding this process, let's explore the latter method.

Creating a data model

The first step is to consider what content you must store, and gather a complete list of the attributes. A series of discussions with department heads may provide the best insight into the specific information that is needed along the production chain. The necessary fields will come from existing databases, spreadsheets, a whiteboard or perhaps papers in a book sitting on someone's desk. The key is to gather all relevant data sources to determine what information is valuable, which of it is repetitive and how much of it is consistent between departments. Chances are that users have a very good idea of what metadata they need. If there are existing databases, you should obtain a copy of the schema. Fortunately, fields that you might miss initially are very simple to add after the fact in a relational database model.

The process of designing a database is commonly referred to as “entity-relationship modeling.” It identifies information and describes its layout in the database. A relational database defines entities by using a group of tables, each of which contains a number of fields (columns). Within the table, some of the fields are called keys. A table may have one or several keys. The key(s) that uniquely define(s) an entry, or row, in a table is (are) called the primary key(s).

Figure 2. The SMPTE 330M UMID consists of an ordered set of components, each providing a key aspect to the identification of material, be it picture, audio or data.

The primary key of a central defining table usually contains a unique material identifier (UMID), sometimes referred to as a global unique identifier (GUID). Most systems assign psuedo-random UMIDs using computer-based clock random-number generators. SMPTE 330M defines a UMID standard, with support for an extended structure that may be used to ensure a unique identifier when moving material between facilities or companies. Figure 2 shows the SMPTE 330M UMID.

Each field in a table is described according to the data type (such as a character, integer or long integer) that dimensions the data storage. During table creation, each field must reserve a place for null (empty) entries as well as entries that require a value. This helps organize and sort the database. Tables relate to each other through a series of relationships, which may be mandatory or optional. Frequently, a data-modeling tool will be used to check the integrity of relationships between tables for correctness during the process.

The simplest type of a database model would have one table with all possible descriptive attributes of any type of asset. But this would not be efficient for broadcasters because the fields necessary for audio, video and graphics may be remarkably different. In this scenario, for each audio asset, none of the graphics attributes would be populated, which would result in many empty fields in the database. Each of these vacant entries still reserves space, and is included during searches, which together lower the overall performance. In designing a data model, you can use a series of data “normalization” rules to remove redundancy and inconsistent dependency of entities, and create efficient relationships. This reduces disk storage and simplifies data management.

The accepted standard rules of normalization are called “normal forms.” A series of normal forms has been defined to create a proper data model for a relational database. These rules instruct the user to perform operations on sets of data to improve the structure. In 1972, Dr. E.F. Codd, author of “Further Normalization of the Data Base Relational Model,” defined the first three normal forms, referred to as 1NF, 2NF and 3NF. Dr. Boyce later proposed an additional normal form, commonly referred to as the Boyce-Codd normal form (BCNF), for further data normalization. A data model in BCNF is generally considered sufficiently normalized for any environment.

Need help?

You don't have to go back to school for a Masters in Library Sciences to understand all of this. A knowledgeable architecture-design consultant can greatly enhance and expedite the process for broadcasters.

There are also efforts underway by industry groups and manufacturers to assist in defining data models for the broadcast industry. The BBC (SMEF data model), the Dublin Core Metadata Initiative, the Motion Pictures Experts Group (MPEG-7), and SMPTE (335M metadata structure/RP210) are all providing structures upon which to build databases in an effort to foster industrywide uniformity.

Figure 3. The metadata dictionary structure defined in the SMPTE 335M standard covers the use of metadata for all types of essence (video, audio and data in their various forms). The structure provides flexibility in capturing metadata and exchanging it among applications through a standardized hierarchy.

Since databases are always unique, these working groups have correctly foreseen the difficulties that media aggregators will face in defining their content libraries. The use of a standards-based data model will be of great help when moving content through the many companies involved in production, post and replication/duplication. Figure 3 shows an overview of the SMPTE 335M metadata structure.

Proper database design will be an important task in the evolution of the broadcast business. To accommodate the multitude of media formats, a good data model is essential. Furthermore, with the addition of a digital storage archive and asset management system to complement the existing tape library, the broadcast industry will gain unprecedented control over content management. This model is fundamentally critical to the future of their business.

Patrick Turner is a systems architect for Concadia Solutions LLC.

Home |Back to the top|Write us