
As TV broadcasters migrate to file-based workflows, selecting the right file format is becoming increasingly important. A file format should support all core processes in television without time-consuming transformations, require minimal transcoding outside those core processes and have a credible future.

Core processes include file import and signal ingest, post production, playout and delivery, and archiving. (See Figure 1.) File import typically requires transcoding to the house format. File-based delivery, such as file exchange or distribution via Web, IPTV, VOD or mobile, usually involves at least a transcoding step for delivery format creation.

A standard archive format optimizes those core processes. Selecting such a format requires a holistic approach; the decision should not be dominated by one department or group but treated as a cross-enterprise business decision.


File formats have two facets — the encoding format and the wrapper format. For use in archiving, an encoding format must:

  • have a standard encoding scheme;
  • be widely supported in the industry;
  • be compatible with the products used for in-house core processes; and
  • be supported by transcoders.

For SD, popular encoding formats are D10 and DV-DIF. For HD, no common denominator has yet emerged. Organizations may have to use different encoding formats for different business processes, but should strive to avoid transcoding wherever possible in order to avoid generation losses and latencies.

The wrapper format must:

  • be open and well-documented;
  • be widely supported in the industry;
  • support partial restore;
  • support play while record;
  • have little overhead compared with the payload;
  • include well-documented mappings for the selected encoding formats; and
  • support embedded technical metadata.

Suitable wrapper formats for TV archiving are MXF OP1a and QuickTime.

File exchange

For file exchange with external partners, the file format must meet the specifications agreed upon in the respective service contract. The wrapper format should also allow embedding descriptive metadata, as you may want to embed subsets of the available metadata as contractually required. Hence, external file delivery typically includes transcoding to the required file format and embedding of metadata.

For file exchanges within your organization, try to avoid any encoding format changes, as they are time-consuming and introduce generation losses. Wrapper changes are less critical.

Selecting the right wrapper format

Until recently, the obvious choice for the wrapper format was MXF OP1a. There are MXF-enabled products available to support all core processes, and sufficient interoperability between those products has been achieved. Some products use MXF OP Atom, but the rewrap can be performed easily during file transfer.

However, one product has successfully entered the market that changes this picture: Apple Final Cut Pro (FCP). FCP does not natively support MXF. Instead, it uses QuickTime, a wrapper format developed by Apple and widely accepted in the IT industry. That means that MXF-wrapped material either has to be rewrapped before being delivered to FCP, or a separate QuickTime reference file has to be created. Content created on FCP requires a rewrap from QuickTime to MXF before it can be used in an MXF environment.
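A rewrap copies the audio and video essence into a new container without re-encoding, so it introduces no generation loss. As a sketch, with a general-purpose tool such as ffmpeg this is a stream copy; the file names below are placeholders, and real facilities would typically use dedicated rewrap tools or the MAM's transfer service:

```python
import subprocess

def rewrap_mxf_to_quicktime(src, dst):
    """Build an ffmpeg stream-copy command that changes only the wrapper.

    '-c copy' copies the essence streams untouched, so there is no
    re-encoding and no generation loss; only the container changes
    from MXF to QuickTime (.mov). File names are placeholders.
    """
    cmd = ["ffmpeg", "-i", src, "-c", "copy", dst]
    # subprocess.run(cmd, check=True)  # uncomment where ffmpeg is installed
    return cmd

print(rewrap_mxf_to_quicktime("master.mxf", "master.mov"))
```

Note that a stream copy only succeeds when the target wrapper has a documented mapping for the contained essence, which is one reason the wrapper criteria above call for well-documented encoding-format mappings.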

Hence, facilities using FCP as the predominant editing platform may want to consider QuickTime as the house wrapper and archive format. A prerequisite is that QuickTime can be used across ingest, production, playout and archiving, thus avoiding rewrapping entirely. If this is not possible, MXF remains the best choice.

Embedded metadata

The fact that wrapper formats allow embedding of metadata can be an enabler for certain file-based workflows. However, there are caveats with descriptive metadata. Specifically, there is a lack of recommendations that tie down the full metadata semantics. SMPTE's metadata dictionary helps to define the semantics of individual attributes. However, tying those attributes to specific entities requires a fully specified standard reference data model and a specification of how to map that model to the file-embedded metadata.

If such a data model were available, an organization could map its own data model to this reference model, ensuring that the semantics of the embedded metadata are clearly articulated. If an organization instead maps its proprietary data model directly to the embedded metadata, the result is just as proprietary: no other organization knows the semantics and hence cannot reliably interpret the data when receiving the file. Embedded descriptive metadata is only useful for file exchange between two systems or organizations if the semantics are unambiguously agreed upon.
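The mapping idea can be sketched as follows. All field and attribute names here are invented for illustration; as noted above, no such standard reference model exists yet:

```python
# Hypothetical mapping from an organization's proprietary metadata
# fields to attributes of an assumed shared reference data model.
# All names below are invented for illustration.
REFERENCE_MAPPING = {
    "prog_titel":  "Title",
    "erstsendung": "FirstBroadcastDate",
    "dauer_sek":   "DurationSeconds",
}

def to_reference_model(proprietary_record):
    """Translate a proprietary record into reference-model attributes.

    Fields without an agreed mapping are dropped, because a receiver
    without knowledge of the proprietary schema cannot interpret them.
    """
    return {
        REFERENCE_MAPPING[field]: value
        for field, value in proprietary_record.items()
        if field in REFERENCE_MAPPING
    }

record = {"prog_titel": "Evening News", "dauer_sek": 1800, "interne_notiz": "draft"}
print(to_reference_model(record))
# {'Title': 'Evening News', 'DurationSeconds': 1800}
```

Without the agreed reference model on the left-hand side, the receiver would see only the proprietary names and could not perform this translation.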

Embedded metadata in archives

In general, the usefulness of embedded descriptive metadata in archived files is questionable. At first glance, embedding descriptive metadata when archiving seems advantageous because:

  • The file can be identified even without a database referencing it;
  • In case the database is lost, basic information can be restored from the file.

To qualify the first point, it's important to understand that, in a digital archive, hundreds of thousands of files reside on IT storage systems, primarily data tape vaults. A user has no way to find a file by exploring file-level metadata, because that metadata is not searchable. Only when metadata is maintained in a database, or as an index in a search engine, can users search for and find content. As the full metadata is available in the MAM database, using the MAM search functions is the only sensible way to search for a file.

For the second point, the IT industry agrees that the right way to protect a database is standard IT database backup. Restoring a failed database typically takes a matter of hours. By contrast, for an archive of 100,000 hours of content in DV50 with eight audio tracks, restoring and analyzing the files via a single LTO-4 drive would take close to 260 days. Even with 10 drives in parallel, it would still take almost four weeks. It is more sensible to invest in standard IT protection mechanisms and apply the related best practices.
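The restore-time figure above can be checked with a back-of-the-envelope calculation, assuming roughly 60 Mbit/s total stream rate (50 Mbit/s DV50 video plus eight audio tracks and wrapper overhead) and the nominal 120 MB/s native transfer rate of an LTO-4 drive:

```python
# Rough check of the restore-time estimate above.
# Assumptions: ~60 Mbit/s total stream rate (50 Mbit/s DV50 video
# plus eight audio tracks and wrapper overhead), and LTO-4's nominal
# native transfer rate of 120 MB/s.
hours_of_content = 100_000
stream_rate_bits = 60e6                     # bits per second
total_bytes = hours_of_content * 3600 * stream_rate_bits / 8

lto4_rate = 120e6                           # bytes per second, one drive
single_drive_days = total_bytes / lto4_rate / 86_400

print(f"single drive: {single_drive_days:.0f} days")       # ~260 days
print(f"ten drives:   {single_drive_days / 10:.1f} days")  # ~26 days
```

Ten parallel drives bring the figure down to about 26 days, matching the "almost four weeks" in the text, and that still ignores tape mount, seek and analysis time.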

There is also the issue of updating embedded metadata. Unfortunately, metadata changes are quite common; even basic metadata, such as titles, may change throughout the content lifecycle. Hence, embedded metadata that is not updated rapidly becomes outdated.

However, the primary long-term storage technology in TV archives today is digital data tape, and it is difficult to apply changes to files hosted on tape. You have to restore the file from tape to disk, apply the change, write the updated file back to tape, mark the former version as invalid, and remove it via defragmentation, which means copying all valid files to a new tape and releasing the old tape for reuse. In real-world archives, this is not feasible.

A file may even reside in multiple, potentially remote, locations. Metadata updates would then require distributed transactions across all copies; otherwise the database and the primary archived file go out of sync, and the files in the various locations carry different metadata.

Metadata in file exchange

Being able to embed metadata in files that will be exchanged with external business partners is useful, as it allows you to couple metadata and essence tightly. Within a business, metadata exchange can also be accomplished in other ways, such as partial database synchronization or exchange of metadata via API or XML files.
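As a sketch, exchanging an agreed metadata subset as an XML sidecar file could look like the following. The element names form a made-up schema assumed to be agreed bilaterally between two partners, not any standard:

```python
import xml.etree.ElementTree as ET

def export_metadata_xml(record):
    """Serialize an agreed subset of MAM metadata as an XML sidecar.

    The element names below are a made-up, bilaterally agreed schema,
    not a standard; real exchanges must pin down the extent and the
    semantics of the metadata in the service contract.
    """
    root = ET.Element("MaterialMetadata")
    for field in ("Title", "Duration", "MaterialId"):  # agreed subset only
        if field in record:
            ET.SubElement(root, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")

xml_doc = export_metadata_xml(
    {"Title": "Evening News", "Duration": "00:30:00",
     "MaterialId": "ABC-123", "InternalNote": "not for exchange"}
)
print(xml_doc)
```

Note that internal-only fields are filtered out, reflecting the point above that the exchanged extent may differ per transaction and per partner relationship.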

Due to the lack of recommendations and standards, partners that want to exchange content with embedded metadata have to agree upon the extent and the semantics of the metadata. The extent may differ depending on the type of transaction and the partner relationship.

Whether the embedded metadata you receive remains in the file after import into the MAM database, or is deleted, matters little, though retaining it is a good idea. Remember, however, that it will have to be updated when the file is retrieved from the archive and delivered to another partner, as the metadata may have been modified in the database, or the exchange may require a different set of metadata and semantics.

Peter Thomas is CTO for Blue Order Solutions.