The new world of codecs and formats

It's widely understood that file-based operations are inherently more efficient than their real-time, baseband predecessors. A closer look at the typical workflow, however, reveals that while file-based processing, storage and movement of media offer opportunities for streamlining operations, they take back some of that efficiency by forcing system designers and manufacturers to deal with the huge and growing range of individual codecs used by the different systems and areas of production, preparation and distribution that make up any broadcast workflow.

For decades, baseband analog video and audio were the well-specified, documented and accepted standards among broadcasters for material exchange. As a result, getting media from point A to point B was a straightforward and understood proposition. Little, if any, thought was given to what happened to video within a processing box; it was completely isolated and its internal format irrelevant. As long as a system could accept and output video and audio in the accepted format, it could serve as an island of processing within the larger workflow. The initial move to digital video and audio did nothing to change this. As long as the processor provided adequate picture quality and the required input and output formats, the internal codec was of little interest to the system designer. The transition to file-based workflows is changing all of that.

File-based workflows: New considerations

Notable efficiencies can be gained by passing material from one processor to another in compressed form. System latency can be significantly reduced because, for a given transfer bandwidth, compressed media can move many times faster than real time. End-to-end picture quality can also be improved by avoiding multiple decompress/transfer/recompress stages as media passes through the workflow — at least in theory!
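To put rough numbers on that claim, a back-of-the-envelope calculation shows how much faster than real time a compressed clip can move over a given link. The bit rates and usable network throughput below are illustrative assumptions, not measurements of any particular product.

```python
# Back-of-the-envelope: how much faster than real time can a clip transfer?
# All figures below are illustrative assumptions, not vendor specifications.

GIGABIT_LINK_MBPS = 1000 * 0.8  # ~1 Gb/s Ethernet, assuming ~80% usable throughput

def transfer_speed_vs_realtime(video_bitrate_mbps, link_mbps=GIGABIT_LINK_MBPS):
    """Return how many times faster than real time a stream of the given
    bit rate can be moved over the given link."""
    return link_mbps / video_bitrate_mbps

for label, bitrate in [("Uncompressed HD-SDI (~1.5 Gb/s)", 1485),
                       ("I-frame HD at 100 Mb/s", 100),
                       ("Long-GOP HD at 25 Mb/s", 25)]:
    print(f"{label}: {transfer_speed_vs_realtime(bitrate):.1f}x real time")
```

Under these assumptions, uncompressed HD cannot even keep up with real time over the link, while the long-GOP file moves roughly 32 times faster than real time.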

Unfortunately, this last point creates new concerns for the system designer and equipment manufacturer. Because no single codec is optimal for every process in the system, the designer must now understand the pros and cons of each codec used by the various processors that make up the signal path. That understanding is needed not only to decide intelligently which codec should be used at each step, but also to anticipate how many transcodes/rewraps will be necessary as files move through the workflow — even when those transcodes/rewraps occur inside an individual manufacturer's equipment.

Compression: Not “one size fits all”

No single codec is perfectly suited to all workflow tasks. Consider, for example, two basic stages in a broadcast workflow: editing and playout.

Editing is a creative process in which individual elements are stitched together to tell or modify a story. The ability to cut between takes on any frame boundary is a prerequisite of this stage, and because intraframe (I-frame) compression encodes every frame independently, it is the method of choice, regardless of whether the compression engine is DV, DVCPRO HD or I-frame MPEG-based.

Playout is a much more mechanical exercise. Finished material is transferred to the playout server and played out at the right time under automation control. Playout servers are generally multichannel devices that use high-performance disk drive systems. Because that storage is expensive, it is desirable to pack as much material as possible onto those drives while still giving multiple channels simultaneous access to it. Interframe (long-GOP) compression delivers comparable picture quality at a much lower bit rate, so it is the method of choice here, again regardless of the compression engine.
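The storage argument is easy to quantify. The sketch below compares how many hours of material fit on the same playout-server storage under I-frame and long-GOP compression; the bit rates and usable array capacity are assumptions chosen purely for illustration.

```python
# Rough storage comparison for a playout server.
# Bit rates and array size are illustrative assumptions, not product figures.

def hours_of_storage(usable_terabytes, video_bitrate_mbps):
    """Hours of material that fit in the given usable capacity at the given bit rate."""
    total_megabits = usable_terabytes * 8 * 1e6   # 1 TB = 8 million megabits
    seconds = total_megabits / video_bitrate_mbps
    return seconds / 3600

USABLE_TB = 10  # assumed usable capacity of the playout array

print(f"I-frame at 100 Mb/s : {hours_of_storage(USABLE_TB, 100):.0f} hours")
print(f"Long-GOP at 25 Mb/s : {hours_of_storage(USABLE_TB, 25):.0f} hours")
```

With these figures, the same array holds roughly four times as many hours of long-GOP material as I-frame material.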

There is an obvious conflict. Two different (though related) compression techniques are being used, and each is optimal for its workflow stage, but not for the other. So, the designer has two choices: decide on a single format and accept suboptimal performance in one operation, or use both codecs with a transcode stage sitting between them.

The first approach usually resolves to the use of I-frame compression on the playout server. Most manufacturers support this model, but the penalty is slower transfer between stages (as the files are physically larger) and a reduction in storage efficiency on the playout server. The second approach allows each stage to be maximally efficient and, depending on the location of the transcode (pre- or post-transfer), can improve transfer efficiency. This model, however, also adds a latency penalty to the overall workflow based on the time taken to transcode any particular clip.
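The trade-off between the two approaches can be estimated with a simple model: the single-codec design pays in transfer time and storage, while the transcode design pays in conversion latency. The network throughput, bit rates and transcode speed below are all assumed values used only to illustrate the shape of the comparison.

```python
# Simple latency model for a 30-minute clip moving from edit storage to playout.
# Bandwidth, bit rates and transcode speed are illustrative assumptions.

CLIP_MINUTES   = 30
LINK_MBPS      = 800     # assumed usable network throughput
IFRAME_MBPS    = 100     # assumed I-frame edit bit rate
LONGGOP_MBPS   = 25      # assumed long-GOP playout bit rate
TRANSCODE_X_RT = 2.0     # assumed transcode speed: twice real time

clip_seconds = CLIP_MINUTES * 60

def transfer_seconds(bitrate_mbps):
    return clip_seconds * bitrate_mbps / LINK_MBPS

# Option 1: keep I-frame end to end (bigger file, no transcode)
option1 = transfer_seconds(IFRAME_MBPS)

# Option 2: transcode to long-GOP before transfer (smaller file, plus transcode time)
option2 = clip_seconds / TRANSCODE_X_RT + transfer_seconds(LONGGOP_MBPS)

print(f"Option 1 (I-frame everywhere) : {option1 / 60:.1f} min")
print(f"Option 2 (transcode + transfer): {option2 / 60:.1f} min")
```

Under these assumptions the transcode itself dominates the end-to-end delay, which is exactly the latency penalty described above; with faster transcoders or longer transfers the balance shifts.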

Clearly the broadcaster, the system designer and the equipment manufacturer all need to be aware of the pros and cons of each codec and transcoder in order to specify and deliver the most efficient design. Broadcasters need to understand where any compromises may lie, system designers need a comprehensive understanding of codecs and latencies, and equipment manufacturers need to design their equipment to give broadcasters and designers the greatest range of choices when building a system. This is a never-ending task, as new codecs are announced every year.

Wrapper and file structure

Like the choice of compression, the wrapper and basic file structure have significant implications for efficient media transfer, and both must be carefully considered as part of the system design and file-based workflow. The first question is whether to use self-contained files or reference files — both supported in QuickTime and MXF — at any particular stage.

A self-contained file includes both the essence and the metadata, so there is a single entity to be transferred from one location to another. (See Figure 1.) Self-contained files are perfect for content distribution and remote playout scenarios because of the simple transfer mechanism required. However, because they need to be disassembled prior to manipulation, they are not so appropriate if any kind of media processing is required at either end of the transfer. Reference files are more appropriate for that part of the workflow.

A reference clip is made up of the media (essence) files themselves plus a wrapper file that contains metadata and pointers to that media. When an application plays the clip, it opens the reference file to discover which essence files to play as part of that clip. (See Figure 2.) This is the file type of choice for editing, language addition, closed-captioning addition and other applications that modify or add tracks to a clip, because no disassembly or reassembly of the file is required. FTP transfers are more complex, though: the transferring application must understand exactly which files the reference file points to in order to ensure that all media necessary for clip playout is transferred as part of the FTP session.
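The extra complexity of transferring a reference clip mostly comes down to bookkeeping: the sending application must walk the wrapper and collect every essence file it points to. A minimal sketch of that bookkeeping follows; the data structure and file names are hypothetical and stand in for a real QuickTime or MXF reference parser.

```python
# Minimal sketch of a reference clip: a wrapper with metadata plus pointers
# to separate essence files. Names and structures are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ReferenceClip:
    name: str
    metadata: dict
    essence_files: list = field(default_factory=list)  # paths to video/audio/caption essence

    def files_to_transfer(self):
        """Everything an FTP session must move so the clip plays at the far end:
        the wrapper itself plus every essence file it points to."""
        return [f"{self.name}.ref"] + list(self.essence_files)

clip = ReferenceClip(
    name="news_pkg_0412",
    metadata={"duration_frames": 45000, "frame_rate": "29.97"},
    essence_files=["news_pkg_0412_v1.mxf",   # video essence
                   "news_pkg_0412_a1.mxf",   # original audio
                   "news_pkg_0412_a2.mxf",   # added language track
                   "news_pkg_0412_cc.scc"],  # closed captions
)
print(clip.files_to_transfer())
```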

It is, of course, possible to rewrap files, converting them from self-contained to reference, from QuickTime to MXF, and vice versa. This is a far less complicated activity than transcoding the essence and can therefore happen much faster, but it still has implications for the overall latency of the system.
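The reason a rewrap is so much cheaper than a transcode is that the essence itself is never touched; only the container around it changes. The conceptual sketch below makes the distinction; the functions are hypothetical placeholders, not calls into any real media library.

```python
# Conceptual contrast between rewrap and transcode.
# The functions are hypothetical placeholders, not a real media library API.

def rewrap(essence_bytes, source_wrapper, target_wrapper):
    """Rewrap: copy the compressed essence untouched into a new container.
    The cost is roughly a file copy plus writing new metadata."""
    new_metadata = {"wrapper": target_wrapper, "rewrapped_from": source_wrapper}
    return new_metadata, essence_bytes        # essence is byte-for-byte identical

def transcode(essence_bytes, decode, encode):
    """Transcode: fully decode and re-encode the essence.
    The cost scales with clip duration and both codecs' complexity."""
    uncompressed = decode(essence_bytes)
    return encode(uncompressed)

# A rewrap from QuickTime to MXF leaves the essence alone:
metadata, same_essence = rewrap(b"<compressed frames>", "QuickTime", "MXF")
print(metadata, same_essence == b"<compressed frames>")
```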

Latency, codecs and file formats

Some broadcast workflows are relatively insensitive to system latency, as they allow plenty of time to QC, prep and regionalize the material, and to transcode/rewrap content, before playout. Other workflows, such as news and sports highlights, demand extremely short end-to-end (shoot-to-show) latencies. Paradoxically, while these are the workflows most likely to benefit from the efficiencies of file-based operation, they are the least tolerant of the file conversions that, by their very nature, slow down the flow of media.

The goal, then, is to minimize the transcode/rewrap stages in these workflows and to make those transcodes as efficient as possible. One simple way of achieving this is to choose the edit codec so that field material can be shot using that codec and then be edited natively on the NLE of choice. Because many playout servers can natively accommodate multiple codecs, a transcode stage can be avoided by careful equipment and codec choice. The downside is that this may limit the total number of channels supported by the playout server — something that should be recognized when the system is designed.

The issue of file structure still remains, however. Low-latency workflows often require the ability to operate on a clip during transfer. Moving a clip from ingest server to editor while it's still being ingested (and making all of the currently ingested material available to the editor on an ongoing basis) is one common example, often referred to as transfer while record. Another example is the need to play out a clip while it is still being transferred to the playout server, or play while transfer. File structure plays an extremely important enabling role in these scenarios.

Many on-disk file formats are written in a non-left-to-right manner in which the header is written to disk and, as material is added to the file, updated constantly to reflect the current state of the file. Such a piece of metadata might be the clip duration. Initially set as part of the first write of the file, the duration is continually updated as material is added. If the clip is considered as a timeline, it is clear that the writing of the file is not left-to-right, as the system is continuously updating the left end of the file. (See Figure 3.) This can be problematic for FTP file transfer. The receiving device will only know the duration of the clip, for example, as identified at the start of the transfer. To resolve this problem, special techniques can be invoked whereby the transmitting system knows that it must keep transmitting new versions of the header part of the file at some regular interval. The receiving system must then integrate that new header information into its own copy of the clip.
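To make the problem concrete, the small sketch below shows why a receiver that reads the header only once ends up with a stale duration while the recorder keeps rewriting it. It is purely illustrative; no real container format is modeled.

```python
# Illustration of the non-left-to-right problem: the recorder keeps rewriting
# the header (here, just the duration) while the body grows. A receiver that
# captures the header only at the start of the transfer holds a stale value.

class GrowingClip:
    def __init__(self):
        self.header = {"duration_frames": 0}   # rewritten in place as material is added
        self.frames = []

    def record_frame(self, frame):
        self.frames.append(frame)
        self.header["duration_frames"] = len(self.frames)  # header updated yet again

clip = GrowingClip()
clip.record_frame("frame 0")

# The receiver starts its FTP transfer now and captures the header as it stands.
receiver_header = dict(clip.header)

# Recording continues after the transfer has begun...
for i in range(1, 300):
    clip.record_frame(f"frame {i}")

print("Receiver believes duration:", receiver_header["duration_frames"])  # 1
print("Actual duration on disk:   ", clip.header["duration_frames"])      # 300
```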

The solution to this dilemma is to use files that are written strictly left to right: once a piece of data is written, it is never rewritten. As much header data as is known to be valid and unchanging at the time recording starts is stored in the header, and a flag is set to indicate that the header's metadata is incomplete and that the full, valid metadata will be written at the end of the clip. (See Figure 4.) In this way, the clip can be transmitted as it is being written without any concern that a piece of metadata may be out of date. The receiving system has sufficient information (clip name, etc.) at the start of the clip to initialize the clip in its own database, and material continues to stream in via the transfer mechanism. The final metadata is provided at the end of the clip when recording has completed.
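A strictly left-to-right layout can be sketched in the same illustrative style: write only what is known and final, flag the header as incomplete, and append the authoritative metadata once recording ends. Again, this is a toy model, not a real wrapper format.

```python
# Sketch of a strictly left-to-right clip file: nothing already written is ever
# rewritten. The header is flagged as incomplete, and the authoritative metadata
# is appended only when recording finishes. Illustrative only.

class StreamableClip:
    def __init__(self, name, frame_rate):
        # Only metadata that is known and will never change goes in the header.
        self.header = {"name": name, "frame_rate": frame_rate,
                       "metadata_complete": False}
        self.frames = []
        self.footer = None   # final, valid metadata written at the end

    def record_frame(self, frame):
        self.frames.append(frame)          # the body grows left to right

    def finish(self):
        self.footer = {"duration_frames": len(self.frames),
                       "metadata_complete": True}

clip = StreamableClip("game_highlight_07", "59.94")
for i in range(150):
    clip.record_frame(f"frame {i}")
# The clip could have been streaming to a receiver this whole time;
# the receiver simply waits for the footer to learn the final duration.
clip.finish()
print(clip.header, clip.footer)
```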

The success of these transfer-while-record and play-while-transfer operations relies entirely on the performance of the network between the transmitting and receiving devices. Any bottlenecks on the network will at best result in reduced transfer speeds and at worst result in starved codecs, where the playout device has run out of new frames to play because the network can't deliver them quickly enough.
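A quick sanity check when planning play-while-transfer is whether the sustained network throughput actually exceeds the playout bit rate, with some margin. The trivial check below uses assumed figures and an arbitrary headroom factor purely for illustration.

```python
# Sanity check for play-while-transfer: the network must sustain more than the
# playout bit rate, with headroom, or the decoder will run out of frames.
# Throughput, bit rate and margin are illustrative assumptions.

def playout_is_safe(sustained_link_mbps, playout_bitrate_mbps, headroom=1.5):
    """True if the link can feed the decoder with the assumed safety margin."""
    return sustained_link_mbps >= playout_bitrate_mbps * headroom

print(playout_is_safe(sustained_link_mbps=200, playout_bitrate_mbps=25))  # True
print(playout_is_safe(sustained_link_mbps=30,  playout_bitrate_mbps=25))  # False: risk of a starved codec
```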

Conclusion

File-based workflows offer enormous efficiency improvements but require a new set of skills from all the stakeholders involved in their design. Broadcasters must be aware of the issues raised by the choices they make as part of their system specifications. Manufacturers must recognize the ever-growing palette of codecs from which to choose and, wherever possible, accommodate additional codec formats as appropriate. System designers must learn new skills and understand the pros and cons of each mainstream codec, and as new codecs come to market, they must apply intelligence and discretion in helping clients choose the formats for their systems. They must also understand the impact of transcoding stages and of file format choices on the overall latency of the proposed workflow. File-based workflows represent a major advance in the state of the art for broadcast, but there's still no free lunch.

Paul Turner is vice president, broadcast market development, for Omneon.