Video format conversion

An essential part of a file-based workflow is converting media files from one format to another. For real-time video, conversion may be needed between frame rates of 24, 25 and 30fps, but with file-based video, the options are endless: different wrappers and compression codecs make file exchange a potentially messy business.

Have you ever been asked to convert an AVI file to QuickTime and had to explain that the formats need qualifying? RGB or YUV? MPEG-1 or MPEG-2? The list goes on.

Why should there be so many codecs? There are international standards (the MPEG family and some SMPTE standards), but there are also many proprietary formats.

Device manufacturers carefully select codecs to maximize the performance of their equipment for a given application. One example is the camcorder. The camcorder has limited processing power, constrained by the need to run off batteries. High compression ratios may be required to give record times similar to those cameramen had with videotape, and also to lower data rates to match the capabilities of the recording medium.

One reason tape is not dead is that HDCAM-SR can record at 440Mb/s or 880Mb/s. It is challenging to record to solid-state drives at that rate, although it will come, and optical drives only support 50Mb/s. A P2 card uses four memory chips with striping to get guaranteed 100Mb/s recording.
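The link between data rate and record time is simple arithmetic. The sketch below is only an illustration: it assumes decimal units (1GB = 8000Mb), ignores audio, metadata and file-system overhead, and uses a hypothetical 32GB card that does not come from the figures above.

```python
def record_minutes(capacity_gb: float, rate_mbps: float) -> float:
    """Minutes of video that fit on a medium at a constant data rate."""
    capacity_megabits = capacity_gb * 8000  # assumption: 1 GB = 8,000 Mb
    return capacity_megabits / rate_mbps / 60

# Data rates quoted in the text; the 32 GB capacity is hypothetical.
for rate in (50, 100, 440, 880):
    minutes = record_minutes(32, rate)
    print(f"{rate:>3} Mb/s: {minutes:5.1f} min on a 32 GB card")
```

At the HDCAM-SR rates, the same card holds only a few minutes, which is why the compression ratio and the recording medium have to be chosen together.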

The result of these constraints is that cameras use a compression scheme tailored to a power budget and to suit the chosen media format. In post, the ideal would be 4:4:4 color sampling, but camcorders use the lower data rates of 4:2:2 or even 4:2:0/4:1:1. This compromise works if due regard is given to the processing that will be used in post. Matting and compositing need more color resolution than simple editing with just cuts and dissolves, so the camera format must be chosen to match the job.
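The saving from each sampling scheme can be worked out from the J:a:b notation itself. This is a minimal sketch, assuming the usual reading of the notation (a block J pixels wide and two rows high, with a chroma samples in the first row and b additional chroma samples in the second); the function name is made up for illustration.

```python
def samples_per_pixel(j: int, a: int, b: int) -> float:
    """Average samples per pixel (Y + Cb + Cr) for J:a:b subsampling."""
    luma = 2 * j            # two rows of J luma samples
    chroma = 2 * (a + b)    # Cb and Cr: a samples in row one, b in row two
    return (luma + chroma) / (2 * j)

for scheme in ((4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)):
    label = ":".join(str(n) for n in scheme)
    print(f"{label}  {samples_per_pixel(*scheme):.1f} samples per pixel")
```

By this count, 4:2:2 carries two-thirds of the data of 4:4:4, while 4:2:0 and 4:1:1 both carry half. The two differ only in whether the chroma resolution is given up vertically or horizontally.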

Editing

An NLE has a different set of constraints. The CPUs, computer backplane, network interfaces and disk controllers all present potential bottlenecks when handling HD video. If many video layers are to be handled simultaneously on regular workstations, compression is needed, hence Avid's DNxHD and Apple's ProRes. These formats allow current workstations to handle HD editing with ease, something that would not be possible with uncompressed HD.
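A rough calculation shows the scale of the problem. The sketch below assumes 1920 x 1080 10-bit 4:2:2 video at 29.97fps and counts only the active picture. The four layers and the 145Mb/s editing-codec rate are illustrative assumptions, not figures from this article.

```python
def uncompressed_mbps(width, height, fps, bit_depth, samples_per_pixel):
    """Active-picture video data rate in Mb/s, ignoring blanking and audio."""
    bits_per_frame = width * height * samples_per_pixel * bit_depth
    return bits_per_frame * fps / 1e6

# 4:2:2 sampling averages two samples per pixel.
one_stream = uncompressed_mbps(1920, 1080, 30000 / 1001, 10, 2.0)
layers = 4                # hypothetical number of simultaneous layers
editing_codec_mbps = 145  # illustrative editing-codec rate (assumption)

print(f"uncompressed: {one_stream:.0f} Mb/s per stream, "
      f"{layers * one_stream / 1000:.1f} Gb/s for {layers} layers")
print(f"compressed:   {editing_codec_mbps} Mb/s per stream, "
      f"{layers * editing_codec_mbps / 1000:.2f} Gb/s for {layers} layers")
```

Every layer has to pass through the disk controller, network interface and backplane at once, so the per-stream saving is multiplied by the number of layers.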

Just as camcorders wait for advances in flash memory, there will come a time when NLEs can handle HD on affordable platforms. Today, however, these compression formats specially developed for editing applications offer good video quality, but at reduced bit rates.

Further benefits are that less storage is needed and the bandwidth demands on storage networks are lower.

Playout

In playout, a different set of constraints comes to the fore. For multichannel playout, large numbers of files have to be loaded to the play-to-air servers and moved between them to meet the demands of traffic and scheduling. This calls for files to be as small as possible within the given quality requirements. However, the files need no further processing, unlike the compositing and VFX work done in the edit suite, so long-GOP MPEG is a common format in this application, with data rates of 10Mb/s to 20Mb/s popular for SD and 50Mb/s and up for HD.

Archive

Most programs will be used again, so they must be encoded from the edit format to a suitable format for archiving to replace the old operation of “printing” to videotape. The long-GOP playout format is not the best choice for anything that is to be repurposed. Instead, an I-frame-only format is the popular choice, typically 50Mb/s for SD.

The archive format should be one that can be played in years to come, so an international standard would be a better choice than a proprietary format that may have been used for editing.

All these conflicting requirements lead to the less-than-ideal situation in which many codecs are needed for a complete broadcast workflow: One size does not fit all. A typical workflow would be a codec for acquisition, one for editing, one for archive and possibly many for distribution as HD, SD, Web, mobile, etc. The favorite format for Web is proving to be AVC (H.264).

So the process of coding, decoding and re-encoding becomes a necessary evil. The downside is the concatenation of encoding artifacts if the wrong transcodes are introduced into the chain. In a sensibly designed workflow, with due regard to the selection of data rates and codecs, the re-encoding should not cause overtly visible artifacts. Long-GOP encoding is one example of a final format, not an intermediate one, because re-encoding from long GOP to I-frame only is not a transparent process and can cause visible quality deterioration.
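Concatenation can be illustrated with a toy model. The sketch below is not a real codec: it assumes each encode is nothing more than rounding sample values to a quantizer step, and the step sizes 7 and 5 are arbitrary. It compares one encode with a cascade of two that ends at the same step size.

```python
def quantize(value: int, step: int) -> int:
    """Round a sample to the nearest multiple of the quantizer step."""
    return round(value / step) * step

def mean_abs_error(originals, processed):
    """Average absolute difference between source and processed samples."""
    return sum(abs(o - p) for o, p in zip(originals, processed)) / len(originals)

samples = list(range(256))                                # every 8-bit level once
single = [quantize(v, 5) for v in samples]                # one encode
cascade = [quantize(quantize(v, 7), 5) for v in samples]  # two encodes in a row

single_err = mean_abs_error(samples, single)
cascade_err = mean_abs_error(samples, cascade)
print(f"single encode: {single_err:.2f}  cascaded encodes: {cascade_err:.2f}")
```

Both versions finish at the same step size, yet the cascaded one lands further from the source because the errors of the first pass are carried into the second. Real codecs are far more complex, but the principle is the same.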

Not just the codec

Various equipment handles video and audio in different ways. The two are wrapped in a container that maintains sync between the tracks and stores the metadata necessary to control and manage the media files. Typical wrappers include the MPEG transport stream (MPEG-TS), GXF, MXF and QuickTime. Each has its applications and proponents; for example, MPEG-TS is popular for real-time transfer.

The audio and video also may be multiplexed together or interleaved; MPEG-TS is an example of a multiplexed format and DV of an interleaved one.

A core part of any format conversion is the ability to rewrap a codec. It may not even be necessary to decode, so a QuickTime MPEG file could be rewrapped as MXF MPEG at the same data rate. The different wrapper just provides compatibility with various processing applications.
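As an illustration of a rewrap, the sketch below builds a command line for FFmpeg, a tool chosen here as an assumption; the article does not name one. The `-c copy` option tells FFmpeg to copy the streams without decoding them, and the output wrapper is inferred from the file extension. The file names are hypothetical, and not every codec is permitted in every wrapper, so a real rewrap may be rejected.

```python
import subprocess  # needed only if the command is actually executed

def rewrap_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that changes the wrapper but copies the streams."""
    # "-c copy" selects stream copy for all streams: no decode, no re-encode.
    return ["ffmpeg", "-i", src, "-c", "copy", dst]

cmd = rewrap_command("program.mov", "program.mxf")  # hypothetical file names
print(" ".join(cmd))
# To run it (requires ffmpeg on the PATH):
# subprocess.run(cmd, check=True)
```

Because nothing is decoded, the operation is fast and adds no generation loss. Only the container and its metadata change.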

The audio may also be compressed, but usually this only happens in the final encode to the delivery format at the station output. Until then, audio can be handled as WAV or Broadcast Wave Format (BWF) files.

Video format conversion

International program exchange requires another format conversion, between line and frame rates for SD and between frame rates (25/29.97) and possibly interlace/progressive conversion for HD (720p/1080i). For final distribution, it will frequently be necessary to downconvert an HD master to SD.

Transcoding

For many of the format conversion operations, a transcode between different codecs will be unavoidable. This is usually accomplished by decoding to “baseband” and then re-encoding in the new codec, but this process will incur a quality loss. In some limited applications, it is possible to “transrate,” which is to lower the bit rate of an MPEG signal by processing in the compressed domain. This process results in less quality loss than a re-encode. (See Figure 1.)

Program reuse

A program may be pulled from the archive for reuse or it may be repurposed, perhaps with some minor editing or addition of new material. There is unlikely to be the same compositing that was used in the original edit. This means that there can be a compromise, in that the archive format can be a lower bit rate than the editing format. The program may be edited as 10-bit DNxHD, and then converted to 8-bit MPEG I-frame for archiving. It can still be edited with no loss, but keying will suffer from the lower bit depth.
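The effect of the lower bit depth can be seen with a few lines of arithmetic. This is a minimal sketch, assuming the conversion simply drops the two least significant bits, whereas real converters may round or dither; the ramp values are arbitrary.

```python
def to_8bit(v10: int) -> int:
    """Truncate a 10-bit code value to 8 bits by dropping the two LSBs."""
    return v10 >> 2

def back_to_10bit(v8: int) -> int:
    """Scale an 8-bit code value back onto the 10-bit range."""
    return v8 << 2

ramp = list(range(400, 432))  # a shallow 10-bit gradient: 32 distinct levels
round_trip = [back_to_10bit(to_8bit(v)) for v in ramp]

print(f"levels before: {len(set(ramp))}, after: {len(set(round_trip))}")
```

Each 8-bit code stands in for four 10-bit codes, so a gentle gradient collapses into steps. Cuts and dissolves hide this, but a keyer working on fine differences in level does not.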

Sampling conversion, from 4:2:2 to 4:2:0, is generally only needed at the final stage where the distribution format for the viewers is created.

For distribution, many formats will be needed. Even for broadcast, the video could be encoded as variable bit rate as part of a statistical multiplex for terrestrial or satellite broadcast and constant bit rate for cable and IPTV operators.

Codecs in the workflow

To avoid undue picture quality deterioration, there are rules to be followed when designing a production workflow. The first rule is to not upconvert the resolution. Shoot, edit and archive in the highest-resolution format that will be used for distribution. The second rule is not to re-encode to a higher data rate: It is a waste of bandwidth and storage. The quality is limited by the lower data rate encode, and it is not going to improve.

There are many other tips, such as don't use interlace if the output is progressive and don't upsample the color sampling (4:2:0 to 4:2:2). From these rules comes a pattern: The data rate decreases through the workflow. The only place that rule is broken is in the camcorder, where the signal is most likely more compressed than in the edit.
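These rules lend themselves to a simple automated check. The sketch below is one possible way to express them, not an established tool. The `Stage` class, the function name and all the figures in the example chain are hypothetical, and the chain starts at the edit because the camcorder is the stated exception to the data-rate rule.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    height: int     # picture height in lines
    mbps: float     # video data rate in Mb/s
    chroma: float   # samples per pixel: 3.0 (4:4:4), 2.0 (4:2:2), 1.5 (4:2:0)

def check_workflow(stages):
    """Return a list of rule violations between consecutive stages."""
    problems = []
    for prev, cur in zip(stages, stages[1:]):
        step = f"{prev.name} -> {cur.name}"
        if cur.height > prev.height:
            problems.append(f"{step}: resolution upconverted")
        if cur.mbps > prev.mbps:
            problems.append(f"{step}: re-encoded to a higher data rate")
        if cur.chroma > prev.chroma:
            problems.append(f"{step}: color sampling upsampled")
    return problems

# Hypothetical chain, starting at the edit format.
workflow = [
    Stage("edit", 1080, 145, 2.0),
    Stage("archive", 1080, 100, 2.0),
    Stage("playout", 1080, 50, 1.5),
    Stage("web", 720, 5, 1.5),
]
print(check_workflow(workflow) or "no rule violations")
```

A chain that steps up in resolution, data rate or color sampling at any point returns one message per broken rule.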

A broadcaster will have different demands from various departments. News will acquire, edit and archive in different formats than will entertainment. For many broadcasters, the SD world was simple. You could shoot, edit and deliver in one format, Digital Betacam or DVCPRO.

In the multiresolution world of HD to mobile, it is just not possible to use one format throughout the process chain. Rewrapping and transcoding have become an essential part of the video workflow. How and where this takes place in the broadcast chain must be carefully considered to avoid picture quality loss, yet still reduce the cost of storage and video networks. There is no single answer to the choice of formats, and the optimum choice is constantly changing as technology advances.

However, decisions made today will impact the program archive, and a poor decision now may well be regretted in the future.

And what of the future? Will storage cost so little that video can be stored uncompressed? Network bandwidth would have to increase in step to make this possible. The requirement for storage will also grow beyond the current need for HD with developments like UHDTV and high dynamic range (HDR) video.