File-based acquisition

The file-based environment has become highly desirable, and for compelling reasons. Once we break away from real-time transfers, we open up the possibility of new workflows in which content moves swiftly from place to place.

The reality is not quite that simple. We all know what an analog or SDI signal looks like, but once you get into file-based systems, there are many different varieties of files.

Variations and standards

One leading asset management system has more than 180 different file flavors in its library, and even that does not cover every possible permutation of video format, codec, bit rate and wrapper.

Some of these variations are designed for different applications. A self-contained QuickTime sequence, for example, can be wrapped in three ways, and MXF offers the same choices. The first is the frame wrapper, which contains a video frame with its associated audio, then the next video frame and audio, and so on. That makes it ideal for playout, where you might want to start and stop at any point within the sequence.

But the frame wrapper is challenging to write in a low-power device, so many ENG cameras use the clip wrapper, which writes all the video frames in a shot followed by the audio tracks. MXF supports this form, too.

There's also a compromise version, the mixed wrapper, which has a block of video frames followed by the audio, then the next block of video frames, and so on. Final Cut Pro usually produces QuickTime in this form.
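
To make those three strategies concrete, here is a minimal sketch of how each wrapper might order essence units on disk. It is written in Python for illustration only; the function names are invented for this article and do not come from any MXF or QuickTime library.

    # Illustrative only: model video/audio essence units as list items and
    # show the on-disk ordering each wrapper strategy produces.

    def frame_wrapped(video, audio):
        """Interleave each video frame with its audio. Any point in the
        sequence is a valid start/stop point, which suits playout."""
        order = []
        for v, a in zip(video, audio):
            order += [v, a]
        return order

    def clip_wrapped(video, audio):
        """All video frames for the shot, then the audio tracks. A simple
        sequential write, which suits low-power ENG camera recorders."""
        return list(video) + list(audio)

    def mixed_wrapped(video, audio, block=30):
        """Blocks of video followed by the matching audio: the compromise
        form Final Cut Pro typically produces in QuickTime."""
        order = []
        for i in range(0, len(video), block):
            order += video[i:i + block] + audio[i:i + block]
        return order

    video = [f"V{n}" for n in range(90)]    # 90 frames of a shot
    audio = [f"A{n}" for n in range(90)]    # matching audio units
    print(frame_wrapped(video, audio)[:4])  # ['V0', 'A0', 'V1', 'A1']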

Standards would certainly help in this area. MXF was the original attempt, but it has not yet been fully successful, largely because the original incarnation was drawn so widely that files generated by one device could be “MXF compliant” and utterly incomprehensible to another vendor's “MXF-compliant” system.

The work done by the consortium driven by Turner Broadcasting has positioned MXF as a more useful format, but it is still not universally recognized.

There are other standards in the pipeline, too, such as MPEG-7, which defines how descriptive and technical metadata is carried with the media, so that systems can automatically store information derived from the content. True interoperability, of the kind we had when BNC to BNC or XLR to XLR always worked, depends on the widespread adoption of standards.

Until then, systems integrators are faced with rewrapping and transcoding between different devices and at various stages of the content pipeline. These processes take time and can degrade the signal. Both are obstacles to the seamless workflow that is the promise of the file-based environment.

The best-case scenario would be to avoid transcoding anywhere in the system, but that is not currently feasible. For now, we have to accept that the acquisition codec, the editing codec and the transmission codec are going to be different, each optimized for its own part of the chain.

Acquisition

It is a fundamental principle, well understood by broadcasters, that quality lost at the beginning of the production chain can never be recovered, so acquisition should use the best possible codec.

If we ignore the constraints of file-based systems, probably the best practical acquisition format for broadcast video is 10-bit 4:2:2.

So why, then, would we compromise on that quality just because it is being recorded as a file rather than onto linear tape? Advancing technology should not mean deteriorating quality.

10-bit 4:2:2 HD content can be acquired using AVC-Intra or JPEG2000. Quantization at 8 bits, or more drastic color subsampling, will produce visibly inferior image quality and will cause problems in green-screen work.
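
To put a number on what subsampling costs, the short sketch below (a hypothetical helper written for this article) counts chroma samples per HD frame under the common schemes; it is the halving of chroma resolution that makes clean keying harder.

    # Count chroma (Cb + Cr) samples per frame for common subsampling schemes.
    def chroma_samples(width, height, scheme):
        factors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
        h, v = factors[scheme]                   # horizontal, vertical divisors
        return (width // h) * (height // v) * 2  # two chroma planes

    for scheme in ("4:4:4", "4:2:2", "4:2:0"):
        print(scheme, chroma_samples(1920, 1080, scheme))
    # 4:2:0 carries only half the chroma information of 4:2:2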

The difference between AVC-Intra and JPEG2000 is that the former is an MPEG-family codec based on the discrete cosine transform (DCT), while the latter uses wavelet compression. The broader AVC family can also use temporal compression, taking information from one video frame as the basis of other frames in a group of pictures; AVC-Intra, as its name implies, does not, and like JPEG2000 it is an intraframe scheme in which each individual frame is complete in itself.

The DCT algorithms implemented in MPEG and AVC-Intra split the picture into blocks and process each individually. When the encoder or decoder comes under stress, these blocks can become visible. Wavelet encoding, on the other hand, processes the entire picture as a single entity (up to 4K by 4K resolution in JPEG2000), so blocking is impossible. And if there is any stress on the compression engine, the result is a less obtrusive softening of the picture.
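
The distinction is easy to see in code. The sketch below, which assumes NumPy and SciPy are available, contrasts a block-by-block 2-D DCT of the kind AVC-Intra builds on with a single-level whole-frame Haar wavelet decomposition standing in for JPEG2000's longer filters. Real codecs add quantization and entropy coding on top of these transforms.

    import numpy as np
    from scipy.fft import dctn  # 2-D type-II DCT

    def blockwise_dct(frame, block=8):
        """Transform each 8x8 block independently, as DCT codecs do.
        Coarse quantization of these blocks is what makes them visible."""
        out = np.empty_like(frame)
        h, w = frame.shape
        for y in range(0, h, block):
            for x in range(0, w, block):
                out[y:y + block, x:x + block] = dctn(
                    frame[y:y + block, x:x + block], norm="ortho")
        return out

    def haar_2d(frame):
        """One level of a whole-frame 2-D Haar wavelet transform. Because
        the transform spans the entire picture, there are no block
        boundaries to break down; errors show as overall softening."""
        lo = (frame[:, 0::2] + frame[:, 1::2]) / 2   # horizontal averages
        hi = (frame[:, 0::2] - frame[:, 1::2]) / 2   # horizontal details
        ll = (lo[0::2, :] + lo[1::2, :]) / 2         # low-low subband
        lh = (lo[0::2, :] - lo[1::2, :]) / 2
        hl = (hi[0::2, :] + hi[1::2, :]) / 2
        hh = (hi[0::2, :] - hi[1::2, :]) / 2
        return ll, lh, hl, hh

    frame = np.random.rand(64, 64)      # stand-in for a luma plane
    dct_coeffs = blockwise_dct(frame)   # 64 independent 8x8 blocks
    ll, lh, hl, hh = haar_2d(frame)     # one frame-wide decomposition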

One further consideration when selecting a compression algorithm is basic physics: the more effective the compression, the more processing it requires. Moore's Law delivers more power in ever smaller chips, so this is less of an issue than it once was, but it still needs to be kept in mind when choosing a camera. Better quality means more processing, which means a bigger (and heavier) battery and potentially more heat to manage.

Data rates

In the article “Tapeless camcorders” in the January issue of Broadcast Engineering World, the editor opined, “Camera operators have come to expect a record duration of around two hours from a single cassette, again setting an expectation for different media.” That is an important point. We cannot force a reduction in efficiency just because we are moving toward a file-based approach. So bit rates have to be balanced against storage capacities to ensure that expectations of recording durations are met.

There are camcorders that support a number of codecs and bit rates, allowing the operator to choose the right tool for the job. The best acquisition quality currently practical is 10-bit 4:2:2 HD encoded using JPEG2000 at 100Mb/s.

As a simple rule of thumb, 100Mb/s works out at about 750MB per minute of storage. A 65GB disk therefore comfortably exceeds one hour of recording, and two relatively inexpensive 32GB professional CompactFlash cards achieve the same benchmark.
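
The arithmetic behind that rule of thumb is simple enough to check in a few lines; the snippet below is just a back-of-the-envelope calculation written for this article.

    # Convert a bit rate into storage per minute and recording time.
    def mb_per_minute(bitrate_mbps):
        return bitrate_mbps * 1e6 / 8 * 60 / 1e6   # Mb/s -> MB per minute

    def minutes_of_recording(capacity_gb, bitrate_mbps):
        return capacity_gb * 1e9 / (bitrate_mbps * 1e6 / 8) / 60

    print(mb_per_minute(100))                 # 750.0 MB per minute
    print(minutes_of_recording(65, 100))      # ~86.7 min on a 65GB disk
    print(minutes_of_recording(2 * 32, 100))  # ~85.3 min on two 32GB cards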

At this point, the issue moves away from one of bit rates to a question of storage. In the days of SD digital television, people got very blasé about file sizes and built workflows around ingesting everything and sorting it later. In HD, users need to be more careful.

A one-hour drama shot at a 10:1 ratio (not atypical) means 10 hours of rushes, which at 100Mb/s comes to about 450GB of raw data. That is not in itself a huge problem; PCs and Macs used for editing commonly have local storage measured in terabytes. But you have to move that content in and out, usually across a network that is also busy with other traffic.
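
How long those moves take depends on link speed and contention. The rough estimate below, assuming a 50 percent effective utilization to stand in for protocol overhead and competing traffic, gives a feel for the numbers.

    # Rough transfer-time estimate for moving rushes over a shared network.
    def transfer_hours(size_gb, link_gbps, utilization=0.5):
        """utilization is an assumed effective fraction of the raw link rate."""
        return size_gb * 8 / (link_gbps * utilization) / 3600

    print(transfer_hours(450, 1.0))    # ~2.0 hours over Gigabit Ethernet
    print(transfer_hours(450, 10.0))   # ~0.2 hours over 10GbE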

If there is another device in the path to flip the content from one codec and/or wrapper to another, that can add another significant time penalty. News is the most time-sensitive content, but it can usually live with a lower bit rate codec (although the need to transcode will still be an issue). If you want to create a highlights package from a prestigious sporting event, though, waiting for a format flip could be a big problem.

One solution is to move away from ingesting everything and toward a preselection process. Offline editors could use the lower-resolution proxies created as part of JPEG2000 encoding to search for the best takes.

An important workflow boost is to use an editor that can access the camera material in its native format. Using the CineForm intermediate codec, for example, you can keep 10-bit 4:2:2 HD intact and import it straight into Final Cut Pro. The editor can accept a large number of codecs, including 100Mb/s JPEG2000, directly into the timeline.

These two solutions can be combined if the acquisition media can be connected directly to the editor. The content can then be browsed straight from the acquisition media, with only the required takes pulled into the edit and the material transferred to a central archive as a background task as network traffic allows.

Conclusion

This article has focused on top-end HD acquisition and file-based workflows, because that is the most demanding environment. HD production is about quality, so it is important to capture and deliver the best possible quality.

The same lessons apply to other applications. News, for example, may trade off some quality for smaller file sizes and editing speed. It is still critical, though, to ensure that the whole workflow is clear and without bottlenecks. File-based systems should make life easier, not more complicated.

There may be a case, for example, where a real-time HD-SDI transfer between two devices is actually quicker than passing a file through a format flipper. It may not be as “sexy” as a file-based workflow, but it might get the job done.

In planning for file-based environments, the acquisition format has to be the primary consideration. It sets the quality standard for the rest of the network, and it should be maintained as far along the pipeline as possible. With today's technology, that bar can be set very high indeed, with 10-bit 4:2:2 HD and excellent compression algorithms now readily available in practical and affordable camcorders.

Nigel Arnott is marketing development manager for Infinity at Grass Valley.