Codecs and workflow: Making the optimum choice

Sony’s recent release of multi-codec cameras (the F5 and F55) is a reminder that there is no such thing as the ideal codec. Not only do the cameras support RAW and existing codecs, but Sony has also introduced the XAVC codec to add further capability and 4K compressed recording.

The codec the Director of Photography (DoP) chooses ultimately depends on a number of issues, particularly the program genre. An observational documentary is less demanding of final picture quality than a prime-time drama.

In an ideal world, one codec would be used from camera to master control. That would avoid concatenation of coding artifacts. However, in most applications it is just not practical. In any workflow there are touchpoints:

camera files
editing files
delivery broadcast transmission master
playout server files

Each of these may use a different codec. For example: shoot HD at MPEG-2 4:2:2, 50Mb/s, long GOP; edit and finish at DNxHD 145; deliver as HDCAM-SR; and ingest to the playout server as 25Mb/s 4:2:0 long GOP at 20Mb/s. This is decoded to HD-SDI for the master control switcher and keying, and then finally encoded to HD at a nominal 6Mb/s variable bit rate in a statistical multiplexer.
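To make the cost of that chain concrete, the short Python sketch below lists the touchpoints from the example and counts how many times the picture is re-encoded into a different codec. The names EXAMPLE_CHAIN, SINGLE_CODEC_CHAIN and count_transcodes are illustrative only, and the emission codec is labeled generically because the text does not name it.

```python
# Touchpoints from the example above, in signal order.
# The emission codec is not named in the text, so it is labeled generically.
EXAMPLE_CHAIN = [
    ("camera", "MPEG-2 4:2:2 long GOP"),
    ("edit and finish", "DNxHD 145"),
    ("delivery master", "HDCAM-SR"),
    ("playout server", "4:2:0 long GOP"),
    ("transmission", "emission encode in the statistical multiplexer"),
]


def count_transcodes(chain):
    """Count hand-offs where the codec differs between adjacent touchpoints."""
    codecs = [codec for _, codec in chain]
    return sum(1 for a, b in zip(codecs, codecs[1:]) if a != b)


# Hypothetical single-codec workflow, for comparison only.
SINGLE_CODEC_CHAIN = [(stage, "one codec") for stage, _ in EXAMPLE_CHAIN]

print(count_transcodes(EXAMPLE_CHAIN))       # transcodes in the example chain
print(count_transcodes(SINGLE_CODEC_CHAIN))  # transcodes with one codec throughout
```

Each counted hand-off is a decode followed by a fresh encode, which is where concatenated coding artifacts can accumulate.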

HD VTRs

The first generation of HD camcorders based on videotape offered a choice of DVCPRO100 or HDCAM recording. With data rates of 100 and 140Mb/s respectively, not only was compression a necessity, but the images were also down-sampled from 1920 to 1440 samples per line to lower the bandwidth. HDCAM-SR allowed recording at 440 or 880Mb/s, and resolved many of the quality issues.

Before considering the compression format, what is being compressed?

Camera Data

How much data does a sensor create? Consider high frame rate recording at 60fps (progressive). A three-chip HD camera (1920 x 1080) with 16-bit samples generates 99.5Mb per frame: a stream of 6Gb/s, or 746MB of data every second.

A 4K (3840 x 2160) single-chip Bayer array generates 133Mb per frame or 8Gb/s.

Few sensors have a bit depth of 16, so these numbers can be reduced for the 14 bits typical for cameras, but that is still 5.2Gb/s for the 3-chip HD camera.
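The arithmetic behind these figures is simple enough to check with a short script. The Python sketch below is one way to do it; the function names are illustrative, and it counts only active picture samples (no blanking, audio or metadata).

```python
def bits_per_frame(width, height, bit_depth, samples_per_pixel):
    """Bits in one frame: photosites x samples per photosite x bits per sample."""
    return width * height * samples_per_pixel * bit_depth


def stream_rate_gbps(width, height, bit_depth, samples_per_pixel, fps=60):
    """Uncompressed data rate in Gb/s (1 Gb = 1e9 bits)."""
    return bits_per_frame(width, height, bit_depth, samples_per_pixel) * fps / 1e9


# Three-chip HD camera: three samples (R, G, B) per pixel position.
hd_3chip_16 = stream_rate_gbps(1920, 1080, 16, 3)
hd_3chip_14 = stream_rate_gbps(1920, 1080, 14, 3)

# Single-chip 4K Bayer sensor: one sample per photosite.
uhd_bayer_16 = stream_rate_gbps(3840, 2160, 16, 1)

print(f"3-chip HD, 16-bit: {hd_3chip_16:.2f} Gb/s "
      f"({hd_3chip_16 * 1000 / 8:.0f} MB/s)")
print(f"3-chip HD, 14-bit: {hd_3chip_14:.2f} Gb/s")
print(f"4K Bayer, 16-bit:  {uhd_bayer_16:.2f} Gb/s")
```

The results round to the 6Gb/s, 5.2Gb/s and 8Gb/s quoted above.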

Single Sensor Color Imaging

As DoPs demand Super 35mm-sized sensors, more cameras are now using the single-chip design. A three-chip camera with a Super 35mm-sized sensor would be very large. It would also have a long back focus, meaning that regular cine lenses would be unsuitable. It is more cost-effective to use a single sensor with a color filter array (CFA) for large sensor designs. However, it is accepted that the resolution of a single sensor with a Bayer CFA, with a good demosaicing algorithm, is around 30 percent lower than a 3-chip imager of the same pixel dimensions.

For HD, a single-chip camera needs a resolution of around 2300 x 1300 to equal a 3-chip 1920 x 1080 imager. It should be noted that many single-chip cameras have sensors exceeding this size, so they should achieve good resolution in HD.
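One way to arrive at a figure of that order is to treat the 30 percent loss as applying to the pixel count and scale both dimensions equally to compensate. That interpretation is an assumption made for this sketch, not something the text states, and the function name is illustrative.

```python
import math


def single_chip_equivalent(width, height, resolution_loss=0.30):
    """Sensor size whose pixel count, reduced by resolution_loss,
    matches width x height (assumes the loss applies to pixel count)."""
    scale = math.sqrt(1.0 / (1.0 - resolution_loss))
    return round(width * scale), round(height * scale)


w, h = single_chip_equivalent(1920, 1080)
print(w, h)  # close to the 2300 x 1300 quoted above
```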

The single-chip camera can ease the issues of moving data to the post house. If the raw data from the chip is recorded, rather than the demosaiced RGB or YUV signals, then there can be a considerable saving in data. Table 1 shows some numbers for a 14-bit sensor shooting 60P. A three-sensor design outputs 5.2Gb/s against only 1.74Gb/s for the RAW output of a single sensor. Note that the RAW signal has a lower bit rate than even a sub-sampled 4:2:2 8-bit signal.

Table 1

Format          W     H     Bit depth  Bits per frame  Mb/s @ 60fps
HD 444          1920  1080  14         87,091,200      5,225
HD 444 10-bit   1920  1080  10         62,208,000      3,732
HD 422 10-bit   1920  1080  10         41,472,000      2,488
HD 422 8-bit    1920  1080  8          33,177,600      1,990
HD 420 8-bit    1920  1080  8          24,883,200      1,492
RAW HD          1920  1080  14         29,030,400      1,741
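The figures in Table 1 follow from the number of samples carried per pixel: three for 4:4:4, two for 4:2:2, one and a half for 4:2:0, and one for a RAW Bayer sensor. The Python sketch below recomputes each row on that basis; the FORMATS list and table_rows function are illustrative names.

```python
# (format label, bit depth, average samples per pixel)
FORMATS = [
    ("HD 444 14-bit", 14, 3.0),   # full-resolution R, G, B
    ("HD 444 10-bit", 10, 3.0),
    ("HD 422 10-bit", 10, 2.0),   # chroma halved horizontally
    ("HD 422 8-bit", 8, 2.0),
    ("HD 420 8-bit", 8, 1.5),     # chroma halved horizontally and vertically
    ("RAW HD", 14, 1.0),          # one sample per Bayer photosite
]


def table_rows(width=1920, height=1080, fps=60):
    """Return (label, bits per frame, Mb/s) for each format."""
    rows = []
    for label, depth, samples_per_pixel in FORMATS:
        bits = int(width * height * samples_per_pixel * depth)
        rows.append((label, bits, bits * fps / 1e6))
    return rows


for label, bits, mbps in table_rows():
    print(f"{label:<15}{bits:>12,}{mbps:>10,.1f}")
```

The table quotes the Mb/s column as whole numbers; the script prints one decimal place so the underlying values are visible.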

In most cameras, the sub-sampled YUV signal is truncated to 10 or 8 bits, then encoded using one of the MPEG standards at 1080i 25/29.97 or 720P 50/59.94, as I-frame or long GOP. Examples are AVC-I at 100Mb/s and XDCAM at 50Mb/s. Such data rates are similar to existing SD rates, so they are easily handled by networks and storage. Converting from linear to log sampling helps, saving two bits for a given dynamic range, with 10-bit log considered equivalent to 12-bit linear. Many digital cinema shoots use log coding.
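To put those codec rates in context, the Python sketch below estimates the compression ratio relative to the uncompressed active picture. The bit depths and sampling used for the two examples (10-bit 4:2:2 for the 100Mb/s case, 8-bit 4:2:2 for the 50Mb/s case, both at 25 frames per second) are assumptions made for illustration and are not stated in the text; the function names are likewise illustrative.

```python
def uncompressed_mbps(width, height, bit_depth, samples_per_pixel, fps):
    """Active-picture data rate in Mb/s, ignoring blanking and audio."""
    return width * height * samples_per_pixel * bit_depth * fps / 1e6


def compression_ratio(codec_mbps, width, height, bit_depth, samples_per_pixel, fps):
    """How many times smaller the coded stream is than the uncompressed picture."""
    return uncompressed_mbps(width, height, bit_depth, samples_per_pixel, fps) / codec_mbps


# Assumed parameters: 1920 x 1080, 4:2:2 (two samples per pixel), 25 frames/s.
ratio_100 = compression_ratio(100, 1920, 1080, 10, 2, 25)  # 10-bit assumed
ratio_50 = compression_ratio(50, 1920, 1080, 8, 2, 25)     # 8-bit assumed

print(f"100Mb/s codec: about {ratio_100:.1f}:1")
print(f"50Mb/s codec:  about {ratio_50:.1f}:1")
```

Under these assumptions the 100Mb/s case works out at roughly 10:1 and the 50Mb/s case at roughly 17:1.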

The goal of one codec

But what of the target of using one codec all the way through to master control?

To improve the performance of their NLEs, Apple and Avid developed edit-specific codecs: ProRes and DNxHD. For good performance, the NLE should need as little rendering as possible to display clips on the timeline. A codec that requires considerable rendering slows the editor’s work. Moore’s Law will fix the rendering issues over time, but 4K waits in the wings with four times the data rate. Edit workstations always seem to be short on power!

Adding the edit codec immediately introduces two transcodes: one on the input to post and one on the output.

The ARRI Alexa can encode to DNxHD and ProRes with suitable options installed.

One route popular with camera manufacturers is to include onboard recording in a compressed, 8-bit format, and supply an uncompressed output for one of the many after-market recorders. Some of these recorders now support encoding direct to DNxHD and ProRes, neatly circumventing the need for coding or transcoding during ingest to the NLE.

Different Needs

Most camera codecs use one of the ISO/IEC MPEG family: MPEG-2, MPEG-4 Part 2 or AVC. Some DSLRs record Motion JPEG to ease the internal processing requirements. To meet the conflicting requirements of productions in different genres, camera manufacturers are offering more choices. These include:

RAW output
uncompressed, RGB or YUV, log and linear, 12, 10 or 8 bits
compressed

The next step is the introduction of high-efficiency video coding (HEVC). Most likely this will first appear as a delivery codec, to stream 4K to home viewers and lower resolutions to mobile devices.

Taken together, this suggests that the multi-codec camera adds choices to support different workflows. One camera can create large files for grading, or small files for a straight-through edit. One camera can be used for different production genres. If a camera has an uncompressed output, HD-SDI or HDMI, then after-market field recorders achieve much the same aim.

One codec from acquisition to the air server works for news and documentaries, but for other genres that require a higher picture quality, transcoding becomes a necessity.

A director has all the choices available, but choosing the right one needs a joint decision from the DoP, digital imaging technician, colorist and editor to achieve the optimum picture quality within the many constraints of production, not least of which is cost.