AVC/H.264 encoding

The 4:2:2 10-bit format has become the de facto standard for professional video because it is the format captured and transmitted over SDI, allowing the entire production chain to use at least a 10-bit signal. In broadcast contribution, however, encoders and decoders are still limited to 8-bit signals. As a result, picture information can get lost, and quality can suffer when transmitting video.

This article demonstrates the advantages of processing video in its native SDI format using AVC/H.264 4:2:2 10-bit encoding. AVC/H.264 has historically been used at low bit rates for distribution applications, where its roughly 50 percent efficiency gain over MPEG-2 allows increased channel density, wider distance reach and reduced transmission costs. However, there is a growing need to serve production and contribution applications, which demand higher standards of video quality.

Introduction

HD contribution applications are varied but share some common characteristics, including bit rates from 20Mb/s to 60Mb/s and above; low to moderate end-to-end latency, typically ranging from just under one second down to 250ms; and the need to decode and re-encode video several times before reaching the viewer.

For more than 10 years, production and contribution applications have used the MPEG-2 4:2:2 profile, but from its early design stages AVC/H.264 showed the potential to improve on MPEG-2. All MPEG-2 features were included, with the notable exception of an easy transrating process. Still, the majority of today's AVC/H.264 encoders and decoders are limited to relatively low bit rates and lack specific tools mandated by production and contribution applications.

As illustrated in Figure 1, most current AVC/H.264 broadcast contribution systems are based on existing distribution encoders and decoders.

Handling only High Profile requires downscaling to 4:2:0 8-bit before encoding, and upscaling back to 4:2:2 10-bit after decoding. This approach is also limited to less than 30Mb/s, which impedes the highest video quality applications in HD.

As technology matures, many products are now implementing the High 4:2:2 Profile, which is a superset of High Profile with two new tools: 4:2:2 processing and handling of pixel bit depths of up to 10 bits. Together, these tools avoid the downscale/upscale stages shown in Figure 1.

To show actual data versus theoretical upper bounds that could never be obtained in real time, three products were used to gather results: a contribution encoder for AVC/H.264 4:2:2 8-bit measurements, a real-time HD encoder for AVC/H.264 4:2:2 10-bit measurements and a software file encoder for MPEG-2 measurements.

Using PSNR metrics to evaluate quality

Peak signal-to-noise ratio (PSNR) measures the difference between the source and the decoded pictures of a video sequence.

A known problem with PSNR is its lack of correlation with human visual impression. For example, one sequence with a PSNR of 30dB could look very good, while another sequence with the same PSNR could look very poor. Thus, comparing two PSNR measurements taken on two different sequences is almost meaningless when assessing video quality.

However, two PSNR measurements on the same sequence performed with PSNR-optimized configurations can reveal a lot. In this case, the encoder capable of providing the highest PSNR will also be able to provide the best video quality.

When evaluating coding efficiency, it is customary to use the PSNR of the luma component only. If chroma has to be taken into account, a combined PSNR metric is often used:

CombinedPSNR = 0.8*YPSNR + 0.1*UPSNR + 0.1*VPSNR

This metric gives a good idea of the overall coding efficiency while still accounting for the visual importance of chroma.
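The two metrics above can be sketched in a few lines of Python. This is a minimal illustration rather than the measurement tool used for the results in this article: the function names are hypothetical, planes are passed as flat lists of sample values, and the peak value is assumed to be the largest code value for the given bit depth (255 for 8 bits, 1023 for 10 bits).

```python
import math

def psnr(reference, decoded, bit_depth=8):
    """PSNR in dB between two equal-size planes given as flat lists of samples.

    Assumption: the peak is the largest code value for the bit depth.
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("planes must be non-empty and the same size")
    peak = (1 << bit_depth) - 1
    # Mean squared error between source and decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical planes
    return 10.0 * math.log10(peak * peak / mse)

def combined_psnr(y_psnr, u_psnr, v_psnr):
    """Weighted combination quoted in the text: 0.8*Y + 0.1*U + 0.1*V."""
    return 0.8 * y_psnr + 0.1 * u_psnr + 0.1 * v_psnr
```

Because the weights sum to one, the combined figure stays on the same dB scale as the per-component values, with luma dominating the result.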

Encoder configurations

AVC/H.264 encoders are configured either in High Profile or High 4:2:2 Profile. AVC/H.264 profiles below the High 4:2:2 Profile process the video as 4:2:0. Because SDI transports 4:2:2 signals, chroma components need to be subsampled vertically prior to encoding and upsampled after decoding. The intent of 4:2:0 processing was to simplify the design as well as lower the bit rate needed to transmit compressed video.

A side effect of this process is reduced chroma detail. This is usually not a problem, however, because the human eye is not very sensitive to color information.
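To put the 4:2:2 to 4:2:0 conversion in numbers, the sketch below compares uncompressed data rates. It is plain arithmetic, not encoder behavior: 4:2:0 carries 1.5 samples per pixel and 4:2:2 carries 2. The function name and the 1920x1080 at 25 frames per second example are illustrative assumptions, and only active picture samples are counted (no SDI blanking or ancillary data).

```python
# Samples per pixel (one luma sample plus the two chroma components' share).
SAMPLES_PER_PIXEL = {"4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}

def raw_bit_rate(width, height, frames_per_second, chroma_format, bit_depth):
    """Uncompressed active-picture bit rate in bits per second."""
    samples_per_frame = width * height * SAMPLES_PER_PIXEL[chroma_format]
    return samples_per_frame * bit_depth * frames_per_second

native = raw_bit_rate(1920, 1080, 25, "4:2:2", 10)   # SDI-style 4:2:2 10-bit
reduced = raw_bit_rate(1920, 1080, 25, "4:2:0", 8)   # High Profile input
print(f"4:2:2 10-bit: {native / 1e6:.1f} Mb/s")
print(f"4:2:0 8-bit:  {reduced / 1e6:.1f} Mb/s")
print(f"ratio: {native / reduced:.2f}")
```

The ratio only describes how much source information is discarded before encoding; it says nothing about the compressed bit rate, which depends on the encoder and the content.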

Even though the AVC/H.264 standard allows for six possible locations for the chroma samples relative to the luma samples, only the standard MPEG location is widely used. As shown in Figure 2, two schemes are available to handle progressive and interlaced sources.

Artifacts introduced by 4:2:2 to 4:2:0 conversions

The AVC/H.264 standard does not precisely define how the chroma subsampling or upsampling should be performed, leaving this decision to manufacturers. Thus, there can be a mismatch between the downsampling filter in the encoder and the upsampling filter in the decoder. Misinterpretation of the progressive or interlaced nature of the video can lead to faulty decoding of the whole chroma plane.

Video quality has to be kept at the highest possible level to handle several encoding-decoding steps. A mismatch in the chroma sampling can introduce color degradations that worsen with each generation, including color bleeding, loss of color contrast and details, chroma displacement relative to luma, and the creation of interlaced color artifacts on progressive pictures (or progressive color artifacts on interlaced pictures).

It should be noted that an interlaced chroma artifact on progressive material (or a progressive one on interlaced material) might confuse encoders in the cascading process, significantly reducing their coding efficiency and introducing luma degradation.

Figures 3 and 4 give an example of such problems after only five generations. The only defect introduced was a mismatch in the chroma resampling filters: a polyphase bicubic downsampler before encoding and a simple tent upsampler after decoding.
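The generational effect of a filter mismatch can be reproduced on a single column of chroma samples. The sketch below is a deliberately simplified stand-in for the test behind Figures 3 and 4: it uses a two-tap box downsampler instead of a polyphase bicubic filter, models no compression at all, and the tent upsampler is assumed to treat each chroma sample as co-sited with the even row, which disagrees with the box filter by half a row. All function names are hypothetical.

```python
def down_box(column):
    """2:1 vertical downsample: each output sits midway between two rows."""
    return [(column[2 * i] + column[2 * i + 1]) / 2
            for i in range(len(column) // 2)]

def up_replicate(chroma):
    """Matched upsampler: repeat each sample over the two rows it came from."""
    return [v for v in chroma for _ in range(2)]

def up_tent(chroma):
    """Mismatched upsampler: linear (tent) interpolation that assumes each
    sample is co-sited with the even row, half a row off the box filter."""
    out = []
    for i, v in enumerate(chroma):
        nxt = chroma[min(i + 1, len(chroma) - 1)]  # clamp at the bottom edge
        out += [v, (v + nxt) / 2]
    return out

def generation_errors(source, upsample, generations=5):
    """Mean absolute error versus the source after each down/up generation."""
    errors, column = [], list(source)
    for _ in range(generations):
        column = upsample(down_box(column))
        errors.append(sum(abs(a - b) for a, b in zip(source, column)) / len(source))
    return errors

source = [64] * 7 + [192] * 9          # a vertical chroma edge, 16 rows
matched = generation_errors(source, up_replicate)
mismatched = generation_errors(source, up_tent)
print("matched:   ", matched)
print("mismatched:", mismatched)
```

With the matched pair, the damage is done once and later generations reproduce the same column. With the mismatched pair, the edge blurs and drifts a little further on every pass, which is the kind of accumulating chroma displacement described above.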

4:2:2 processing

The only way to avoid those artifacts is to process the video in its original color format (4:2:2). This is possible using the AVC/H.264 High 4:2:2 Profile.

The drawbacks of encoding in 4:2:2 include a moderate bit rate increase (for a given quantizer) relative to 4:2:0 encoding. (See Figure 5.) This bit rate increase does not lead to a loss of video quality with the first generation; in fact, the perceived quality is roughly the same.

As shown in Figure 6, objective measurements such as PSNR reflect this subjective impression.

10-bit versus 8-bit video compression

All AVC/H.264 profiles above High Profile can encode pixels with a bit depth greater than 8 bits:

  • High 10 Profile: 8 bits up to 10 bits
  • High 4:2:2 Profile: 8 bits up to 10 bits
  • High 4:4:4 Predictive Profile: 8 bits up to 14 bits
  • High 10 Intra Profile: 8 bits up to 10 bits
  • High 4:2:2 Intra Profile: 8 bits up to 10 bits
  • High 4:4:4 Intra Profile: 8 bits up to 14 bits
  • CAVLC 4:4:4 Intra Profile: 8 bits up to 14 bits

The bit depth increase provides improved accuracy to the AVC/H.264 compression scheme, including motion compensation, intra prediction and in-loop filtering. Figure 7 illustrates the gains that can be achieved using greater than 8-bit processing (measured in 4:2:0 with an 8-bit source upscaled to 10, 12 or 14 bits).
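A small numerical example shows where the extra accuracy comes from. When two predictions are combined, an encoder typically forms a rounded integer average; at 8 bits the result can be off by half a code value, while the same samples promoted to 10 bits (shifted left by two) leave room for the fractional part. The sketch below assumes the common (a + b + 1) >> 1 rounding and uses hypothetical function names; it illustrates the principle only and is not an excerpt from any encoder.

```python
def rounded_average(a, b):
    """Integer average with round-half-up, as used when combining two samples."""
    return (a + b + 1) >> 1

def average_error(a8, b8, bit_depth):
    """Error, in 8-bit code values, of averaging two 8-bit samples after
    promoting them to `bit_depth` bits by a left shift."""
    shift = bit_depth - 8
    result = rounded_average(a8 << shift, b8 << shift)
    exact = (a8 + b8) / 2
    return abs(result / (1 << shift) - exact)

print(average_error(100, 101, 8))    # half a code value is lost to rounding
print(average_error(100, 101, 10))   # the half step is representable
```

This is also why an 8-bit source upscaled to 10 bits can still benefit: the source gains no new information, but every intermediate result inside the coding loop is kept with finer granularity.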

Extensive experimentation demonstrates that the coding efficiency gains are highest on videos that contain shallow textures and low noise.

Figure 8 illustrates PSNR improvement gained from increasing the bit depth to 10 or 12 bits on relatively noisy, textured standard sequences.

These curves illustrate that the gain shrinks as the bit rate is reduced but cannot be considered negligible, which makes this feature attractive even for low bit rate applications.

Encoding in 10 bits can achieve a PSNR increase of more than 1dB on some natural sequences and averages 0.25dB at 60Mb/s on a varied test set of broadcast HD sequences. This translates to bit rate savings of about 5 percent on average, and up to 20 percent, while retaining the same video quality.

However, further testing shows that increasing the bit depth to 12 bits (or even 14 bits) provides a much smaller additional coding efficiency gain (up to about 1 percent in bit rate savings), but again, no loss relative to 8 or 10 bits.

Lastly, there is no relation between 10-bit encoding and the frame format. The advantages are the same whether the source video is HD, SD, progressive or interlaced.

Beyond coding efficiency improvements

A noteworthy benefit of 10-bit processing is perceivable gains in the reduction of three kinds of artifacts: contouring, banding and mosquito noise.

This gives a better aspect to plain surfaces and shallow textured areas (smoke, clouds, sky, sunset, etc.), and it slightly improves object edges. These impairments are otherwise difficult to reduce using traditional tools.
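Banding on shallow gradients is easy to quantify. The sketch below counts how many distinct code values are available to represent a gentle ramp, such as a patch of sky, at a given bit depth. It is a simplified model under stated assumptions: full-range quantization of a normalized 0-to-1 signal rather than broadcast video levels, a hypothetical function name, and an arbitrary ramp covering 2 percent of the signal range across 1920 pixels.

```python
def distinct_levels(low, high, bit_depth, width=1920):
    """Number of distinct code values used by a linear ramp from `low` to
    `high` (normalized 0..1) quantized to `bit_depth` bits."""
    peak = (1 << bit_depth) - 1
    codes = {round((low + (high - low) * x / (width - 1)) * peak)
             for x in range(width)}
    return len(codes)

for bits in (8, 10):
    print(bits, "bits:", distinct_levels(0.40, 0.42, bits), "levels")
```

Fewer levels across the same picture width means wider, more visible steps. At 10 bits the steps are roughly four times narrower and a quarter of the height, so they are far less likely to be perceived as contours.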

If the source is not too noisy and the plain area is not too large relative to the picture surface, lowering the local quantizer produces an effect close to the one achieved with 10-bit processing. Lowering the quantizer, however, has several negative impacts, the most important being a strong reduction in coding efficiency and a degradation of rate-control stability.

Another approach is to hide the defects by adding noise during the encoding process, but the amount of added noise needed to achieve the same visual improvement is significant. Even at high bit rates, this can lead to an unacceptable reduction in coding efficiency.

The gains are the result of increased accuracy in internal computations; 8-bit video sources also show improvements. Interestingly, the reduction of artifacts provided by 10-bit processing is perceivable even on standard (8-bit or dithered 6-bit) LCD panels.

AVC/H.264 for contribution applications

The AVC/H.264 High 4:2:2 Profile enables high maximum bit rates for the video coding layer:

  • 40Mb/s for 525i and 576i (Level 3)
  • 200Mb/s for 720p and 1080i25/30 (Level 4.1)
  • 200Mb/s for 1080p50/60 (Level 4.2)
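The figures in the list above follow from the level limits of the standard: the maximum video coding layer bit rate is the level's MaxBR value multiplied by a profile-dependent factor (cpbBrVclFactor). The sketch below reproduces that arithmetic. The table values are quoted from memory of Annex A of the H.264 specification and cover only a few levels and profiles, so treat them as assumptions to check against the current edition; the function name is hypothetical.

```python
# MaxBR per level, in units of cpbBrVclFactor bits/s (assumed from Annex A).
MAX_BR = {"3": 10_000, "4": 20_000, "4.1": 50_000, "4.2": 50_000}

# Profile-dependent multiplier for the VCL bit rate (assumed from Annex A).
CPB_BR_VCL_FACTOR = {"High": 1250, "High 10": 3000, "High 4:2:2": 4000}

def max_vcl_bit_rate(profile, level):
    """Maximum video coding layer bit rate in bits per second."""
    return MAX_BR[level] * CPB_BR_VCL_FACTOR[profile]

for level in ("3", "4.1", "4.2"):
    rate = max_vcl_bit_rate("High 4:2:2", level)
    print(f"High 4:2:2 Profile, Level {level}: {rate / 1e6:.0f} Mb/s")
```

The same level therefore allows a much higher bit rate in High 4:2:2 Profile than in High Profile, because only the multiplier changes.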

HD encoding at about 50Mb/s provides quasi-transparency for the vast majority of broadcast content. However, measurements show that up to 150Mb/s (35Mb/s in SD) might be needed to achieve a PSNR of 43dB, which is a common definition of “true” transparency. The High 4:2:2 Profile bit rate capabilities can cover the full range of production and contribution applications, including those that require advanced archiving and mezzanine format support.

MPEG-2 versus AVC/H.264

Today, HD contribution is mostly performed with MPEG-2 using 422P@HL. This profile offers 4:2:2 processing but is limited to 8-bit pixel component bit depth. As illustrated by Figure 9, AVC/H.264 High 4:2:2 Profile offers important savings when compared to MPEG-2, even at the highest bit rates.

These HD examples allow us to draw some conclusions verified by subjective measurements:

  • Below 15Mb/s, AVC/H.264 offers a bit rate gain of roughly 50 percent.
  • Above 30Mb/s, AVC/H.264 matches the quality of MPEG-2 at a bit rate about 20Mb/s lower. For instance, MPEG-2 quality at 60Mb/s is achieved by AVC/H.264 at only 40Mb/s or less. At very high bit rates, this rate saving can sometimes be even greater.
  • Above the 50Mb/s mark, the quality provided by AVC/H.264 increases linearly with the rate, showing that most of the encoder “effort” is spent coding nonredundant information like noise. Because the human eye is not very sensitive to noise fidelity, most sequences look quasi-transparent above this rate.

Summary

There are significant advantages to using the AVC/H.264 High 4:2:2 Profile. Using 4:2:2 10-bit coding provides the most compelling solution for production and contribution applications.

Encoding with 4:2:2, 10-bit or a combination of the two will always present a gain over High Profile because all the subjective and objective measurements presented here exhibit a quality increase for the same bit rate.

In addition, the AVC/H.264 High 4:2:2 Profile offers important rate savings over MPEG-2 even at the highest bit rates, allowing the user either to significantly lower transmission costs while keeping the same visual quality, or to greatly improve the video quality using existing transmission links.

This year will be a turning point for contribution applications as encoders and decoders exploit the full potential of High 4:2:2 Profile. Furthermore, relying on a highly standardized bit stream syntax guarantees that products from different manufacturers are easily interoperable.

Pierre Larbier is CTO at ATEME.