Although MPEG-2 continues to enjoy widespread deployment over broadcast and DVD media, MPEG-4 is rapidly catching up, thanks to advances in algorithms, processing power and storage. And while the original video codec of MPEG-4 (called Part 2 or Visual) delivered the ability to perform object-based coding, the feature was quickly superseded by the efficiency improvement provided by Advanced Video Coding (AVC), i.e., MPEG-4 Part 10 (ITU-T H.264).
MPEG-4 provides a richer toolkit than its predecessors, supporting resolutions up to HDTV with as much as a 50 percent reduction in bandwidth compared with MPEG-2. Some of the key features of MPEG-4 are listed in Table 1.
MPEG-4 Visual first introduced the concept of video objects and video object planes. In contrast to coding an entire picture, a background image can be coded over which a foreground video object plane is separately coded. One typical application of this is in synthetic or computer-generated images (e.g., animation), where objects routinely move in front of a fixed or slowly moving background image.
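As a rough illustration of the idea, the sketch below composites a separately coded foreground video object plane onto a background picture using a binary shape mask. It covers only the final composition step at the decoder, not the shape or texture coding itself, and the function name `composite` and the sample values are hypothetical.

```python
def composite(background, vop, mask, top, left):
    """Overlay a foreground video object plane (VOP) on a background picture.

    background: 2-D list of luma samples for the whole picture
    vop:        2-D list of luma samples for the object's bounding box
    mask:       2-D list of 0/1 values (binary shape), same size as vop
    top, left:  position of the bounding box inside the picture
    """
    out = [row[:] for row in background]  # copy, so the background can be reused
    for y, (vop_row, mask_row) in enumerate(zip(vop, mask)):
        for x, (sample, inside) in enumerate(zip(vop_row, mask_row)):
            if inside:  # only samples inside the object's shape replace the background
                out[top + y][left + x] = sample
    return out


# Hypothetical data: a flat 4x6 background and a 2x2 object with one transparent corner.
background = [[16] * 6 for _ in range(4)]
vop = [[200, 200], [200, 200]]
mask = [[1, 0], [1, 1]]
frame = composite(background, vop, mask, top=1, left=2)
```

Because the background is left untouched, the same background picture can be reused for subsequent frames while only the object plane and its position change.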
In fact, a sophisticated encoder can make use of this technique when coding live-action video as well, with scene-detection algorithms capable of detecting moving foreground objects such as people and faces. However, while this type of coding has relevant applications for video conferencing systems, it has not been widely adopted, perhaps in part because of the amount of processing horsepower needed.
AVC introduces various new picture coding tools for interlaced pictures, such as picture-adaptive frame/field coding and macroblock-adaptive frame/field coding. These allow several choices: coding each field separately; combining two fields together and coding them as a frame; or combining two fields into a frame while splitting pairs of vertically adjacent macroblocks into either field macroblocks or frame macroblocks. AVC thus enables different macroblocks to be coded as either field or frame within the same picture. In addition, the prediction method for each coded macroblock can be one of several different types, such as intra, inter, forward, backward and bidirectional. Macroblocks can also be partitioned into smaller horizontal and/or vertical submacroblocks.
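The frame-versus-field choice can be sketched in a few lines. The example below splits an interlaced frame into its two fields and picks a coding mode by comparing line-to-line differences. The heuristic and the names `split_fields`, `vertical_activity` and `choose_coding_mode` are assumptions made for illustration; a real encoder would typically base the decision on rate-distortion cost.

```python
def split_fields(frame):
    """Return (top_field, bottom_field): the even and odd lines of an interlaced frame."""
    return frame[0::2], frame[1::2]


def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent samples."""
    return sum(
        abs(a - b)
        for upper, lower in zip(rows, rows[1:])
        for a, b in zip(upper, lower)
    )


def choose_coding_mode(frame):
    """Pick 'field' when the two fields are smoother apart than interleaved."""
    top, bottom = split_fields(frame)
    frame_cost = vertical_activity(frame)
    field_cost = vertical_activity(top) + vertical_activity(bottom)
    return "field" if field_cost < frame_cost else "frame"


# Static content: adjacent lines are similar, so coding as a frame is cheaper.
static = [[10] * 4, [12] * 4, [14] * 4, [16] * 4]
# Motion between fields: alternate lines differ strongly, so field coding wins.
moving = [[10] * 4, [90] * 4, [10] * 4, [90] * 4]
modes = (choose_coding_mode(static), choose_coding_mode(moving))
```

Applying the same test to a pair of vertically adjacent macroblocks, rather than to the whole picture, gives the flavor of the macroblock-adaptive case.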
While MPEG-2 uses fixed anchor frames to predict the intercoded frames, AVC introduces the concept of multiple reference frames for interprediction. A reference frame index selects one picture from a list of previously decoded reference frames, so that predictions can efficiently account for back-and-forth scene cuts and object occlusions. (See Figure 1.)
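To see why several reference frames help with back-and-forth scene cuts, consider the minimal sketch below. It compares a block against the co-located block in each candidate reference and returns the index of the best match, using the sum of absolute differences (SAD) as the cost. The function names and sample values are hypothetical, and a real encoder would also search over motion vectors rather than co-located blocks only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_reference(current_block, reference_list):
    """Return (ref_idx, cost) for the reference block with the lowest SAD."""
    costs = [sad(current_block, ref) for ref in reference_list]
    ref_idx = min(range(len(costs)), key=costs.__getitem__)
    return ref_idx, costs[ref_idx]


# Hypothetical cut from scene A to scene B and back to scene A.
scene_a = [[50, 52], [54, 56]]
scene_b = [[200, 198], [196, 194]]
current = [[51, 52], [54, 55]]  # scene A again, slightly changed

# The two most recent references show scene B; an older one still shows scene A.
reference_list = [scene_b, scene_b, scene_a]
ref_idx, cost = best_reference(current, reference_list)
```

With only the most recent frame available as a reference, the block would have to be predicted from scene B at a much higher residual cost.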
AVC also adds error resilience tools that can help to reduce the visibility of artifacts when errors occur. These include additional picture and slice types, such as switching intraframes (SI-frames) and switching predictive frames (SP-frames), and flexible macroblock ordering. SI- and SP-frames (or slices) are coded to allow efficient switching between different pictures, allowing correct pictures to be built up in a decoder (even when there is missing information) by using a choice of different reference frames (or slices). SI- and SP-frames can also be used to facilitate bit stream switching.
AVC also expands on the way that the various symbols, such as motion vectors and residual data (transform coefficients), are coded, by adding techniques for variable length coding (VLC). VLC is a reversible, lossless procedure for entropy coding that assigns shorter bit strings to frequent symbols and longer bit strings to less-frequent symbols. Three types of coding are available in AVC: Exponential-Golomb (or Exp-Golomb) coding, context-adaptive variable-length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC).
Exp-Golomb is the baseline entropy coding method of AVC, which relies on a single code word set that is used for all syntax elements except for the residual data. The fixed tables of Exp-Golomb, however, as well as those used in MPEG-2 and other codecs, do not allow an adaptation to the actual image transform statistics, which may vary over space and time, as well as for different source material and coding conditions. With CAVLC, intersymbol redundancies are exploited by switching VLC tables for various syntax elements, depending on already-transmitted coding symbols. In this way, CAVLC is more efficient than the MPEG-2 VLC.
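The single code word set of Exp-Golomb is simple enough to generate on the fly. The sketch below builds the unsigned code words (a run of leading zeroes followed by the binary value of the number plus one), applies the usual signed mapping in which positive and negative values alternate, and decodes a code word from the front of a bit string. Bits are represented as text strings for readability, and the function names are chosen for this example.

```python
def exp_golomb_ue(code_num):
    """Unsigned Exp-Golomb code word for a non-negative integer, as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]  # binary representation of code_num + 1
    return "0" * (len(binary) - 1) + binary  # prefix of len - 1 zeroes


def exp_golomb_se(value):
    """Signed Exp-Golomb: map 1, -1, 2, -2, ... to code numbers 1, 2, 3, 4, ..."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_ue(code_num)


def decode_ue(bits):
    """Decode one unsigned code word from the front of a bit string.

    Returns (code_num, remaining_bits). Assumes bits holds a complete code word.
    """
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    end = 2 * leading_zeros + 1
    code_num = int(bits[leading_zeros:end], 2) - 1
    return code_num, bits[end:]


codes = [exp_golomb_ue(n) for n in range(5)]
# codes == ["1", "010", "011", "00100", "00101"]
```

Small values get short code words and the length grows only logarithmically, which suits syntax elements whose small values are the most frequent. The table is fixed, however, which is exactly the limitation that CAVLC addresses.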
To encode pixel data, AVC uses a simpler transform than the discrete cosine transform (DCT) used in MPEG-2. The new transform (and quantization) can be carried out using integer arithmetic and a single multiply per coefficient, enabling a reduction in complexity. Transform coefficients are then coded using run-length and level. If there are runs of zeroes in the transform (as is commonplace), the level of the preceding coefficient is coded, along with the number of zeroes afterward. AVC goes beyond MPEG-2 by additionally coding the number of trailing ones (T1s), i.e., the coefficients of magnitude 1 that typically end the scan at the high-frequency end. From a statistical standpoint, most blocks of video contain relatively little detail information. Therefore, the level (magnitude) of non-zero coefficients tends to be higher at the start of the transform array (near the DC coefficient) and lower toward the higher frequencies. CAVLC takes advantage of this by adapting the choice of the VLC lookup table depending on the number of non-zero coefficients and trailing ones in neighboring blocks.
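The following sketch illustrates two of the steps described above: the 4x4 integer core transform, and the counting of non-zero coefficients and trailing ones after a zigzag scan. It is a simplified illustration, not an encoder: the scaling and quantization stage is omitted, the block of quantized levels is made up for the example, and the helper names are assumptions.

```python
# Integer core transform matrix for 4x4 blocks: W = CF * X * CF^T
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Zigzag scan order for a 4x4 block, as indices into the row-major block
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_core_transform(block):
    """Apply the integer transform using only integer arithmetic (no scaling)."""
    cf_transposed = [list(col) for col in zip(*CF)]
    return matmul(matmul(CF, block), cf_transposed)


def zigzag_scan(block):
    """Reorder a 4x4 block from low to high frequency."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_4X4]


def cavlc_stats(quantized_block):
    """Return (total non-zero coefficients, trailing ones); T1s are capped at three."""
    nonzero = [c for c in zigzag_scan(quantized_block) if c != 0]
    trailing_ones = 0
    for c in reversed(nonzero):  # walk back from the highest frequency
        if abs(c) != 1 or trailing_ones == 3:
            break
        trailing_ones += 1
    return len(nonzero), trailing_ones


flat_block = [[10] * 4 for _ in range(4)]
coefficients = forward_core_transform(flat_block)  # only the DC term is non-zero

# Hypothetical quantized levels, typical of a block with little detail.
levels = [[0, 3, -1, 0],
          [0, -1, 1, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
total_coeffs, trailing_ones = cavlc_stats(levels)
```

For this block the scan yields 0, 3, 0, 1, -1, -1, 0, 1 followed by zeroes, so there are five non-zero coefficients. Four of them have magnitude 1 at the end of the scan, but only three are counted as trailing ones because of the cap.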
The CAVLC method, however, cannot adapt to the actual symbol statistics (i.e., the statistical occurrence of the symbols themselves). CABAC adapts to the statistics of the actual bits in the data stream by using a context model that describes the probability of occurrence of one or more bits of a data symbol. This model may be chosen from a set of available models, depending on the statistics of recently coded data symbols. An arithmetic coder (a form of VLC) then encodes the data according to the selected probability model. Finally, the selected context model is updated, based on the actual coded data. The system essentially learns the best way to code the symbols. (See Figure 2.)
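The learning behavior can be illustrated with a toy model. In the sketch below, each context keeps running counts of the zeroes and ones it has seen, and the cost of coding a bit is estimated as the ideal arithmetic-coding cost, the negative base-2 logarithm of its estimated probability. This is an assumption-laden simplification: actual CABAC uses a table-driven probability state machine and a binarization step, neither of which is modeled here, and the class and function names are hypothetical.

```python
import math


class AdaptiveBitModel:
    """Toy context model: estimates bit probabilities from running counts."""

    def __init__(self):
        self.counts = [1, 1]  # start with no preference for 0 or 1

    def probability(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1  # learn from the bit that was actually coded


def estimated_cost_bits(bits, models, select_context):
    """Ideal coded size in bits; the context is chosen from the previous bit."""
    total = 0.0
    previous = 0
    for bit in bits:
        model = models[select_context(previous)]
        total += -math.log2(model.probability(bit))  # cost under the current model
        model.update(bit)                            # then adapt the model
        previous = bit
    return total


# Hypothetical source with long runs: 20 zeroes, 20 ones, 20 zeroes.
bits = [0] * 20 + [1] * 20 + [0] * 20

one_context = estimated_cost_bits(bits, [AdaptiveBitModel()], lambda previous: 0)
two_contexts = estimated_cost_bits(
    bits, [AdaptiveBitModel(), AdaptiveBitModel()], lambda previous: previous
)
```

In this example both adaptive variants need fewer than the 60 bits of the raw data, and selecting the model according to the previously coded bit reduces the estimated cost considerably further, because each context sees far more predictable statistics.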
CABAC compresses data more efficiently than CAVLC, but requires considerably more processing power to encode and decode. Consequently, it's not commonly used in low-cost architectures such as handheld devices. Nonetheless, experimental results have shown the superior performance of CABAC compared to CAVLC. To evaluate the quality of digital video processing, the peak signal-to-noise ratio (PSNR, a measure of the encoding error) of processed video signals can be calculated. For typical test sequences in broadcast applications, the bit-rate savings of CABAC over CAVLC can average 9 percent to 14 percent, over a range of acceptable video quality of about 30 dB to 38 dB PSNR.
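For reference, PSNR is computed from the mean squared error (MSE) between the original and processed pictures as 10 log10(peak^2 / MSE), where the peak is 255 for 8-bit samples. A minimal sketch follows, with made-up sample values and a function name chosen for this example.

```python
import math


def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pictures."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(original, processed)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical pictures: no coding error
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 picture in which every sample is off by 5 after coding.
original = [[100, 100], [100, 100]]
processed = [[105, 95], [105, 95]]
quality_db = psnr(original, processed)  # MSE is 25, giving roughly 34 dB
```

An error of this size falls in the middle of the 30 dB to 38 dB range cited above.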
An amendment to the MPEG-4 standard added the Fidelity Range Extensions (FRExt), forming the AVC High Profile. This includes high-performance characteristics such as 10-bit and 12-bit per sample quantization; 4:2:2, 4:4:4 and RGB color sampling; very high bit rates; and lossless coding of portions of video. Scalable video coding was also recently added to AVC, enabling the encoding of bit streams that include sub-bit streams at smaller temporal or spatial resolutions.
AVC support is required for all Blu-ray disc players, so very high levels of video performance are now available to viewers, but of course at a cost. Notwithstanding the improvement from AVC, with the digital transition just a few months away and HDTV sales still growing, don't expect MPEG-2 to fade away anytime soon.
Aldo Cugnini is a consultant in the digital television industry.
| MPEG-4 profiles | Features |
| --- | --- |
| Simple Visual Profile | Similar to MPEG-2 coding |
| Advanced Simple Visual Profile | B-frames, global motion compensation and interlace |
| Core Visual Profile | Binary shapes (video objects) and B-frames |
| AVC Baseline | Low delay and lower processor load |
| AVC Main | Interlaced video, B-frames and CABAC encoding |
| AVC Extended | Error resilience tools |
| AVC High | Fidelity range extensions and high-quality, high-resolution formats for HDTV and digital cinema |
Table 1. MPEG-4 adds new features to the compression toolkit.