Since its introduction in the early 1990s, the MPEG-2 video-compression standard has dominated the broadcasting world. Over that decade, its performance has improved in several ways. Encoders now deliver comparable picture quality at a lower bit rate, thanks to implementation techniques such as advanced motion estimation, pre-processing, dual processing and adaptive allocation of bit rate through statistical multiplexing.
But MPEG-2’s performance cannot be enhanced indefinitely. Today’s consensus is that the MPEG-2 standard is reaching its practical limits. At the same time, the demand for compression is growing. Broadcasters and other communications providers are offering an increasing amount of content while introducing on-demand and personalized services over cable, satellite and telco infrastructures.
Because of this, codec users need a new compression standard to pick up where MPEG-2 leaves off. One of the leading candidates is MPEG-4 AVC, also known as H.264, MPEG-4 part 10, H.26L or JVT.
Figure 1. This block diagram is a simplified representation of the steps that MPEG encoders use to perform temporal and spatial compression. The dotted line represents an additional data path used in MPEG-4 AVC spatial compression.
Like other codecs, MPEG-4 AVC compresses video by reducing the temporal and spatial redundancies found in video images. Temporal redundancies are picture similarities that repeat from frame to frame, such as the nonmoving background in a talk show. Spatial redundancies are similarities that appear within a frame, such as a large number of similar pixels that portray a uniformly blue sky. Figure 1 is a simplified representation of the steps that MPEG encoders take to perform temporal and spatial compression.
Select mode, divide and conquer
The encoder begins by deciding what type of frame it wants to compress at a particular time and selecting the appropriate encoding mode. The “intra” mode produces an “I” frame; the “inter” mode produces a “P” or “B” frame. Then, the encoder divides the hundreds of rows and columns of pixels that make up the incoming frame of uncompressed digital video into smaller blocks, each containing several rows and columns of pixels.
When the encoder is operating in the “inter” mode, the block undergoes motion compensation. This process detects any motion that has occurred between the current block and its counterpart in one or more previously stored reference frames, and creates a “difference” or “error” block. This effectively reduces the data in each block to a description of its motion plus whatever the prediction fails to capture. The difference block then goes to the discrete cosine transform (DCT) to begin spatial compression. When the encoder is in the “intra” mode, the block skips the motion-compensation step and goes straight to the DCT.
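The motion-compensation step described above can be sketched as a brute-force block search. This is only an illustration under simple assumptions: the function names, the 4x4 block size, the search radius and the sum-of-absolute-differences cost are choices made for this example, not details taken from any MPEG standard or from a particular encoder.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(cur, ref, top, left, size=4, radius=2):
    """Full search around (top, left); returns ((dy, dx), difference block)."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(ref, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    _, dy, dx = best
    prediction = get_block(ref, top + dy, left + dx, size)
    difference = [[c - p for c, p in zip(rc, rp)]
                  for rc, rp in zip(target, prediction)]
    return (dy, dx), difference

# Hypothetical test frames: the current frame is the reference shifted
# one pixel to the right, so the best match lies one pixel to the left.
ref = [[y * 16 + x * x for x in range(8)] for y in range(8)]
cur = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in ref]
vector, difference = motion_search(cur, ref, top=2, left=3)
print(vector, difference)
```

When the search finds a good match, the difference block is all zeros or close to it, which is what makes the later spatial-compression steps so effective on inter-coded blocks.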
Blocks usually contain pixels that are similar or even identical to their neighbors. In many blocks, the pixels frequently don’t change much — if at all — from one to the next. This means that in many blocks, the frequency of changing pixel values within the block is low. Such blocks are said to have low spatial frequency. The encoder takes advantage of this characteristic by first converting the block’s pixel values into frequency information in the discrete cosine transform process.
Discrete cosine transform. The DCT process transforms the block’s pixel values into a grid of horizontal and vertical frequency coefficients located in frequency space. When the original block has low spatial frequency, the DCT clusters most of the frequency energy at the low-frequency corner of the grid. The few low-frequency coefficients located in this corner of the grid therefore have high values. The numerous coefficients on the remaining portion of the grid are high-frequency, low-energy coefficients and have low values. At this point, most of the information describing the original block is contained in the DC coefficient and a few low-frequency coefficients. This means that the encoder can eliminate most of the remaining high-frequency coefficients without noticeably degrading the image quality of the block. The encoder prepares the coefficients for this process by scanning the grid, starting at the DC coefficient and working its way diagonally in a zig-zag fashion through the grid’s increasing horizontal and vertical frequency coefficient locations. Thus, it generates a sequence of coefficients arranged by frequency.
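As a rough illustration of the transform and scan just described, the sketch below computes a textbook orthonormal 2-D DCT in plain Python and then reads the coefficients out in zig-zag order. The function names and the smooth test block are assumptions made for this example; real encoders use fast, fixed-precision implementations.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an n x n block (direct, unoptimized)."""
    n = len(block)
    cos = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
           for k in range(n)]
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            total = sum(block[y][x] * cos[u][y] * cos[v][x]
                        for y in range(n) for x in range(n))
            out[u][v] = scale[u] * scale[v] * total
    return out

def zigzag_order(n):
    """(row, col) positions in zig-zag order, starting at the DC corner."""
    def key(pos):
        r, c = pos
        diagonal = r + c
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, r if diagonal % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

# Hypothetical smooth block: values rise gently, so spatial frequency is low.
block = [[100 + x + y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)
scanned = [coeffs[r][c] for r, c in zigzag_order(8)]
print([round(v, 1) for v in scanned[:10]])
```

For a smooth block like this one, the large values sit at the front of the scanned sequence and the tail is essentially zero, which is the property the quantizer and entropy coder rely on.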
Quantization and entropy coding. This is where the actual spatial compression takes place. Based on a scaling factor (which the encoder can adjust), the quantizer “rounds off” all the coefficient values. Since most of the coefficients coming from the DCT are high-frequency coefficients having low values, the quantizer rounds most of these values to zero. The result is a sequence of quantized coefficient values that begins with a few high values at the beginning of the sequence, followed by long runs of coefficients that are quantized to zero. The entropy coder keeps track of the number of consecutive zero values in a sequence without having to encode them, thereby reducing the amount of data it must send for each sequence.
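The rounding and zero-run bookkeeping described above can be sketched in a few lines. The coefficient values, the scaling factor of 8 and the (run, level) output format with an end-of-block marker are assumptions chosen for illustration; actual MPEG quantizers and entropy coders use standardized tables and codes.

```python
def quantize(coeffs, scale):
    """Divide each coefficient by the scaling factor and round it off."""
    return [int(round(c / scale)) for c in coeffs]

def run_level_pairs(levels):
    """Describe the sequence as (zeros skipped, nonzero value) pairs."""
    pairs, run = [], 0
    for value in levels:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # end of block: everything after this is zero
    return pairs

# Hypothetical zig-zag-ordered coefficients for one block.
scanned = [624, -41, 37, 9, -6, 3, 0, -14, 1, 0, 0, 1, 0, 0, 0, 0]
levels = quantize(scanned, scale=8)
print(levels)
print(run_level_pairs(levels))
```

Sixteen coefficients shrink to six pairs and an end marker here. Raising the scaling factor pushes more values to zero and shortens the output further, at the cost of picture quality.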
MPEG-4 AVC spatial-compression advantages
MPEG-4 AVC offers two improvements in spatial compression. First, this type of encoder can perform spatial compression on 16x16-pixel macroblocks instead of the 8x8 blocks used in earlier MPEG standards. This significantly improves its ability to perform spatial compression on images with large areas of similar pixels. Second, part of this compression occurs in the spatial domain, before the DCT process. MPEG-4 AVC compares the current macroblock with its neighbors within the frame, calculates the difference, and sends only the difference to the DCT. Alternatively, it can subdivide the 16x16-pixel macroblock into 4x4-pixel chunks and compare each of these with its neighbors within the macroblock. This significantly improves its ability to compress detailed images.
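A minimal sketch of this neighbor-based prediction for one 4x4 chunk follows. It shows only a single DC-style predictor, the rounded mean of the pixels directly above and to the left; the standard defines several other prediction directions, and the function names and sample values here are assumptions made for the example.

```python
def dc_predict_4x4(above, left):
    """Rounded mean of the four pixels above and the four to the left."""
    return (sum(above) + sum(left) + 4) >> 3

def intra_residual(block, above, left):
    """Predict every pixel as the DC value and return what is left over."""
    dc = dc_predict_4x4(above, left)
    return dc, [[pixel - dc for pixel in row] for row in block]

# Hypothetical neighbors and block from a smooth area of the picture.
above = [120, 121, 122, 123]
left = [119, 120, 121, 122]
block = [[120, 121, 122, 123],
         [120, 121, 122, 123],
         [121, 122, 123, 124],
         [121, 122, 123, 124]]
dc, residual = intra_residual(block, above, left)
print(dc, residual)
```

The residual values are far smaller than the original pixel values, so the transform and quantizer that follow have much less energy to encode.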
Figure 2. MPEG-4 AVC can subdivide the luminance macroblocks in several ways to enhance motion compensation.
MPEG-4 AVC temporal-compression advantages
The biggest improvement offered by MPEG-4 AVC is in inter coding. Advanced methods in this mode produce a much higher level of temporal compression and better motion quality than prior MPEG standards.
Block sizes. In the inter mode, MPEG-2 supports only 16x16-pixel macroblocks, a size that does not provide enough resolution to accurately encode complex or nonlinear motion such as zooming. By contrast, MPEG-4 AVC enhances motion compensation by allowing the encoder to vary the size of the luminance component of each macroblock. (The encoder uses the luminance component in this way because the human eye is much more sensitive to luminance movement than it is to color movement.) As Figure 2 shows, MPEG-4 AVC can split up the luminance component of each macroblock in four ways: 16x16, 16x8, 8x16, or 8x8. When using the 8x8 blocks, it can further split each of the four 8x8 blocks in four ways: 8x8, 8x4, 4x8, or 4x4. Splitting macroblocks allows the encoder to handle several types of motion according to the motion complexity and the bit-rate resources. In general, a large partition size is appropriate for handling motion in the bland (homogeneous) areas of the frame, whereas a small partition size is useful for handling motion in more detailed portions of the frame. The result is better viewing quality, with less blockiness and fewer artifacts. Tests indicate that good tiling of the frames can yield compression savings of more than 15 percent. MPEG-4 AVC takes the luminance portion of the original image and uses subdivided macroblocks in high-detail areas to enhance motion compensation.
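The partition choices listed above determine how many motion vectors a macroblock carries, which is one way to see the trade-off between motion detail and bit-rate cost. The sketch below simply counts vectors for a given tiling; the function and constant names are assumptions made for this example.

```python
# Partition shapes as (width, height), taken from the list above.
MACROBLOCK_MODES = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_BLOCK_MODES = [(8, 8), (8, 4), (4, 8), (4, 4)]

def partition_count(region, part):
    """How many part-sized pieces tile a region, both as (width, height)."""
    return (region[0] // part[0]) * (region[1] // part[1])

def motion_vector_count(mb_mode, sub_modes=None):
    """One motion vector per partition of the luminance macroblock."""
    if mb_mode not in MACROBLOCK_MODES:
        raise ValueError("unsupported macroblock mode")
    if mb_mode != (8, 8):
        return partition_count((16, 16), mb_mode)
    sub_modes = sub_modes or [(8, 8)] * 4   # one choice per 8x8 block
    if len(sub_modes) != 4 or any(s not in SUB_BLOCK_MODES for s in sub_modes):
        raise ValueError("need four valid sub-block modes")
    return sum(partition_count((8, 8), s) for s in sub_modes)

print(motion_vector_count((16, 16)))               # bland area
print(motion_vector_count((8, 8), [(4, 4)] * 4))   # highly detailed area
```

A single macroblock can therefore carry anywhere from 1 to 16 motion vectors, so an encoder has to weigh the better prediction of small partitions against the extra bits needed to send the additional vectors.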
Figure 3. Comparison of MPEG-4 AVC to MPEG-2 quality and bit rate.
Motion-compensation accuracy. In most cases, motion at the borders of each macroblock, block or sub-block occurs at resolutions smaller than one pixel. Therefore, MPEG-4 AVC supports motion-compensation accuracy down to one-quarter or one-eighth of a pixel, in contrast with prior MPEG standards based primarily on half-pixel accuracy. MPEG-4 AVC is adopting eighth-pixel accuracy as an additional feature for increased coding efficiency at high bit rates and high video resolutions. Tests show that quarter-pixel accuracy can reduce the bit rate by more than 15 percent compared to single-pixel accuracy.
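Sub-pixel accuracy requires the codec to synthesize sample values that lie between the real pixels. The one-dimensional sketch below uses the six-tap half-sample filter (1, -5, 20, 20, -5, 1)/32 and the rounded-average quarter-sample rule from the published H.264 specification; those details come from the specification rather than from this article, and the function names, edge handling and sample row are assumptions made for illustration.

```python
def clip255(value):
    """Keep a sample inside the 8-bit range."""
    return max(0, min(255, value))

def half_pel(row, i):
    """Sample halfway between row[i] and row[i + 1] (six-tap filter)."""
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    # Positions past either end of the row reuse the nearest edge pixel.
    acc = sum(t * row[min(max(i - 2 + k, 0), last)]
              for k, t in enumerate(taps))
    return clip255((acc + 16) >> 5)

def quarter_pel(row, i):
    """Sample a quarter pixel right of row[i]: average with the half sample."""
    return (row[i] + half_pel(row, i) + 1) >> 1

# Hypothetical row of luminance samples with a sharp edge in the middle.
row = [10, 10, 10, 10, 50, 50, 50, 50]
print(row[3], quarter_pel(row, 3), half_pel(row, 3), row[4])
```

With these intermediate samples available, a motion vector can point to a position such as 3.25 pixels to the right, and the prediction follows an object's true motion more closely than whole-pixel steps allow.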
Multiple reference-picture selection. MPEG-2 relies on only two reference frames to predict periodic motions, such as that of a carousel. But, when cameras change angles or switch back and forth between scenes, using just two reference frames is inadequate for good motion prediction. Likewise, more than two reference frames are necessary to predict complex motion such as waves and explosions. For these reasons, the MPEG-4 AVC standard allows up to five reference frames for coding inter frames. The result is a better subjective video quality and higher compression efficiency.
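The benefit of keeping more reference frames can be sketched with a scene that cuts away and then cuts back. The example below picks, for one block, whichever stored frame gives the smallest sum of absolute differences; the function names, the flat four-sample blocks and the zero-motion comparison are simplifying assumptions made for this illustration.

```python
def sad_flat(a, b):
    """Sum of absolute differences between two flat lists of samples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_reference(block, candidates):
    """Return (index, cost) of the stored frame that predicts block best.

    candidates holds the co-located block from each stored reference
    frame, most recent first.
    """
    costs = [sad_flat(block, c) for c in candidates]
    index = min(range(len(costs)), key=costs.__getitem__)
    return index, costs[index]

# Hypothetical case: the camera cut to another scene for two frames and
# has just cut back, so only the oldest stored frame resembles the block.
current = [200, 201, 199, 200]
stored = [[50, 52, 51, 49],       # previous frame (other scene)
          [51, 50, 50, 52],       # two frames back (other scene)
          [199, 201, 200, 200]]   # three frames back (same scene)
print(best_reference(current, stored[:2]))  # limited to two references
print(best_reference(current, stored))      # three references available
```

With only the two most recent frames available, the best prediction still leaves a large difference to encode. Allowing the older frame brings the cost close to zero.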
Integrated de-blocking. Compressed digital video typically produces a “blockiness” effect, which is evident at the meeting points between blocks, especially at low bit rates. This effect is due to processing neighboring blocks with different quantizers and motion types. In MPEG-2, the only way to prevent blockiness is by using proprietary post-processing mechanisms that are not compatible with all receivers. MPEG-4 AVC defines a de-blocking filter that operates at two levels: 16x16 macroblock boundaries and 4x4 block boundaries. De-blocking typically gives a lower peak signal-to-noise ratio (PSNR), but, in terms of subjective viewing, it offers a higher-quality viewing experience.
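The idea behind a de-blocking filter can be sketched as a conditional smoothing of the two samples on either side of a block boundary. This is a much-simplified illustration, not the filter the standard defines: the threshold values and function name are hypothetical, and the real filter adapts its strength to the quantizer and coding mode of the neighboring blocks.

```python
def deblock_edge(p1, p0, q0, q1, alpha=10, beta=4):
    """Smooth samples p0 | q0 across a block edge if the step looks artificial.

    p1, p0 lie on one side of the boundary and q0, q1 on the other.
    alpha and beta are hypothetical thresholds: a large step across the
    edge, or strong activity on either side, suggests real picture detail.
    """
    if (abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta
            or abs(q1 - q0) >= beta):
        return p0, q0  # leave genuine edges untouched
    new_p0 = (2 * p1 + p0 + q1 + 2) >> 2
    new_q0 = (2 * q1 + q0 + p1 + 2) >> 2
    return new_p0, new_q0

# A small step between two flat blocks is softened ...
print(deblock_edge(100, 100, 106, 106))
# ... while a strong edge that belongs to the picture is preserved.
print(deblock_edge(100, 100, 180, 180))
```

Because the filtered output no longer matches the source pixel for pixel, a measurement such as PSNR can drop slightly even though the picture looks smoother to a viewer.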
Transform and quantization advantages
The floating-point 8x8 DCT with rounding-error tolerances forms the core of the earlier MPEG standards. MPEG-4 AVC is unique in that it employs a purely integer spatial transform (an approximation of the DCT) for 4x4-pixel chunks. The smaller block size reduces blocking artifacts, while the precise integer specification eliminates mismatch issues between the encoder and decoder in the inverse transform. In addition, a wider range of quantization scaling factors, based on a compounding rate of approximately 12.5 percent rather than a constant increment, gives flexibility to the encoder’s data-rate-control mechanism.
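A short sketch shows why an integer transform avoids encoder/decoder mismatch: every intermediate value is an exact integer. The 4x4 matrix below is the core transform matrix from the published H.264 specification, where the remaining scaling is folded into the quantizer. The helper names and the base step size of 1.0 are assumptions made for this example, and the 12.5 percent rate is simply the figure quoted above.

```python
# Core 4x4 forward transform matrix (integer approximation of the DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_4x4(block):
    """Integer transform: CF * block * CF-transposed, with no rounding."""
    return matmul(matmul(CF, block), transpose(CF))

def step_size(index, base=1.0, rate=0.125):
    """Quantizer step that compounds by `rate` per index (hypothetical base)."""
    return base * (1 + rate) ** index

flat = [[10] * 4 for _ in range(4)]
print(forward_4x4(flat))               # all energy lands in the DC position
print(step_size(6) / step_size(0))     # growth over six index steps
```

Compounding at about 12.5 percent per index means the step size roughly doubles every six indices, so a modest range of index values spans a very wide range of bit rates.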
Entropy coding advantage
After performing motion compensation, transform and quantization, MPEG encoders map the symbols that represent motion vectors and quantized coefficients into actual bits. MPEG-2, for example, uses static variable-length coding (VLC), which is not optimized for real video environments where content and scenes vary with time. MPEG-4 AVC, on the other hand, uses context-adaptive binary arithmetic coding (CABAC). CABAC offers superior coding efficiency by adapting to the changing probability distribution of symbols. For example, it can exploit correlation between symbols and adaptively apply bit-level correlations in its arithmetic coding. This mechanism yields additional bit savings of more than five percent.
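The advantage of adapting to symbol statistics can be illustrated without building a full arithmetic coder. The sketch below is not CABAC. It only compares the ideal code length of a binary sequence under a fixed probability model with the length under a model that updates its estimate after every bit, on the assumption that an arithmetic coder can get close to that ideal length. The function names, the count-based estimator and the test sequence are assumptions made for this example.

```python
import math

def static_cost(bits, p_one=0.5):
    """Ideal code length in bits when the model never changes."""
    return sum(-math.log2(p_one if b else 1 - p_one) for b in bits)

def adaptive_cost(bits):
    """Ideal code length when the model learns from the bits seen so far."""
    ones, total = 1, 2          # start by assuming 0 and 1 are equally likely
    cost = 0.0
    for b in bits:
        p_one = ones / total
        cost += -math.log2(p_one if b else 1 - p_one)
        ones += b               # update the running counts after coding
        total += 1
    return cost

# Hypothetical flag sequence that is zero most of the time.
bits = [0] * 90 + [1] * 10
print(round(static_cost(bits), 1), round(adaptive_cost(bits), 1))
```

The adaptive model needs far fewer bits than the fixed one on this skewed sequence. A decoder can repeat the same updates from the bits it has already decoded, so no side information has to be sent.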
Table 1. Average bit rate savings using MPEG-4 AVC compared to other standards.
Figure 3 gives a graphical comparison of MPEG-4 AVC and MPEG-2 quality. It compares the performance of state-of-the-art encoders encoding a 30-frame-per-second, CIF-resolution video of a tennis match.
Table 1 shows the bit rate savings of MPEG-4 AVC compared to several existing standards. MPEG-4 AVC represents a watershed in compression, implementing enhanced techniques for better bandwidth use and increased quality. Using these techniques, MPEG-4 AVC can reduce the bit rate by more than 50 percent compared to the MPEG-2 standard. But MPEG-4 AVC requires a much higher level of complexity in both encoding and decoding. Nonetheless, continual improvements in hardware and software processing power make this challenge surmountable, which means that MPEG-4 AVC is a viable candidate for replacing MPEG-2 in the coming years.
Mario Rainville is associate vice president of product marketing, and Amir Segev is a system architect, at Scopus.