Compression basics

Figure 1. Formation of an 8×8 block.

The goal of video compression is to represent an image with as few bits as possible. Compression is achieved by removing the redundancies in the video signal. This article deals with MPEG-2 video compression concepts. The MPEG-2 standard was developed for the delivery of compressed television for home entertainment. It is a set of compression and systemization algorithms and techniques with well-defined rules and guidelines. These rules allow variations in the values assigned to many of the parameters, which provides for a broad range of products and for interoperability. The algorithms and techniques are integrated into an MPEG “toolkit” or syntax, which addresses a variety of cost-versus-performance standards described in Levels and Profiles.

The MPEG tools

A further extension of MPEG-2, called the 4:2:2 profile, has been developed to record and transmit studio-quality video.

Figure 2. MPEG-2 4:2:0 video data stream architecture.

MPEG is best described as a collection of bit-rate reduction and compression methods that are available to the designer. These tools are:

Video stream data hierarchies

Discrete cosine transform (DCT): DCT is an in principle lossless, reversible mathematical process that converts spatial amplitude data into spatial frequency data. As shown in Figure 1, the image is divided into blocks of eight horizontal pixels by eight vertical pixels (8×8 block) of luminance (Y) and corresponding color-difference (CB and CR) samples. A block of 8×8 pixels is transformed into a block of 8×8 coefficients, each describing the amplitude at a particular frequency. The upper left corner represents the DC component. Moving across the top row, the horizontal spatial frequency increases; moving down the left column, the vertical spatial frequency increases. Essentially, the signal is converted into one value for the DC component and 63 values for 63 frequencies, a process equivalent to a spectrum analysis. The video signal has most of its energy concentrated at DC and the lower frequencies of the spectrum, so the DCT process results in zero or low-level values for some or many of the higher spatial frequency coefficients. The DCT in itself does not result in a bit-rate reduction. It merely converts the source pixels into a form that is easier to compress. The coefficients are read out in a zigzag fashion, starting with the DC coefficient and ending with the highest frequency.
Figure 3. Conceptual block diagram of an intraframe (spatial) coder.
Requantizing (REQ): This lossy process assigns more bits to low-frequency coefficients and fewer bits to high-frequency coefficients. In addition, it can be used to maintain a constant bit rate if necessary.
Run length coding (RLC): Quantizing typically leaves a few non-zero coefficients followed by strings of zero values. The RLC transforms this sequence by sending a unique code word instead of a long string of zeros, thus reducing the bit rate.
Variable length coding (VLC): This process allocates short code words to frequently occurring values and long code words to infrequently occurring values.
Buffer: The buffer helps achieve a constant bit rate, as required by recording and transmission of data.
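
The first three tools above can be sketched in a few lines of Python. This is an illustration only, not the normative MPEG-2 process: the function names are hypothetical, the requantizer uses a single flat step size (an assumption; real encoders use frequency-weighted quantizing tables), and the run-length output is a list of (zero run, value) pairs rather than actual code words.

```python
import math

N = 8  # block size used by MPEG-2


def dct_2d(block):
    """Forward 8x8 DCT (orthonormal DCT-II), computed directly from its definition."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):              # vertical frequency index
        for u in range(N):          # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * total
    return out


def requantize(coeffs, step=16):
    """Lossy step: divide by a step size and round (flat step is an assumption)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def zigzag_order():
    """(row, column) positions from the DC coefficient to the highest frequency."""
    positions = [(r, c) for r in range(N) for c in range(N)]
    # Walk the anti-diagonals, alternating direction on each one.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))


def run_length(scan):
    """Replace each string of zeros with a (zero_run, value) pair."""
    pairs, run = [], 0
    for value in scan:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # end of block: any trailing zeros are implied
    return pairs


def encode_block(block, step=16):
    """DCT -> requantize -> zigzag scan -> run-length pairs for one 8x8 block."""
    quantized = requantize(dct_2d(block), step)
    scan = [quantized[r][c] for r, c in zigzag_order()]
    return run_length(scan)


flat = [[128] * N for _ in range(N)]
print(encode_block(flat))  # [(0, 64), 'EOB']
```

A flat gray block has no spatial detail, so only the DC coefficient survives and the whole block collapses to one pair plus the end-of-block marker. A variable length coder would then assign short code words to the most common pairs.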

Figure 4. Conceptual block diagram of an interframe I/P encoder with motion compensation.

The MPEG-2 data structure is made up of six hierarchical layers:

Block: Luminance and chrominance data are separated in 8×8 blocks of Y, CB and CR values.
Macroblock: A macroblock consists of four blocks of 8×8 values in a window of 16×16 pixels of the original picture and their associated CB and CR values. The number of chroma blocks in the macroblock depends on the sampling structure (4:4:4, 4:2:2 or 4:2:0).
Slice: A slice is made up of several contiguous macroblocks.
Picture: A picture consists of a group of slices and contains information needed by the decoder.
Group of pictures (GOP): A GOP is made up of a sequence of various combinations of pictures.
Video sequence: A video sequence includes a sequence header, one or more GOPs and an end-of-sequence code. Figure 2 shows the makeup of a video sequence.
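
As a small worked example of the block and macroblock layers, the sketch below counts the 8×8 blocks in a macroblock for each sampling structure and the macroblocks in a picture. The names are hypothetical, and rounding partial macroblocks up to a whole one is an assumption about how pictures whose dimensions are not multiples of 16 are handled.

```python
LUMA_BLOCKS = 4  # four 8x8 Y blocks cover the 16x16 window

# Number of 8x8 chroma blocks (CB plus CR) per macroblock.
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}


def blocks_per_macroblock(sampling):
    """Total 8x8 blocks (Y, CB and CR) in one macroblock."""
    return LUMA_BLOCKS + CHROMA_BLOCKS[sampling]


def macroblocks_per_picture(width, height):
    """Number of 16x16 macroblocks, rounding partial macroblocks up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


print(blocks_per_macroblock("4:2:0"))      # 6
print(blocks_per_macroblock("4:2:2"))      # 8
print(macroblocks_per_picture(720, 576))   # 1620
```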

Figure 5. Conceptual block diagram of an IPB encoder.

MPEG picture types

The MPEG compression scheme results in three types of compressed pictures:

The intraframe coded picture (I): An I picture does not depend on information from other pictures. Only the spatial redundancies are removed. I pictures provide only moderate amounts of compression. Figure 3 shows a conceptual block diagram of an I compression scheme.
The interframe coded picture (P): Interframe compression reduces both the spatial and the temporal redundancies to increase the efficiency of the data compression. Figure 4 shows a conceptual diagram of an I/P compression scheme. The output of the spatial coder feeds a spatial decoder, which consists of an inverse REQ (IREQ) and an inverse DCT (IDCT) and which reconstructs the predicted (past or I) picture. A fixed store holds and delays the I picture and feeds the motion estimator. The motion estimator compares the I picture with the present picture to create forward motion vectors. The I picture is shifted by these vectors to generate a predicted P picture. The predicted P picture is subtracted from the real (present) picture to produce a forward prediction error, which feeds the spatial coder (DCT and REQ). In the motion compensation block, vectors are calculated that best predict the present picture. Because frames can differ in various ways, the prediction may not be perfect. If there were no motion and no other changes, the present picture could be perfectly predicted and the difference frame output would be zero, which would be easy to compress. When the two frames are different, the difference frame can still contain much less information than the original and will be easier to compress.
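
The motion estimation and prediction error steps can be illustrated with a minimal block-matching sketch. It is not how any particular encoder works: the exhaustive search, the sum of absolute differences (SAD) as the matching cost, the 4×4 block size and the ±2 pixel search range are all assumptions chosen to keep the example small, and the function names are hypothetical.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a reference block and a current block."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(size) for i in range(size))


def estimate_vector(ref, cur, cx, cy, size=4, search=2):
    """Find the offset (dx, dy) into ref that best predicts the block at (cx, cy) in cur."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(ref, cur, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference picture.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(ref, cur, rx, ry, cx, cy, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best


def prediction_error(ref, cur, cx, cy, vector, size=4):
    """Present block minus the reference block shifted by the motion vector."""
    dx, dy = vector
    return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
             for i in range(size)] for j in range(size)]


# Reference picture with a simple gradient; the present picture is the same
# content moved one pixel to the right.
ref = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
cur = [[ref[y][max(x - 1, 0)] for x in range(12)] for y in range(12)]

vector = estimate_vector(ref, cur, 4, 4)
print(vector)                                      # (-1, 0)
print(prediction_error(ref, cur, 4, 4, vector))    # all zeros
```

Because the only change between the two pictures is a pure shift, the prediction error block is all zeros, which is the "easy to compress" case described above.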

Table 1. Maximum constraint parameters for MPEG-2 levels and profiles (*frames per second).

The output of the spatial coder feeds the VLC and the RLC. A multiplexer (MUX) combines the compressed data with the motion vectors and feeds the buffer. Buffer-generated rate control ensures that the bit rate at the REQ output will not cause buffer underflow or overflow. The REQ feeds the buffer with quantizing table information for use by the decoder. To create an I picture, the video input feeds the DCT directly. After the I picture is created, the DCT is fed the predicted frame.
Bidirectional coded pictures (B): A new B picture from the input contains predictable information present in the I and P pictures, as well as unpredictable (discovered) information. Figure 5 shows a conceptual diagram of an IPB compression scheme. The motion compensator compares the B picture with the preceding I or P picture and with the P picture that follows it to obtain bidirectional vectors. Forward and backward motion vectors are used to generate several predicted B pictures. These are subtracted from the current picture. The resulting forward and backward data are switch selected depending on which of the two is closer to the actual picture. The picture differences are spatially coded in the usual manner and feed the VLC, RLC and the buffer.
The IPB sequence: The I, P, B frame coding results in a GOP starting with an I picture, followed by a sequence of P and B pictures. The P pictures are formed using previous I or P pictures as a reference. The B pictures use both past and future pictures as a reference. The MPEG algorithm allows the encoder to choose the frequency and location of I pictures. In applications where random access is important, I pictures are inserted twice every second. The encoder also chooses the number of B pictures between any pair of I or P pictures. A GOP is no more than 15 pictures long, starting with an I picture and finishing with a B picture. A typical arrangement of I, P and B pictures is shown in Figure 5 in the order in which they are displayed. The MPEG encoder reorders the pictures in the video stream to present the pictures to the decoder in the most efficient sequence.
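
The reordering mentioned at the end of the IPB sequence description can be sketched as follows. The idea is that a B picture cannot be decoded until both of its references have arrived, so each I or P picture is sent ahead of the B pictures that precede it in display order. The function name and the picture labels are hypothetical, and sending any trailing B pictures last is a simplification.

```python
def coded_order(display):
    """Reorder picture types from display order to transmission order."""
    out, pending_b = [], []
    for index, kind in enumerate(display):
        label = f"{kind}{index}"        # picture type plus display position
        if kind == "B":
            pending_b.append(label)     # hold back until the future reference is sent
        else:                           # I or P: a reference picture
            out.append(label)
            out.extend(pending_b)
            pending_b.clear()
    # Simplification: trailing B pictures would normally wait for the
    # I picture of the next GOP; here they are simply sent last.
    out.extend(pending_b)
    return out


print(coded_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```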

Profiles and levels

MPEG-2 offers a wide choice of parameters, resulting in millions of possible combinations. The concept of “profiles” and “levels” serves to restrict the choice of parameters. The restrictions affect the choice of the picture size (horizontal pixels × active lines), the frame structure (I, P, B), the maximum data rate and the sampling structure. The choices offered allow for standard-definition (720×576 or 720×480) as well as HDTV formats (1280×720 or 1920×1080). Table 1 summarizes the constrained parameters.

Michael Robin, a fellow of the SMPTE and former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw-Hill and translated into Chinese and Japanese.

Send questions and comments to: Michael_robin@primediabusiness.com

Recommended reading