Compression toolkit

Many video operators probably take the operation of the various processing devices in the video chain for granted, especially as they affect the video program material. In fact, many of the operating parameters of these devices can be changed, but it takes a bit more knowledge of the processes to anticipate the final effects and trade-offs. This month, we'll look at the tools available to the “compressionist” and how they affect the video.

Compression fundamentals

As is well known, systems such as MPEG and VC-1 lower the bit rate of a video signal by means of compression, which removes redundant characteristics of images. Lossy compression goes further by exploiting perceptual characteristics of the human visual system, allowing errors to form in ways that are generally less noticeable to the viewer. To understand how this affects the video, see Figure 1.

Preprocessing is the first function used in an encoder, and it happens to be a lossy one, as the information discarded by this process cannot be faithfully retrieved. After conversion to digital form, the input signal is typically scaled to a desired format, and the chrominance is subsampled, usually to the 4:2:2 or 4:2:0 sampling structure.

4:2:2 means the chrominance is horizontally subsampled by a factor of two relative to the luminance. 4:2:0 means the chrominance is both horizontally and vertically subsampled by a factor of two relative to the luminance.

In order to provide maximum quality, 4:2:2 (or even 4:4:4, which means no chrominance subsampling) is most often used for video editing and storage operations. 4:2:0 is typically reserved for final distribution and transmission.
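The bookkeeping behind these sampling structures is easy to sketch. Here is a minimal Python illustration; the function name and the 720 × 480 frame size are our own choices, not part of any standard:

```python
def samples_per_frame(width, height, scheme):
    """Total luma plus chroma samples per frame for common subsampling schemes."""
    luma = width * height
    chroma_factor = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[scheme]
    # Two chroma planes (Cb and Cr), each reduced by the scheme's factor.
    return int(luma + 2 * luma * chroma_factor)

# For a 720x480 frame:
print(samples_per_frame(720, 480, "4:4:4"))  # 1036800
print(samples_per_frame(720, 480, "4:2:2"))  # 691200 (two-thirds of 4:4:4)
print(samples_per_frame(720, 480, "4:2:0"))  # 518400 (half of 4:4:4)
```

Going from 4:4:4 to 4:2:0 thus halves the raw sample count before any transform coding takes place.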

This reduction in spatial sampling rate is the first contributor to lowering the overall output bit rate, though the contribution is nonlinear: because spatial redundancy varies across the video, cutting the vertical sampling in half does not simply cut the number of bits needed in half. Filtering of the input signal, possibly in both the horizontal and vertical directions, will also contribute to the total bit rate, with an associated effect on the picture quality.

Noisy pictures will cause the encoder to waste bits coding the high-frequency energy in the noise and will tax the performance of the encoder's motion estimation function. More on that later, but for the moment, remember that it pays to provide clean video, or to use a good noise-reduction processor, when encoding.

DCT and quantization

The next encoding steps involve the discrete cosine transform (DCT), quantization, run-length coding (RLC) and variable-length coding (VLC). The RLC and VLC lower the bit rate by cleverly removing redundancy from the code word stream, losslessly, to within the accuracy of the arithmetic operations.
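As a simplified sketch of the run-length stage, the function below encodes a zigzag-scanned coefficient sequence as (zero-run, level) pairs. It is illustrative only; a real MPEG encoder then maps these pairs through the standardized VLC tables, which are omitted here:

```python
def run_length_code(coeffs):
    """Encode a coefficient sequence as (zero_run, level) pairs.

    Trailing zeros collapse into a single end-of-block (EOB) marker,
    which is where much of the savings comes from after quantization.
    """
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append(("EOB", None))
    return pairs

# After quantization, most high-frequency coefficients are zero:
print(run_length_code([42, 0, 0, -3, 1, 0, 0, 0, 0]))
# [(0, 42), (2, -3), (0, 1), ('EOB', None)]
```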

The DCT alone, however, does not result in any bit rate reduction; its output contains just as much information as its input. The transform is needed to convert the spatial video information into frequency information that can then be perceptually compressed by means of quantization. By coarsening the precision of the DCT coefficients to a perceptually acceptable degree, this quantization step lowers the bit rate of the video.

The DCT, RLC and VLC functions are tightly defined in most compression systems and therefore cannot be directly modified by the operator. However, the quantization is continually adjusted by the encoder, and the tables used for weighting the different DCT coefficients can sometimes be changed indirectly by the operator, especially in software encoders.

In general, these tables tend to quantize the higher frequency components more coarsely than the lower frequency ones. The characteristic trade-off is thus bit rate for fidelity of (and lack of artifacts in) visual details.
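A minimal sketch of this frequency-weighted quantization follows, assuming a toy (1 + row + column) weighting in place of the standardized matrices, and a synthetic block whose energy falls off toward higher frequencies:

```python
import numpy as np

def quantize(block, base_step):
    """Quantize an 8x8 DCT block with frequency-dependent step sizes.

    Higher-frequency coefficients (larger row + column index) get coarser
    steps, so more of them round to zero as base_step grows.
    """
    r, c = np.indices(block.shape)
    weights = 1 + r + c  # toy stand-in for a standardized weighting matrix
    return np.round(block / (base_step * weights)).astype(int)

# Synthetic block with energy concentrated at low frequencies:
r, c = np.indices((8, 8))
dct_block = 100.0 / (1 + r + c)

fine = quantize(dct_block, 2)    # many coefficients survive
coarse = quantize(dct_block, 8)  # most high frequencies become zero
```

Fewer surviving coefficients means longer zero runs for the RLC stage, which is exactly how coarser quantization buys a lower bit rate.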

Motion estimation, interframe coding and GOP structure

The other key video compression technique exploits the temporal redundancy in pictures — the fact that most of the time, considerable sequences of pictures are repetitive and similar. Motion estimation and interframe coding take advantage of this by transmitting only the differences between successive frames and by coding motion vectors that provide an efficient way of accounting for the small motion-related differences that do occur.

One factor that can trade off processing time for video quality is that of the motion estimation search range, or the distance over which the encoder must search to find best-matching blocks. Larger search ranges will take longer to process, but can result in better video. This parameter is sometimes available to the operator.
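A brute-force sketch of that search, assuming an exhaustive full search with a sum-of-absolute-differences (SAD) cost. The function and variable names are our own, and real encoders use faster hierarchical or telescopic searches:

```python
import numpy as np

def best_match(ref, block, by, bx, search_range):
    """Exhaustive block matching: return the (dy, dx, sad) displacement
    within +/- search_range that best matches `block` against the
    reference frame, searching around the block's position (by, bx)."""
    n = block.shape[0]
    best = (0, 0, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n:
                sad = int(np.abs(ref[y:y + n, x:x + n] - block).sum())
                if sad < best[2]:
                    best = (dy, dx, sad)
    return best

# A block lifted from the reference, displaced (2, 4) from the search center:
rng = np.random.default_rng(7)
ref = rng.integers(0, 256, size=(32, 32)).astype(int)
target = ref[10:18, 12:20]
dy, dx, sad = best_match(ref, target, 8, 8, search_range=7)
```

Note that doubling the search range roughly quadruples the number of candidate positions, which is the processing-time-versus-quality trade-off described above.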

Closely related to motion estimation and interframe coding are the length and structure of the group of pictures (GOP) in an MPEG sequence. Typically, 15 frames form a GOP, which comprises I (intra), P (predicted) and B (bidirectionally predicted) pictures. A longer GOP will produce a lower average bit rate, but it will also produce a longer end-to-end delay, which increases the startup time (channel-change or acquisition) in a decoder. The relative compressed sizes of I-, P- and B-frames can typically be expected to be about 10:5:2, respectively; of course, this will vary depending on the encoding parameters used and the actual video material. The number and position of P- and B-frames can also be changed (or eliminated completely) with associated changes in delay, artifacts and bit rate.
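Using that rough 10:5:2 ratio, a back-of-the-envelope sketch of how GOP structure affects the average cost per frame (the function and its units are illustrative):

```python
def gop_average_cost(n_i, n_p, n_b, ratio=(10, 5, 2)):
    """Average per-frame cost of a GOP, in arbitrary units, given the
    rough 10:5:2 relative sizes of I-, P- and B-frames."""
    total = n_i * ratio[0] + n_p * ratio[1] + n_b * ratio[2]
    return total / (n_i + n_p + n_b)

# A 15-frame GOP of the form IBBPBBPBBPBBPBB has 1 I, 4 P and 10 B frames:
long_gop = gop_average_cost(1, 4, 10)  # about 3.33 units per frame
all_i = gop_average_cost(15, 0, 0)     # 10 units per frame
# The long GOP needs roughly a third of the bits of all-intra coding.
```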

Rate control and its effects

Perhaps the most readily available video compression control is the final bit rate. While the ultimate rate may be defined in some applications, certain parameters controlling this rate are nonetheless adjustable. Because the instantaneous bit rate changes from frame to frame, a buffer is required at the output of the encoder to smooth this behavior into a constant rate. This buffer then provides information to a rate control mechanism that affects the quantization of subsequent video. This feedback mechanism, when poorly implemented (or highly challenged), can result in “pumping” or “breathing” of the video, noticeable as GOP-related cyclical quality changes in the resulting video.

The MPEG-2 standard defines a hypothetical buffer model, called the video buffering verifier (VBV). (See Figure 2.) This is a model used by the encoder to predict the filling and emptying of the output buffer, to assure that the decoder buffer never overflows or underflows (empties). Such a condition would cause pictures to be lost or to freeze. The MPEG standard defines the VBV delay as the time needed to fill the VBV buffer from its initial empty state at the target bit rate (R in Figure 2) to the correct level immediately before the first picture is removed from the buffer.

In addition to the actual (physical) buffer used in the encoder, the rate control can specify the size of the VBV, essentially a software parameter. A smaller VBV, while leading to a smaller end-to-end delay, will cause the rate control to allow fewer bits per picture, resulting in a decrease in picture quality.
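The buffer bookkeeping can be sketched as below, assuming a constant channel rate, one whole picture removed per frame interval, and illustrative numbers of our own:

```python
def simulate_vbv(picture_bits, rate, fps, vbv_size, initial_delay):
    """Track decoder-buffer fullness picture by picture.

    Bits arrive at the constant channel rate; each frame interval, one
    coded picture is removed in full. Raises ValueError on underflow
    (a picture not yet fully received) or overflow.
    """
    fullness = rate * initial_delay  # bits buffered before decoding starts
    trace = []
    for bits in picture_bits:
        if bits > fullness:
            raise ValueError("underflow: picture not fully received in time")
        fullness -= bits            # decoder removes the coded picture
        fullness += rate / fps      # channel refills during one frame time
        if fullness > vbv_size:
            raise ValueError("overflow: buffer would exceed its size")
        trace.append(fullness)
    return trace

# 4 Mb/s channel, 25 fps, 0.5 s start-up delay, 3 Mb buffer:
trace = simulate_vbv([400_000, 100_000, 100_000],
                     rate=4_000_000, fps=25,
                     vbv_size=3_000_000, initial_delay=0.5)
```

Shrinking vbv_size or the start-up delay forces the rate control to keep pictures smaller, which is the quality trade-off described above.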

The bit rate can be defined as constant, as shown in Figure 2 (and as typically used in broadcast transmissions), or variable. The latter can result in better instantaneous picture quality, but usually requires some other mechanism to prevent buffer violations. One such application is the DVD, where playback of the data can essentially be sped up or slowed down as needed.

Of related interest is the statistical multiplexer, or statmux, which is used to simultaneously encode multiple program sources and combine them into one transport stream, such as when multicasting. The technique essentially uses a shared buffer model that is a composite of the separate individual buffers. If done poorly, this can produce visible artifacts as the different channels affect each other. When done well, and for a large number of channels, it can squeeze in more channels than individual encoding alone would allow.
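A toy sketch of the allocation idea, assuming each channel reports a current "complexity" estimate and the shared pool is split proportionally. The function, the floor-rate parameter and the numbers are our own; real statmuxes also smooth allocations over time and enforce per-channel maxima:

```python
def statmux_allocate(complexities, pool_rate, floor_rate=0.0):
    """Split a shared bit-rate pool among channels in proportion to each
    channel's current complexity, after reserving a floor for everyone."""
    total = sum(complexities)
    spare = pool_rate - floor_rate * len(complexities)
    return [floor_rate + spare * c / total for c in complexities]

# Three channels share 12 Mb/s; the first is twice as complex right now:
rates = statmux_allocate([2.0, 1.0, 1.0], 12e6, floor_rate=1e6)
# The busy channel gets 5.5 Mb/s; the quiet ones get 3.25 Mb/s each.
```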

If you have the opportunity to modify some of these parameters, you should experiment and see how it can improve your operation. Just make sure you put in a lot of time and critical viewing before you commit to any change to your on-air operation!

Aldo Cugnini is a consultant in the digital television industry.
