The need for digital video compression

Conventional analog composite systems compress video information by restricting the bandwidth of the baseband luminance and chrominance signals to match the eye's sensitivity to spatial and temporal picture detail, and by using spectrum-saving transmission methods. Analog component video formats use similar baseband spectrum-saving methods but slightly wider-bandwidth chrominance signals. The ITU-R BT.601 4:2:2 video format specifies a luminance signal bandwidth of 5.75MHz and a chrominance signal bandwidth of 2.75MHz, slightly below the Nyquist frequencies corresponding to the related sampling frequencies but well above the bandwidths of analog composite signals such as NTSC, PAL or SECAM.

After digitization, the three component digital signals are time-division multiplexed into a 27Mword/s parallel bitstream and subsequently serialized. The serial bit rate is 270Mb/s at 10 bits per sample. While this signal can be comfortably distributed inside a teleproduction center, the bit rate is too high for transmission or for digital videotape recorders of moderate cost and size, so it must be reduced. Compression is usually effected by removing video signal redundancies.
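The 27Mword/s and 270Mb/s figures follow directly from the sampling frequencies. As a minimal sketch, assuming the BT.601 4:2:2 sampling frequencies of 13.5MHz for luminance and 6.75MHz for each of the two color-difference signals (values not stated in the text), the arithmetic can be checked as follows. The function and constant names are illustrative only.

```python
# Assumed BT.601 4:2:2 sampling frequencies (not given in the article text).
LUMA_FS_HZ = 13_500_000      # luminance (Y)
CHROMA_FS_HZ = 6_750_000     # each color-difference signal (Cb and Cr)
BITS_PER_SAMPLE = 10


def nyquist_hz(fs_hz):
    """Highest signal frequency representable at sampling frequency fs_hz."""
    return fs_hz / 2


def multiplexed_word_rate(luma_fs_hz, chroma_fs_hz):
    """Word rate after time-division multiplexing Y, Cb and Cr."""
    return luma_fs_hz + 2 * chroma_fs_hz


word_rate = multiplexed_word_rate(LUMA_FS_HZ, CHROMA_FS_HZ)
bit_rate = word_rate * BITS_PER_SAMPLE

print(f"{word_rate / 1e6:.0f} Mword/s")   # 27 Mword/s
print(f"{bit_rate / 1e6:.0f} Mb/s")       # 270 Mb/s

# The specified bandwidths sit slightly below the Nyquist frequencies.
print(5_750_000 < nyquist_hz(LUMA_FS_HZ))     # True (5.75MHz < 6.75MHz)
print(2_750_000 < nyquist_hz(CHROMA_FS_HZ))   # True (2.75MHz < 3.375MHz)
```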

The video signal redundancies

Redundancy is best described as unnecessary data carried by a video signal. Since these data are unnecessary, removing them will reduce the bit rate without necessarily affecting the picture.

Statistical data redundancy: Most images contain large numbers of identical or very similar pixels. Unchanging picture details repeated pixel after pixel and image after image constitute redundant information in a data stream. Compression systems exploit the fact that identical data need not be repeated and transmitted. The identification of unchanging pixel values within a frame or a sequence of frames is called decorrelation.
Psychovisual redundancy: Certain picture details are not perceived by the human visual system (HVS). These details can be altered (e.g. by reducing the number of bits per sample) or removed, which reduces the data rate while introducing only imperceptible errors in the reconstructed picture.
Entropy: Entropy is best described as the unpredictable content of a picture, which must be preserved in order to reconstruct the original picture. Reducing the bit rate below the entropy value of the picture will result in the loss of a certain amount of information.
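The idea of an entropy value can be illustrated with a rough sketch. The code below estimates the first-order Shannon entropy of a list of sample values, in bits per sample. This is a simplifying assumption: it treats every sample as independent and ignores the correlation between neighboring pixels and frames, so it only approximates the entropy of a real picture. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy_bits_per_sample(samples):
    """First-order Shannon entropy of the sample values, in bits per sample."""
    counts = Counter(samples)
    total = len(samples)
    # Sum of p * log2(1/p) over every distinct value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat_area = [128] * 16                          # fully predictable
two_levels = [0, 0, 0, 0, 255, 255, 255, 255]   # two equally likely values

print(entropy_bits_per_sample(flat_area))    # 0.0
print(entropy_bits_per_sample(two_levels))   # 1.0
```

Under this model the flat area carries no information per sample, while the two-level signal needs at least one bit per sample on average, however many bits the original samples used.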

The human visual system (HVS)

Video signals are ultimately decoded and displayed for human observers. The human eye, in conjunction with the brain, constitutes a precise imaging system. It can operate under a wide range of light intensities, recognize colors and perceive picture contrast as a function of picture detail (spatial frequency) and light intensity. Picture width and height, as well as the viewing distance, determine the perception of picture detail. The visual acuity of the eyes depends on:

The luminance of the background: Visual acuity increases with the brightness level up to a limit of 340cd/m² (100 foot-lamberts).
The contrast of the luminance and chrominance signals: Picture details are visible only if there is a significant difference between them and the background (high contrast). The eye is more sensitive to luminance detail than to chrominance detail. The contrast sensitivity of the eye varies with the temporal frequency of the picture. At high brightness levels, flicker becomes perceptible.

The HVS perception characteristics result in image redundancies in the spatial and temporal domains. These redundancies are taken into consideration by compression systems to help reduce the bit rate.

The spatial redundancies:

Spatial frequency sensitivity: High frequencies (fine picture details) are less visible.
Texture masking: Errors in textured regions are difficult to see.
Edge masking: Errors near the edges are difficult to see.
Luminance masking: The visibility threshold increases with the background luminance.
Contrast masking: There is a reduced visibility of one image detail in the presence of another.
Noise frequency: The HVS has a low sensitivity to high frequency noise.

The temporal redundancies:

Temporal frequency sensitivity: Below 50Hz, flicker effects become noticeable. High brightness levels increase the flicker perception.
Spatial frequency content: Low spatial frequencies reduce the eye sensitivity to flicker.

Data reduction techniques

Data rate reduction can be achieved using a combination of various tools. The aim is to reduce the bit rate of the original signal to the minimum value that does not cause an unacceptable degradation of picture quality. The picture quality level is chosen for the intended application. A higher quality level is required for contribution signals (undergoing further processing in a studio) than for emission signals (direct-to-home broadcasts).

Two complementary data reduction techniques are used: bit rate reduction and compression. Bit rate reduction lowers the data rate by discarding superfluous or imperceptible information. Compression uses statistical and higher-order mathematical means to remove redundant information. Many “lossless” and “lossy” reduction techniques have been developed over the years.

Lossless techniques

Data rate reduction is lossless when it allows the recovery of the original signal after decompression. It is a fully reversible process. Only modest compression ratios (<3:1) are achievable. Among the lossless techniques are:

The blanking removal: Nonessential data in the horizontal and vertical blanking intervals are removed without affecting the picture. The bitstream is reduced to the active (essential) picture area content.
The Discrete Cosine Transform (DCT): The forward (in the encoder) and inverse (in the decoder) DCT process is totally transparent if the transformed frequency coefficients have a word length of 13 to 14 bits for an input signal with eight-bit samples. With 11 bits or fewer, the DCT process becomes lossy.
The Variable Length Coding (VLC): A form of entropy coding, of which Huffman coding is the best-known example. It takes into consideration the probability of identical amplitude values in a picture and assigns short code words to values with a high probability of occurrence and long code words to the others.
The Run Length Coding (RLC): Generates special codes to indicate the start and the end of a string of repeated values. Only non-zero values are encoded, along with the number (run) of zero sample values along the scan line.
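Run length coding and variable length coding can be sketched in a few lines. The code below is an illustration only, not the coding tables of any particular standard: `run_level_encode` pairs each non-zero value with the number of zeros preceding it, and `huffman_code` builds a Huffman table from symbol frequencies. All function names and the sample data are hypothetical.

```python
import heapq
from collections import Counter


def run_level_encode(values):
    """Return (zero_run, level) pairs and the number of trailing zeros."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def run_level_decode(pairs, trailing_zeros):
    """Rebuild the original values; the process is fully reversible."""
    values = []
    for run, level in pairs:
        values.extend([0] * run)
        values.append(level)
    values.extend([0] * trailing_zeros)
    return values


def huffman_code(symbols):
    """Map each symbol to a bit string; frequent symbols get short codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


line = [12, 0, 0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 0, 0]
pairs, trailing = run_level_encode(line)
print(pairs, trailing)    # [(0, 12), (3, 5), (3, 5), (3, 5)] 5
print(run_level_decode(pairs, trailing) == line)   # True

table = huffman_code(pairs)
bitstream = "".join(table[p] for p in pairs)
print(len(bitstream))     # 4 bits for the four (run, level) pairs
```

In this toy example the 18 input values collapse to four (run, level) pairs plus a trailing-zero count, and the repeated pair `(3, 5)` receives a code no longer than that of the rarer pair `(0, 12)`.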

Lossy techniques

Data rate reduction is lossy when information is lost and the original image can only be approximately reconstructed. Lossy schemes combine several data reduction techniques to achieve considerably higher compression ratios (from 3:1 to 100:1) and are irreversible. The picture is degraded after decompression as a result of data rounding or discarding within a frame or between frames. Among the lossy techniques are:

Subsampling: A very effective method of lossy data reduction. Subsampling is generally applied to chrominance signals, resulting in such sampling schemes as 4:2:2, 4:1:1 and 4:2:0. A special video conferencing subsampling scheme, called Common Source Intermediate Format (CSIF), subsamples luminance as well as chrominance and is claimed to have a resolution similar to that of a VHS recorder.
Differential Pulse Code Modulation (DPCM): This is a predictive encoding scheme that transmits the sample-to-sample difference rather than the full sample value.
Requantization: A process of reassigning the available number of bits per sample in a manner that increases the quantizing noise of picture details that are imperceptible to the HVS. In addition, it controls the bit rate to avoid digital buffer overload and to achieve the constant bit rate required by transmission and tape-storage media.
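The three techniques above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any particular codec: the DPCM predictor is simply the previous sample, and no quantizer is applied to the difference signal, so the DPCM round trip shown here is exact; the loss in a practical system comes from coarsely quantizing the differences. Requantization is modeled as discarding low-order bits. All names are hypothetical.

```python
def words_per_pixel(j, a, b):
    """Average words per pixel for a J:a:b sampling scheme.

    Counts the samples in a block J pixels wide and 2 lines high:
    2*J luminance samples plus (a + b) samples for each of the two
    chrominance signals.
    """
    return (2 * j + 2 * (a + b)) / (2 * j)


def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    diffs, prev = [], 0
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs


def dpcm_decode(diffs):
    """Accumulate the differences to recover the samples."""
    samples, prev = [], 0
    for d in diffs:
        prev += d
        samples.append(prev)
    return samples


def requantize(sample, dropped_bits):
    """Discard the low-order bits of a non-negative integer sample."""
    return (sample >> dropped_bits) << dropped_bits


for scheme in [(4, 4, 4), (4, 2, 2), (4, 1, 1), (4, 2, 0)]:
    print(scheme, words_per_pixel(*scheme))   # 3.0, 2.0, 1.5, 1.5

scan = [100, 101, 103, 103, 102]
diffs = dpcm_encode(scan)
print(diffs)                        # [100, 1, 2, 0, -1]
print(dpcm_decode(diffs) == scan)   # True

print(requantize(515, 2))           # 512: a 10-bit value on an 8-bit grid
```

After the first sample, the differences are small numbers clustered around zero, which is what makes them cheaper to code than the full sample values.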

Taken separately, none of these techniques can generate a significant data reduction. The success of the MPEG approach lies in the fact that it combines several techniques to achieve an efficient data reduction system. These techniques have resulted in the well-known intra-coded (I), predicted (P) and bidirectionally-coded (B) pictures approach which, if properly designed and implemented, results in good quality pictures at reduced bit rates. Pictures of essentially ITU-R BT.601 4:2:2 quality can be obtained at a bit rate of 10Mb/s. Compare this with the original 270Mb/s bit rate!
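To put the two bit rates in perspective, the short calculation below works out the overall reduction ratio and the average number of bits left per picture element. The active picture size of 720 x 576 pixels at 25 frames per second is an assumption (a 625-line system); the article does not specify a scanning standard, and the constant names are illustrative.

```python
SOURCE_BIT_RATE = 270_000_000       # serial 4:2:2 signal, from the text
COMPRESSED_BIT_RATE = 10_000_000    # MPEG bit rate quoted in the text

# Assumption: 625-line system with 720 x 576 active pixels at 25 frames/s.
ACTIVE_PIXELS_PER_FRAME = 720 * 576
FRAMES_PER_SECOND = 25

reduction_ratio = SOURCE_BIT_RATE / COMPRESSED_BIT_RATE
bits_per_pixel = COMPRESSED_BIT_RATE / (ACTIVE_PIXELS_PER_FRAME * FRAMES_PER_SECOND)

print(f"{reduction_ratio:.0f}:1")             # 27:1
print(f"{bits_per_pixel:.2f} bits/pixel")     # 0.96 bits/pixel
```

Under these assumptions, each active pixel is represented by slightly less than one bit on average, against 20 bits (two 10-bit words) in the uncompressed 4:2:2 signal.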

Michael Robin, former engineer with the Canadian Broadcasting Corporation's engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw Hill.

Send questions and comments to: michael_robin@intertec.com