Digital video processing

In this column earlier this year, we mentioned “light-to-light,” the concept of an overall transfer function describing a complete camera-to-display video system. We then zeroed in on sampling and reconstruction, two stages in this cascade of signal processing elements. This month, we'll look at the concept from a holistic perspective to gain insight into how the various elements interrelate.

Characterization of video systems

In systems analysis, the behavior of a system with an input and an output can be characterized in several ways. With audio systems, it is customary to describe the frequency response of a system, i.e., the gain of the system from input to output as a function of the frequency of the signal applied to the input. Mathematically, this gain is a complex function that can be expressed in terms of magnitude and phase; in this form, it is generally called the transfer function of the system.
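
As a minimal illustration of a complex gain expressed as magnitude and phase, the sketch below evaluates an idealized first-order low-pass response, H(f) = 1/(1 + jf/fc). The filter choice, the 1 kHz cutoff and the function names are assumptions made for illustration only, not part of any system described in this column.

```python
import cmath
import math

def first_order_lowpass(f_hz, fc_hz):
    """Complex gain H(f) = 1 / (1 + j*f/fc) of an idealized first-order low-pass."""
    return 1.0 / (1.0 + 1j * f_hz / fc_hz)

def magnitude_db_and_phase_deg(h):
    """Split a complex gain into magnitude (dB) and phase (degrees)."""
    return 20.0 * math.log10(abs(h)), math.degrees(cmath.phase(h))

for f in (100.0, 1000.0, 10000.0):
    mag_db, phase_deg = magnitude_db_and_phase_deg(first_order_lowpass(f, fc_hz=1000.0))
    print(f"{f:7.0f} Hz: {mag_db:6.2f} dB, {phase_deg:7.2f} deg")
```

At the cutoff frequency, the gain magnitude is 1/√2 (about -3 dB) and the phase shift is -45°.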

With video systems, a similar concept applies, except that the input and output “signals” consist of two-dimensional (or higher-dimensional) images. In fact, because video is concerned with moving images, the signal can be considered to have three dimensions: vertical, horizontal and temporal. Breaking the image down into luminance and chrominance components adds another “dimension” as well. A complete video system, from input image to output (display) image, can thus be characterized by one or more transfer functions, and we are predominantly concerned with their amplitude aspect.

Transfer functions can be linear or nonlinear, meaning that the input-output amplitude relationship at any one frequency can be plotted as a straight line or as a curve, respectively.

For linear, time-invariant systems, transfer functions are commutative: the overall transfer function of a cascade is the product of the individual transfer functions, so it does not change when the order of the systems is changed. But video systems tend to be nonlinear, as their transfer functions depend on many different characteristics of the images and often vary over time as well.
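
The following minimal sketch, in pure Python with hypothetical helper names, contrasts the two cases on a one-dimensional "video line". Two linear filters applied by circular convolution give the same result in either order. A hard clip placed before or after a peaking filter does not. The kernels, the test line and the clip range are illustrative assumptions.

```python
def circular_convolve(signal, kernel):
    """Linear, shift-invariant filtering (circular convolution keeps the example exact)."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel))) for i in range(n)]

def clip(signal, lo=0.0, hi=1.0):
    """A nonlinear element: hard clipping to a legal signal range."""
    return [min(max(v, lo), hi) for v in signal]

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

line = [0.0, 0.2, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0]  # a bright bar on a dark background
blur = [0.25, 0.5, 0.25]                         # low-pass
peak = [-0.5, 2.0, -0.5]                         # high-frequency boost, unity gain at DC

blur_then_peak = circular_convolve(circular_convolve(line, blur), peak)
peak_then_blur = circular_convolve(circular_convolve(line, peak), blur)
clip_then_peak = circular_convolve(clip(line), peak)
peak_then_clip = clip(circular_convolve(line, peak))

print("linear pair, order swapped:  ", max_abs_diff(blur_then_peak, peak_then_blur))
print("with clipping, order swapped:", max_abs_diff(clip_then_peak, peak_then_clip))
```

Here the peaking filter overshoots above 1.0 and undershoots below 0.0 at the edges of the bar. Clipping after peaking removes those excursions, while clipping before peaking has nothing to remove, so the two orders disagree.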

The typical elements of a complete video system are shown in Figure 1. A number of these processes can take place simultaneously, i.e., in real time, or at different points in time, such as upon retrieval from a storage medium. Also, some of the elements shown can be omitted or reordered; one example is an all-analog system, which would contain no digitizer or compression, nor their inverse functions.

Lens system

We start with the optics of the video camera; the concept of a transfer function applies to optics as well as to electronics. In the optical domain, we are generally concerned with the optical transfer function, which includes magnitude and phase. Because phase is usually of less interest when a wide spectrum of light is considered, with video systems we are more often interested in the modulation transfer function (MTF), which concerns amplitude only. The MTF describes the performance of an optical system as a function of spatial frequency, measured in cycles per unit angle, and provides a way to characterize the maximum spatial resolution of the system. Other measures of the behavior of optics include monochromatic and chromatic aberrations, which degrade the ability of an optical system to form well-focused and color-registered images.
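
The MTF at a given frequency can be estimated as the ratio of output to input modulation, (Imax - Imin)/(Imax + Imin), for a sinusoidal test pattern. The minimal one-dimensional sketch below makes several assumptions. The system is discretized. A centered moving average stands in for the blur spot. Frequencies are in cycles per sample rather than cycles per unit angle. The sketch compares the measured modulation ratio with the analytic response of that average. All names are hypothetical.

```python
import math

def modulation(values):
    """Michelson modulation (Imax - Imin) / (Imax + Imin) of a test pattern."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)

def sine_target(n, cycles_per_sample, mean=0.5, amplitude=0.5):
    """Sinusoidal test pattern, the usual stimulus for MTF measurement."""
    return [mean + amplitude * math.cos(2 * math.pi * cycles_per_sample * i) for i in range(n)]

def box_blur(values, half_width):
    """Centered circular moving average: a crude, hypothetical stand-in for a blur spot."""
    n, width = len(values), 2 * half_width + 1
    return [sum(values[(i + k) % n] for k in range(-half_width, half_width + 1)) / width
            for i in range(n)]

def measured_mtf(cycles_per_sample, half_width, n=240):
    """Output modulation divided by input modulation."""
    target = sine_target(n, cycles_per_sample)
    return modulation(box_blur(target, half_width)) / modulation(target)

def box_mtf(cycles_per_sample, half_width):
    """Analytic MTF of the same moving average: |sin(pi f L) / (L sin(pi f))|."""
    width = 2 * half_width + 1
    x = math.pi * cycles_per_sample
    return abs(math.sin(width * x) / (width * math.sin(x)))

for f in (1 / 24, 1 / 12, 1 / 8, 1 / 6, 1 / 4):
    print(f"{f:.4f} cycles/sample: measured {measured_mtf(f, 1):.3f}, analytic {box_mtf(f, 1):.3f}")
```

The chosen test frequencies have periods that divide the 240-sample pattern, so the sampled peaks and troughs fall exactly on samples and the measured ratio matches the analytic curve. Both fall steadily as spatial frequency rises, which is the loss of fine detail that the MTF quantifies.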

Imager system

State-of-the-art cameras use CCD or CMOS sensors to capture images. These devices inherently perform the sampling function, i.e., they convert a spatially continuous image into a discrete, spatially sampled representation. While most professional cameras use three imagers and a color-separation system of optical prisms, other video cameras use a single imager with an integrated Bayer color filter. The performance of an imager is defined by many characteristics, including maximum resolution (in two axes), minimum sensitivity, signal-to-noise ratio and linearity.
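
To make the single-imager case concrete, here is a minimal sketch of how a Bayer color filter samples a scene: each photosite records only one color, and half of the sites are green. The RGGB layout and the function names are assumptions. Actual sensors differ in layout, and a demosaicing step (not shown) must interpolate the missing colors.

```python
def bayer_color_at(row, col):
    """Filter color at a photosite, assuming an RGGB Bayer layout (layouts vary by sensor)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

def bayer_mosaic(rgb_image):
    """Keep only the one color sample each photosite actually measures.

    rgb_image: list of rows, each a list of (r, g, b) tuples.
    Returns a single-plane image of the same size.
    """
    channel = {"R": 0, "G": 1, "B": 2}
    return [[pixel[channel[bayer_color_at(r, c)]] for c, pixel in enumerate(row)]
            for r, row in enumerate(rgb_image)]

def sample_counts(height, width):
    """How many photosites of each color a height x width imager has."""
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(height):
        for c in range(width):
            counts[bayer_color_at(r, c)] += 1
    return counts

print(sample_counts(4, 4))
print(bayer_mosaic([[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]]))
```

Because red and blue are each sampled at only one site in four, they are more prone to aliasing than green, one reason three-imager cameras remain common in professional use.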

Imagers usually do not digitize the signal, i.e., convert the amplitude values into quantized digital values. That function resides in the digitizer, also called an analog-to-digital converter. (Some imagers, however, integrate the digitizer on the same device.) The Nyquist theorem states that a signal must be sampled at a rate more than twice its highest frequency component; otherwise, aliasing will result. The combination of imager and digitizer will determine the resolution, aliasing performance and pixel rate of the video.
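
Aliasing can be predicted by folding: a component above half the sampling rate reappears at a lower frequency. The minimal sketch below assumes a hypothetical 64-sample line and a 40-cycle test pattern. It predicts the folded frequency and confirms it by locating the peak of a plain discrete Fourier transform (DFT) of the sampled values. Function names are illustrative.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a sinusoid after sampling at f_sample (folding about f_sample/2)."""
    f = f_signal % f_sample
    return min(f, f_sample - f)

def dft_peak_bin(samples):
    """Index of the strongest DFT bin between DC and the Nyquist bin."""
    n = len(samples)
    def magnitude(k):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
        return math.hypot(re, im)
    return max(range(n // 2 + 1), key=magnitude)

SAMPLES_PER_LINE = 64  # hypothetical sampling rate, in samples per picture width
PATTERN_CYCLES = 40    # test pattern in cycles per picture width, above the Nyquist limit of 32

line = [math.cos(2 * math.pi * PATTERN_CYCLES * i / SAMPLES_PER_LINE)
        for i in range(SAMPLES_PER_LINE)]
print("predicted alias:", alias_frequency(PATTERN_CYCLES, SAMPLES_PER_LINE), "cycles per picture width")
print("DFT peak bin:   ", dft_peak_bin(line))
```

Because the line spans exactly one picture width, DFT bin k corresponds to k cycles per picture width. The sampled 40-cycle pattern is indistinguishable from a 24-cycle one, which is why the optical or electrical band limiting must happen before sampling.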

Color encoding usually includes chrominance subsampling to reduce signal bandwidth: contribution and distribution signals typically use 4:2:2 sampling, while transmission signals use 4:2:0 sampling. This subsampling introduces its own aliasing. Color space conversion can also be integrated into this process.
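
A minimal sketch of the sample-count arithmetic behind these schemes, together with the simplest possible subsampling filters. It assumes plain averaging of neighboring pairs (4:2:2) or 2 x 2 blocks (4:2:0) as the anti-aliasing filter. Real encoders use longer filters and standard-specific chroma siting, so this is illustrative only, and the function names are hypothetical.

```python
def chroma_subsampled_sample_count(width, height, scheme):
    """Total luma + Cb + Cr samples per frame (assumes even width and height)."""
    luma = width * height
    chroma_per_component = {
        "4:4:4": width * height,
        "4:2:2": (width // 2) * height,         # half horizontal chroma resolution
        "4:2:0": (width // 2) * (height // 2),  # half horizontal and half vertical
    }[scheme]
    return luma + 2 * chroma_per_component

def subsample_422(chroma):
    """Average horizontal pairs: a crude 2-tap anti-aliasing filter plus 2:1 decimation."""
    return [[(row[c] + row[c + 1]) / 2 for c in range(0, len(row) - 1, 2)] for row in chroma]

def subsample_420(chroma):
    """Average 2 x 2 blocks: 2:1 decimation in both directions."""
    return [[(chroma[r][c] + chroma[r][c + 1] + chroma[r + 1][c] + chroma[r + 1][c + 1]) / 4
             for c in range(0, len(chroma[r]) - 1, 2)]
            for r in range(0, len(chroma) - 1, 2)]

full = chroma_subsampled_sample_count(1920, 1080, "4:4:4")
for scheme in ("4:2:2", "4:2:0"):
    print(scheme, chroma_subsampled_sample_count(1920, 1080, scheme) / full)
print(subsample_422([[0, 1, 0, 1]]))
```

For a 1920 x 1080 frame, 4:2:2 carries two-thirds of the samples of 4:4:4, and 4:2:0 carries half. An alternating one-pixel chroma pattern such as [0, 1, 0, 1] averages to a flat [0.5, 0.5] under this filter, so that fine color detail is simply lost.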

Post-processing consists of a number of image transformation functions that provide special effects and image improvement. The simplest of these are luminance and color correction, which compensate for poor levels from cameras and make video consistent across content, and peaking, which uses a high-pass filter (in both dimensions in a digital system) to boost, and thus accentuate, image detail. Because peaking alone can also emphasize noise in the image, it should be accompanied by noise reduction, using intelligent coring and/or filtering. These techniques lower noise by adaptively quantizing and modifying selected spatial frequency bands of the image, depending on the distribution of luminance and chrominance values. Format conversion is another processing element; it involves resampling images in both the spatial and temporal dimensions, with consequent changes to image resolution and aliasing.
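
The interplay of peaking and coring can be sketched on a single line of samples. This minimal example makes three assumptions: a 3-tap [1, 2, 1]/4 blur serves as the low-pass reference, hard coring uses a fixed threshold, and the parameters `gain` and `core` are hypothetical names. Practical implementations are two-dimensional and adapt the threshold to the picture content.

```python
def peak_with_coring(line, gain=1.0, core=0.02):
    """Peaking on one video line: add boosted high-pass detail, but core away small detail.

    The high-pass component is the line minus a 3-tap [1, 2, 1]/4 blur (edge samples replicated).
    Detail smaller in magnitude than `core` is treated as noise and discarded (hard coring).
    """
    n = len(line)
    out = []
    for i in range(n):
        left, right = line[max(i - 1, 0)], line[min(i + 1, n - 1)]
        detail = line[i] - (left + 2 * line[i] + right) / 4  # high-pass component
        if abs(detail) < core:
            detail = 0.0  # coring: small high-frequency swings assumed to be noise
        out.append(line[i] + gain * detail)
    return out

flat_with_noise = [0.5, 0.51, 0.49, 0.5, 0.505]
edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(peak_with_coring(flat_with_noise))  # noise below the coring threshold is left alone
print(peak_with_coring(edge))             # the edge gains undershoot and overshoot
```

With these settings, small sample-to-sample fluctuations on a flat area pass through unchanged. A step edge, by contrast, gains the undershoot and overshoot that the eye reads as extra sharpness.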

Compression is used to lower the transmission bit rate and/or storage requirements of digital video. While the process can be lossless, i.e., the input/output transfer function is effectively a “straight wire connection,” high compression ratios come at the cost of image distortion, generally causing a number of known artifacts, including blockiness and quantization noise. Modern codecs such as MPEG-4 AVC/H.264 also include in-loop deblocking filters that reduce blocking artifacts but may cause a loss of subjective detail or objective fidelity (accurate pixel reconstruction). The effect of all the preceding system elements on compression performance should not be underestimated: poor processing and interfacing of signals, in both analog and digital form, can increase artifacts and decrease the efficiency of the compression system.
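
The link between compression factor and distortion can be seen in a one-dimensional sketch of transform coding. The sketch applies an 8-point orthonormal DCT, quantizes the coefficients uniformly with a step size, and reconstructs the block. A larger step leaves fewer nonzero coefficients to code (fewer bits) and a larger reconstruction error. When adjacent blocks err in different directions, that error is what appears as blockiness in two dimensions. The sample values, step sizes and function names are illustrative assumptions. Real codecs add prediction, perceptual weighting and entropy coding.

```python
import math

N = 8  # block length, as in classic 8 x 8 DCT codecs (one dimension here for brevity)

def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))

def dct(block):
    return [sum(x * _basis(k, n) for n, x in enumerate(block)) for k in range(N)]

def idct(coeffs):
    return [sum(c * _basis(k, n) for k, c in enumerate(coeffs)) for n in range(N)]

def code_block(block, step):
    """Quantize DCT coefficients with a uniform step; return reconstruction and nonzero count."""
    levels = [round(c / step) for c in dct(block)]  # these integers would be entropy coded
    rebuilt = idct([level * step for level in levels])
    return rebuilt, sum(1 for level in levels if level != 0)

def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of 8-bit luma samples
for step in (1, 8, 32):
    rebuilt, nonzero = code_block(row, step)
    print(f"step {step:2d}: {nonzero} nonzero coefficients, max error {max_error(row, rebuilt):.2f}")
```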

Compressed bit streams can be modified by numerous methods of bit-stream processing, such as logo insertion, recoding for multiplexing and splicing to switch program content. Storage and transmission, ideally lossless processes, can nonetheless lose information through errors in the media. While storage is usually designed so that errors are effectively nonexistent, transmission errors can result in irretrievable data corruption and loss.

The decoding side

The processes on the decoder side can be distributed among various devices or completely integrated in a display. Decompression is usually deterministic, within the accuracy of the arithmetic operations. Thus, any two decoders implementing the same fixed-point (or floating-point, when so encoded) mathematical operations should produce the same output. However, taking approximation shortcuts in the decoding algorithm can yield savings in processor power dissipation, memory size, etc., and such shortcuts can affect output picture quality.
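
A tiny example of why bit-exact arithmetic matters: the sketch below compares a reference fixed-point multiply (Q14 coefficient, round then shift) with a cheaper shortcut that simply truncates. The Q14 format and the function names are hypothetical. The point is only that a shortcut like this changes some outputs by one code value, and such differences can accumulate in a decoder that predicts from previously decoded pictures. (H.264, for example, specifies its inverse transform exactly in integer arithmetic, partly to avoid this kind of encoder/decoder mismatch.)

```python
import math

Q = 14  # hypothetical fixed-point format: coefficients scaled by 2**14

def mul_rounded(x, coeff):
    """Reference: fixed-point multiply with round-half-up before the shift."""
    return (x * coeff + (1 << (Q - 1))) >> Q

def mul_truncated(x, coeff):
    """Shortcut: the same multiply without the rounding offset (floors the result)."""
    return (x * coeff) >> Q

COS_PI_4 = round(math.cos(math.pi / 4) * (1 << Q))  # a typical transform constant in Q14

diffs = [mul_rounded(x, COS_PI_4) - mul_truncated(x, COS_PI_4) for x in range(-255, 256)]
print("inputs whose output changed:", sum(1 for d in diffs if d != 0), "of", len(diffs))
```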

Color decoding is typically the straightforward inverse of the encoding process. Processing on the decoder side provides image improvement and accommodates user preferences, including sharpening (peaking) and noise reduction. Spatial down-resolution can occur either within this processing or as part of the decompression process. The D/A (digital-to-analog converter), while ideally the inverse of the digitizer function, always requires a low-pass filter (LPF), which can degrade high-frequency response, i.e., the ability to reproduce image detail.
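
The trade-off in the reconstruction low-pass filter can be sketched numerically. This minimal example assumes an idealized Butterworth magnitude response and a hypothetical 30 MHz cutoff. It places that cutoff against the 74.25 MHz luma sampling rate used for 1080i and 720p HD (Nyquist frequency 37.125 MHz). Actual D/A reconstruction filters vary in type and order, and the function name is illustrative.

```python
import math

def butterworth_gain(f_hz, cutoff_hz, order):
    """Magnitude response of an idealized n-th order Butterworth low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (f_hz / cutoff_hz) ** (2 * order))

CUTOFF_MHZ = 30.0  # hypothetical cutoff, below the 37.125 MHz Nyquist frequency

for order in (3, 7):
    for f_mhz in (10.0, 20.0, 27.0, 30.0):
        gain = butterworth_gain(f_mhz, CUTOFF_MHZ, order)
        print(f"order {order}, {f_mhz:4.0f} MHz: {20 * math.log10(gain):6.2f} dB")
```

Near the top of the passband (27 MHz here), the higher-order filter preserves more detail (about -0.9 dB versus -1.85 dB). The sharper response generally comes at the cost of more phase distortion and ringing near cutoff.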

The display system attempts to construct a faithful reproduction of the original image. With most consumer displays based on either a large LCD or plasma panel or a small projection panel (the latter usually involving optics), the display can become part of the final digital-to-analog conversion process. While signal reconstruction in the system sense usually requires a low-pass filter, and one will be used when driving analog video outputs, some digital display panels eliminate the D/A converter and LPF entirely. Panel displays usually lack an optical spatial filter, both because of its complexity and because the human eye integrates the pixels into a continuous image.

Aldo Cugnini is a consultant in the digital television industry.

Send questions and comments to: