With the growing use of multiple video formats, a fundamental understanding of sampling rate conversion will provide insight into the quality and operation of format converters. However, before delving into format conversion itself, it makes sense to understand a few fundamentals of image processing and digital signal processing (DSP).
Digital signal processing
While the principles behind DSP have been known for more than half a century, it was not until the proliferation of digital integrated circuits in the '70s that hardware encompassing digital processing became practical. Since then, further advances in memory size and processing speed have made DSP hardware realizable and economical.
Although the topic of sampling is familiar to many readers, certain details of the process can be elusive. It is well known that a signal must be sampled at a rate more than twice as high as its highest frequency component (the Nyquist criterion), or aliasing will result. However, the way this aliasing manifests itself is not as well known, especially for video. For example, Figure 1 shows five cycles of a sinusoidal signal. The Nyquist criterion calls for more than 10 samples over this interval.
When sampled at less than that, the resulting signal will have a frequency less than that of the original, meaning it has been aliased, and no amount of processing will retrieve the original signal. Information about it has been lost, and the alias will overlay the other components of the signal at that frequency.
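The frequency at which the alias lands can be worked out by folding the signal frequency back into the band from zero to half the sampling rate. The following is a minimal sketch of that arithmetic; the function name `alias_frequency` is illustrative only, and frequencies are expressed as cycles (or samples) over the interval shown in Figure 1.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency after sampling f_signal at f_sample (same units)."""
    # Remove whole multiples of the sampling rate, then fold the remainder
    # into the baseband that runs from 0 to f_sample / 2.
    f = math.fmod(f_signal, f_sample)
    return min(f, f_sample - f)

# Five cycles over the interval, sampled 12, 8 and 6 times.
for n_samples in (12, 8, 6):
    print(n_samples, alias_frequency(5.0, n_samples))
# 12 samples -> 5.0 (no alias), 8 samples -> 3.0, 6 samples -> 1.0
```

With eight samples, the five-cycle sinusoid is indistinguishable from a three-cycle one, which is why no later processing can recover the original.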
Another way of looking at this is that the sampling process creates repeat spectra in the frequency domain, i.e., the original signal spectrum will repeat, centered at multiples of the sampling rate. Thus, the input signal must be band-limited so the repeat spectra do not overlap.
How do these artifacts appear on video signals, or images? One useful tool for analyzing image processing systems is the zone plate. (See Figure 2.) This 2-D sinusoidal frequency sweep essentially covers all spatial frequencies in an image. The typical zone plate starts at zero frequency in the center and progresses linearly to a maximum frequency at the edges, ideally one-half the sampling rate. (The spatial sampling rate is the number of pixels per picture width and picture height. Note that for most video, the spatial sampling rate will be different in the two dimensions, as the aspect ratio is rarely 1:1.)
In an imaging system free from artifacts, the concentric circles will be uniform as they increase in frequency toward the outer edges. However, almost all imaging systems will produce aliases that will appear as repeats of the bull's eye pattern. With interlaced scanning and subsampled chrominance, the repeats become more complex. Today, a zone plate pattern is often available in digital signal generators, providing a tool to critically evaluate digital video systems.
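For readers without a signal generator at hand, a zone plate of the kind described above can be approximated in a few lines. This is only a sketch under simple assumptions: a square image, luminance values from 0 to 1, and a hypothetical function name `zone_plate`. The phase grows with the square of the distance from the center, so the local frequency rises linearly from zero to 0.5 cycles per pixel at the middle of each edge.

```python
import math

def zone_plate(size):
    """Return a size x size list of rows of luminance values in 0..1."""
    half = (size - 1) / 2.0  # distance from the center to an edge along one axis
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            r2 = (x - half) ** 2 + (y - half) ** 2
            # Phase pi * r^2 / (2 * half) gives a local frequency of
            # r / (2 * half) cycles per pixel: zero at the center and
            # 0.5 (half the sampling rate) where r equals half.
            row.append(0.5 + 0.5 * math.cos(math.pi * r2 / (2.0 * half)))
        rows.append(row)
    return rows

plate = zone_plate(65)  # plate[32][32] is the center of the bull's eye
```

Passing such a pattern through a scaler and looking for extra bull's eyes is a quick way to see where aliases appear.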
Rate conversion challenges
When changing image resolution, the number of samples per image must be altered both in the horizontal and vertical dimensions. For example, let's start with a fictitious 1280 × 960 image and downconvert it to a 640 × 480 image. In this case, the sampling rate must be decreased both horizontally and vertically by one-half. If one were to simply throw away every other sample (a process called decimation) when the original image had frequency components out to one-half the original sampling rate, then this decimation would result in aliasing in the final image. Figure 3 shows this effect in the resampled image. While the difference here may seem subtle, keep in mind that with motion, the alias pattern will scintillate and be distracting.
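The effect of discarding every other sample can be seen on a single image row. The sketch below assumes a test row containing a sinusoid at 0.4 cycles per sample, which is legal at the original rate but above the limit after halving; the names `decimate_naive`, `row` and `alias` are illustrative only.

```python
import math

def decimate_naive(samples, factor=2):
    """Keep every `factor`-th sample with no filtering beforehand."""
    return samples[::factor]

# One row with detail at 0.4 cycles/sample (the limit is 0.5).
row = [math.cos(2 * math.pi * 0.4 * n) for n in range(40)]
half_res = decimate_naive(row)

# At the new rate this detail would sit at 0.8 cycles/sample, beyond the
# new limit, so it folds back to 0.2 cycles/sample.
alias = [math.cos(2 * math.pi * 0.2 * k) for k in range(len(half_res))]
max_diff = max(abs(a - b) for a, b in zip(half_res, alias))
print(max_diff)  # effectively zero: the decimated row is the alias
```

The decimated row is numerically identical to a lower-frequency pattern that was never in the original image.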
It's important to realize that lowering the sample rate is, in effect, resampling the image. In order to satisfy the Nyquist criterion, it is necessary to first low-pass filter the image and then decimate the result. The key here is in how to do this filtering. Ideally, it should be a brick wall filter so no aliasing is produced and the maximum picture resolution is maintained. Without such filtering, quantization noise above the new frequency limit will also fold back due to aliasing, raising the noise floor.
However, the ideal filter has practical problems. It is impossible to realize such a filter without a long delay (due to the many calculations, or taps) and without causing a certain amount of ringing in the signal. Thus, we would like to approximate the behavior of such a filter, but put an upper bound on the artifacts that are produced. The simplest such filter is a linear interpolator, but this still allows considerable aliasing, as this process is a filter with an extremely slow roll-off. A better filter is a truncated (sin x)/x filter, which provides a superior roll-off with a few more calculations. Keep in mind that this filtering must be done both vertically and horizontally, and often with different scaling factors. In addition, a conversion from progressive to interlaced scanning may be needed, but that's beyond this month's topic.
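As a rough illustration of filtering before decimation, the sketch below builds a truncated (sin x)/x low-pass filter and applies it in one dimension. The Hamming window, the 31-tap length and the function names are assumptions made for this example, not choices taken from any particular converter; samples beyond the ends of the row are treated as zero.

```python
import math

def windowed_sinc(num_taps, cutoff):
    """Low-pass FIR taps; cutoff in cycles/sample (0 to 0.5), num_taps odd."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        # Truncated (sin x)/x kernel of the ideal brick wall filter.
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window (an assumption) tapers the ends to limit ringing.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [c / gain for c in taps]  # normalize so flat areas are unchanged

def filter_and_decimate(samples, taps, factor):
    """Low-pass filter, computing only the outputs that decimation keeps."""
    mid = (len(taps) - 1) // 2
    out = []
    for center in range(0, len(samples), factor):
        acc = 0.0
        for k, c in enumerate(taps):
            i = center + k - mid
            if 0 <= i < len(samples):  # samples outside the row count as zero
                acc += c * samples[i]
        out.append(acc)
    return out

taps = windowed_sinc(31, 0.25)  # cutoff at the new half-sampling rate
row = [math.cos(2 * math.pi * 0.4 * n) for n in range(200)]
half_res = filter_and_decimate(row, taps, 2)
```

Away from the ends of the row, the 0.4 cycles/sample detail is strongly attenuated instead of folding back as an alias. For an image, the same operation would be applied along rows and then along columns.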
When converting from a lower to a higher resolution image (or when increasing the audio sampling rate), an upsampling process is used. When upsampling by an integer factor, the process first involves zero-padding the signal, that is, inserting zero-valued samples between the original samples. In the frequency domain, this leaves repeat spectra between the original signal band and the new sampling rate. A low-pass filter is then used to smooth the signal, removing the repeat spectra.
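The two steps, zero-padding and low-pass filtering, can be sketched as follows. To keep the example short, the low-pass filter here is a triangular kernel, which is equivalent to linear interpolation; that is an assumption for illustration, and a longer (sin x)/x-style filter would remove the repeat spectra more thoroughly. The function names are hypothetical.

```python
def zero_stuff(samples, factor):
    """Insert factor - 1 zero samples after each original sample."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out

def upsample_linear(samples, factor):
    """Zero-pad, then smooth with a triangular (linear interpolation) filter."""
    stuffed = zero_stuff(samples, factor)
    # Triangle taps, e.g. [0.5, 1.0, 0.5] for a factor of 2.
    taps = [1.0 - abs(k) / factor for k in range(-(factor - 1), factor)]
    mid = factor - 1
    out = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, c in enumerate(taps):
            i = n + k - mid
            if 0 <= i < len(stuffed):  # beyond the ends counts as zero
                acc += c * stuffed[i]
        out.append(acc)
    return out

print(upsample_linear([0.0, 2.0, 4.0], 2))
# [0.0, 1.0, 2.0, 3.0, 4.0, 2.0]  (the last value decays toward the zero edge)
```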
The previous examples illustrate the process of changing the sampling rate by an integer factor. When a noninteger change is required, one of various methods can be employed. If a rational factor is needed (e.g., 3:2), then upsampling (in this case, by three) and downsampling (by two) can be cascaded. In practice, these steps are usually combined and performed at the same time, with computational savings because the inserted zeros need not be multiplied and the samples that would be discarded are never calculated.
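One way to combine the two steps is sketched below: the upsampled stream is never actually built, only the output samples that survive the downsampling are computed, and only the filter taps that line up with real (nonzero) input samples are used. The function names and the triangular stand-in filter are assumptions for this example; a practical converter would use a longer low-pass filter designed at the upsampled rate.

```python
def triangle_taps(up):
    """Stand-in low-pass filter (linear interpolation) with a DC gain of `up`."""
    return [1.0 - abs(k) / up for k in range(-(up - 1), up)]

def resample_rational(samples, up, down, taps):
    """Change the rate by up/down; taps must be odd-length and symmetric."""
    mid = (len(taps) - 1) // 2
    n_out = (len(samples) * up + down - 1) // down
    out = []
    for m in range(n_out):
        n = m * down  # position in the (never built) upsampled stream
        acc = 0.0
        # Step through the taps `up` at a time, starting at the phase that
        # lines up with an original sample; all other taps would meet zeros.
        for k in range((n + mid) % up, len(taps), up):
            i = (n + mid - k) // up
            if 0 <= i < len(samples):
                acc += taps[k] * samples[i]
        out.append(acc)
    return out

# 3:2 conversion of a short ramp: four input samples become six.
converted = resample_rational([0.0, 3.0, 6.0, 9.0], 3, 2, triangle_taps(3))
```

For the ramp above, the first five outputs fall on the straight line through the input samples (approximately 0, 2, 4, 6, 8); the last one is pulled down because samples beyond the end are treated as zero.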
Audio has its requirements, too
Rate conversion also applies to audio, with even stricter requirements. The ear is much more sensitive to distortion than the eye, especially when the signal approaches a pure tone, as with many musical instruments. In order to minimize audible artifacts, the filters may need to be quite long. But with sampling rates much lower than those needed for video, the overall complexity (and cost) will be much less.
With audio signals, there may also be a more frequent need for asynchronous rate conversion (i.e., where the source and destination sampling rates are unrelated). One such case could be where the destination requires a specific sample rate, locked to a reference frequency, and the source sampling rate is determined by a free-running clock, as in a recorder. If the two rates are very close, one could simply lock the playback mechanism to the new clock. But this is only acceptable if the change in timing (and pitch) is negligible. Assuming one wants to keep the signal in its digital form, the conversion system may have to convert on the fly, with interpolation or decimation carried out as needed on a sample-by-sample basis. One such solution will let the two clocks slide, until a sample must be added or dropped to maintain synchronization. Depending on the approximate conversion ratio needed, different algorithms for interpolation (filters) can be used.
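Sample-by-sample conversion at an arbitrary ratio can be sketched with a running fractional position in the source stream. The example below uses linear interpolation purely for brevity, which is an assumption; as noted earlier, audio generally calls for much longer filters. The function name and the fixed `ratio` argument are hypothetical, and a real converter would update the ratio continuously from the measured clocks.

```python
def resample_async(samples, ratio):
    """ratio = source rate / destination rate (input samples per output)."""
    out = []
    pos = 0.0  # fractional read position in the source stream
    while pos <= len(samples) - 1:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        # Interpolate between the two source samples that straddle pos.
        out.append((1.0 - frac) * samples[i] + frac * nxt)
        pos += ratio
    return out

# Destination clock running at twice the source rate.
print(resample_async([0.0, 1.0, 2.0, 3.0, 4.0], 0.5))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
```

Because `ratio` can be any number, the same loop serves whether the two clocks differ slightly or substantially.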
In future articles, we'll discuss how sampling rate conversion is used in format conversion, and how video artifacts arise from the different elements. For now, the principles described here should give a better appreciation of processes that can affect the final quality of your video and audio signals.
Aldo Cugnini is a consultant in the digital television industry.
Send questions and comments to: firstname.lastname@example.org