Digital Audio 101

Digital signal processing is often a mystery to many people, but the underlying principles are fairly straightforward and within the grasp of anyone with a technical background. In this feature, we’ll take a look at the theory and practice behind digital signal conversion, sampling, sample rate conversion, and quantization noise, especially as they relate to digital audio. Although some of these principles might be familiar, we’ll dig a bit deeper into the processes and offer a more in-depth understanding of the technology. Remember that the basics of signal processing also apply to any digital signal, including video.

A/D and D/A conversion

In order to convert an analog signal to digital form, sampling and quantization must be performed. Sampling is usually performed by a sample-and-hold circuit that takes brief, high-speed repetitive snapshots of the analog signal. To prevent aperture errors, which would cause a sin(x)/x rolloff in the frequency response of the system, the sampling aperture (the duration of each snapshot) must be extremely (in theory, infinitesimally) short. Quantization resolves the signal into a specific number of voltage levels. This process is defined by the number of bits needed to resolve the signal. With audio signals, these samples are usually 16 bits wide, allowing a resolution of 2^16, or 65,536, different levels. (With video, 8- or 10-bit per color sampling is usually used.)
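
To make the numbers concrete, here is a minimal sketch in Python of how one held sample might be mapped onto one of those 65,536 levels; the 1V reference and the sample voltage are assumed values for illustration only.

    # Quantize one sample-and-hold "snapshot" to a 16-bit code (illustrative values).
    v_ref = 1.0                              # assumed full-scale reference voltage
    v_sample = 0.3217                        # the held analog voltage for this sample
    levels = 2 ** 16                         # 16-bit resolution: 65,536 levels
    code = round(v_sample / v_ref * (levels - 1))
    print(code)                              # 21083, the digital word stored for this sample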

Quantization and digital conversion are implemented by one of several different means, including flash conversion or successive approximation. In the former, a series of comparators evaluates the input voltage in relation to a set of fixed reference voltages. In the latter, a successive-approximation register and D/A converter iteratively compare an estimate of the output with the input. The result of any of these methods is a string of digital words representing the samples of the signal.
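
As a rough illustration of the successive-approximation idea, the following Python sketch (with a hypothetical 1V reference and an idealized internal D/A) tests one bit at a time, from the most significant bit down.

    def sar_adc(v_in, v_ref=1.0, bits=16):
        """Idealized successive-approximation conversion: trial one bit at a time."""
        code = 0
        for b in range(bits - 1, -1, -1):        # most significant bit first
            trial = code | (1 << b)              # tentatively set this bit
            v_dac = v_ref * trial / (1 << bits)  # internal D/A estimate for the trial code
            if v_in >= v_dac:                    # keep the bit if the estimate is not above the input
                code = trial
        return code

    print(sar_adc(0.3))                          # 19660, i.e. roughly 0.3 of full scale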

The digital-to-analog conversion is much simpler. Usually, the individual bits of the sampled signal each contribute to a weighted ladder network of resistors in a simple summing circuit. The most significant bit contributes the most weight, and each successive bit contributes one-half of the previous one. The D/A converter is then followed by a low-pass post-filter that removes the sampling frequency and repeat spectra.
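
A minimal sketch of that weighted summation, using the same assumed 1V reference, might look like the following; the most significant bit contributes half of full scale, the next bit a quarter, and so on.

    def binary_weighted_dac(code, v_ref=1.0, bits=16):
        """Idealized binary-weighted D/A: each bit carries half the weight of the one above it."""
        v_out = 0.0
        for b in range(bits):
            if code & (1 << b):
                v_out += v_ref / (1 << (bits - b))   # bit b contributes v_ref / 2^(bits - b)
        return v_out

    print(binary_weighted_dac(19660))                # ~0.3V, recovering the value converted above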

Sampling and quantization

The difference between a quantized signal and its original is a low-level signal known as the quantization error or noise, as seen in Figure 1. The level and characteristics of the quantization noise are defined by the type of quantizer, the quantization resolution and the sample rate.

A well-known property of quantization noise is that the signal-to-noise ratio (SNR) in dB for a full-scale sine wave is defined by the following formula: SNR = 6.02B + 1.76, where B is the number of bits. This formula is true only for sine waves, however, as it depends on the amplitude distribution of the signal. For other signals, similar formulas apply, but ordinarily each added bit of resolution cuts the noise amplitude in half, or by 6dB.
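
You can verify the formula numerically. The sketch below (using NumPy, with an assumed ±1 full-scale range and a 997Hz test tone chosen so the samples do not keep repeating the same phases) quantizes a full-scale sine to 16 bits and measures the resulting SNR.

    import numpy as np

    fs = 48_000                                # assumed sample rate (Hz)
    bits = 16
    t = np.arange(fs) / fs                     # one second of samples
    x = np.sin(2 * np.pi * 997 * t)            # full-scale sine wave
    step = 2.0 / (2 ** bits)                   # quantizer step size over the -1..+1 range
    noise = np.round(x / step) * step - x      # quantization error
    snr = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))
    print(snr, 6.02 * bits + 1.76)             # both come out at roughly 98 dB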

According to Nyquist’s theorem, signals must be sampled at a rate at least twice as high as the highest frequency component. Otherwise, aliasing will occur, which is essentially a folding of the spectrum. In the time domain, this can be seen in Figure 2 on page 60. With the five cycles of sinusoidal signal shown, Nyquist states that the signal should be sampled at least 10 times over this interval. Because the signal was sampled at less than the Nyquist rate in the example, the resulting sampled signal has a frequency lower than that of the original, i.e., it has been aliased.
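
The same effect is easy to reproduce numerically. In this sketch (NumPy, with arbitrary example frequencies), a 9kHz tone sampled at 12kHz, well below its 18kHz Nyquist rate, shows up at 3kHz.

    import numpy as np

    fs = 12_000.0                              # sample rate (Hz), below the 18 kHz Nyquist rate
    f_in = 9_000.0                             # input tone (Hz)
    N = 4096
    n = np.arange(N)
    x = np.sin(2 * np.pi * f_in * n / fs)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(N)))
    freqs = np.fft.rfftfreq(N, d=1 / fs)
    print(freqs[np.argmax(spectrum)])          # ~3000 Hz: the tone has aliased down to 3 kHz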

Figure 3 shows the frequency spectrum of the signal, i.e., a graph of the amplitude of the signal components versus frequency. (The spectrum of the sine wave signal in Figure 2 would consist of a single vertical line at the frequency of the signal, and that would not be very illustrative. Hence, the use of a wideband signal with energy distributed over the spectrum in the figure.)

In the frequency domain, sampling has the effect of reflecting and repeating the baseband spectrum around multiples of one-half the sample rate (fs/2 in the figures). (For ideal sampling, the repeat spectra repeat indefinitely; in practice, however, realistic sampling will generate a series of repeat spectra that continue and eventually fall off. It is the function of the analog low-pass filter to remove all of these higher-frequency components.) In the frequency domain, when the wideband signal is sampled below the Nyquist rate, the signal is aliased in the neighborhood of fs/2, where the components overlap, and it cannot be fully recovered in any subsequent processing.

One last point: If a signal is sampled at exactly twice the highest frequency, then the signal is said to be critically sampled.

However, an interesting technique can be employed to successfully subsample a signal below the Nyquist rate. This is because the usual form of the theorem applies to baseband signals, which have frequency components (essentially) all the way down to DC. The general form of the Nyquist criterion states that signals must be sampled at a rate at least twice the bandwidth of the signal. If the lower edge of the signal stops considerably above DC (a bandpass signal), this allows us to sample at less than two times the highest frequency, provided the sample rate is chosen so that the band falls neatly between multiples of fs/2 and the folded copies do not overlap. This is shown in Figure 4.
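
As a quick numerical check (NumPy again, with made-up frequencies), a tone inside a 60Hz-to-80Hz bandpass signal can be sampled at 56Hz, far below twice its 80Hz top frequency, and it still folds cleanly into the baseband.

    import numpy as np

    fs = 56.0                                  # sample rate (Hz), well below 2 x 80 Hz
    f_in = 70.0                                # a tone inside the 60-80 Hz bandpass band
    N = 4096
    n = np.arange(N)
    x = np.sin(2 * np.pi * f_in * n / fs)
    spectrum = np.abs(np.fft.rfft(x * np.hanning(N)))
    freqs = np.fft.rfftfreq(N, d=1 / fs)
    print(freqs[np.argmax(spectrum)])          # ~14 Hz: the 60-80 Hz band folds to 4-24 Hz without overlap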

This subsampling can be combined with a sample-rate downconversion and filtering operation. The result, when dealing with multiple subbands, ensures that the amount of digital data does not increase, an important consideration for compression efficiency. If a wideband signal were split into bandpass signals and the result were not decimated, the amount of data would increase, because the sample rate ultimately determines the amount of digital data.

To change the sampling rate of a signal, the signal can be upsampled or downsampled. When upsampling by an integer factor (also called interpolation), the process first involves zero stuffing the samples, i.e., inserting zero-valued samples between the original samples, as seen in Figure 5 on page 60, where the signal is being upsampled 2x. A low-pass filter is then used to smooth the signal, resulting in the signal shown in Figure 6.

In the frequency domain, zero stuffing raises the sample rate without changing the spectrum, so the original repeat spectra now fall below the new Nyquist frequency. (See Figure 7.) A low-pass filter is used to remove these repeat spectra. A simple repetition of samples is not satisfactory for upsampling, as it will create alias components that will not be removed by the final low-pass filter. Straight linear interpolation will also create aliases, though at a lower level than with sample repetition.
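
A minimal 2x zero-stuffing upsampler along these lines, sketched with NumPy and SciPy (the filter length and test frequencies are arbitrary choices), stuffs zeros and then low-pass filters at the original fs/2.

    import numpy as np
    from scipy import signal

    fs = 8_000                                 # assumed original sample rate (Hz)
    t = np.arange(256) / fs
    x = np.sin(2 * np.pi * 1000 * t)           # 1 kHz test tone

    up = np.zeros(2 * len(x))                  # zero stuffing: insert a zero after every sample
    up[::2] = x

    lp = signal.firwin(63, 0.5)                # cutoff at half the new Nyquist, i.e. the old fs/2
    y = 2 * signal.lfilter(lp, 1.0, up)        # filter removes the repeat spectrum; the gain of 2
                                               # restores the original amplitude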

Downsampling is performed by a similar process. In order to satisfy Nyquist (and avoid aliasing), it’s necessary to first low-pass filter the signal and then decimate the result. For example, to downsample the signal 2x, it must be low-pass filtered at fs/4, and then every other sample is discarded. Ideally, this would be a brickwall filter so that no aliasing is produced, but ideal filters are expensive and have long delay, so some amount of residual high-frequency content is sometimes tolerated. The process in the frequency domain is shown in Figure 8.
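
A corresponding 2x downsampler, sketched under the same arbitrary assumptions, filters first and then discards every other sample.

    import numpy as np
    from scipy import signal

    fs = 16_000                                   # assumed original sample rate (Hz)
    t = np.arange(512) / fs
    x = np.sin(2 * np.pi * 1000 * t) + 0.3 * np.sin(2 * np.pi * 7000 * t)
                                                  # the 7 kHz component would alias at the new 8 kHz rate

    anti_alias = signal.firwin(63, 0.5)           # cutoff at fs/4, half of the current Nyquist band
    y = signal.lfilter(anti_alias, 1.0, x)[::2]   # filter, then keep every other sample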

When a rational rate change is required (e.g., 3/2), upsampling and downsampling can be cascaded. Through clever designs, these conversions are often performed at the same time, minimizing the number of computations. When an asynchronous rate conversion is needed (where the source and destination sampling rates are not integrally related), interpolation or decimation can be carried out on a sample-by-sample basis. However, while this allows for the most versatility, fixed low-pass filters cannot be used, as there is no integral relationship between the sampling rates. Therefore, some kind of cubic spline or similar operation will be performed on the neighboring samples in order to minimize aliasing.
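
For the rational case, SciPy’s polyphase resampler performs the cascaded upsample, filter and downsample in one pass; the sketch below (with an assumed 32kHz-to-48kHz conversion) illustrates the 3/2 example.

    import numpy as np
    from scipy import signal

    fs_in = 32_000                             # assumed source sample rate (Hz)
    t = np.arange(1024) / fs_in
    x = np.sin(2 * np.pi * 1000 * t)

    y = signal.resample_poly(x, up=3, down=2)  # upsample by 3, low-pass filter, downsample by 2,
                                               # all in a single polyphase operation (result: 48 kHz)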

Quantization revisited

Long ago, a clever observation was made that the quantization noise power is spread across the entire bandwidth of the digital signal, and hence a signal that is oversampled will have a lower noise floor within the usable bandwidth. Oversampling alone buys about 3dB (half a bit) of in-band SNR for every doubling of the sample rate, but combined with noise shaping, which pushes the quantization noise out of the audio band, a 1-bit delta-sigma converter running at a high oversampling ratio can deliver the same audible SNR as a critically sampled 16-bit converter. Such a design is often used in consumer digital equipment, as a 1-bit D/A converter is trivially easy to implement. Its analog output stage is essentially just a low-pass filter, so the D/A conversion and post-filtering are both done at the same time.
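
The oversampling part of that gain is easy to demonstrate on its own. The sketch below (NumPy and SciPy, an 8-bit quantizer and arbitrary rates, no noise shaping) measures only the quantization noise that falls below 20kHz and shows roughly a 12dB improvement at 16x.

    import numpy as np
    from scipy import signal

    def inband_snr_db(oversample, bits=8, base_fs=48_000.0, band=20_000.0):
        """SNR of a quantized full-scale sine, counting only the noise below 'band' Hz."""
        fs = base_fs * oversample
        t = np.arange(int(fs)) / fs                  # one second of samples
        x = np.sin(2 * np.pi * 997 * t)
        step = 2.0 / (2 ** bits)
        noise = np.round(x / step) * step - x        # quantization error
        lp = signal.firwin(255, band / (fs / 2))     # keep only the audible band of the noise
        inband_noise = signal.lfilter(lp, 1.0, noise)
        return 10 * np.log10(np.mean(x ** 2) / np.mean(inband_noise ** 2))

    print(inband_snr_db(1), inband_snr_db(16))       # roughly 51 dB vs. 63 dB without noise shaping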

Another benefit of oversampling is that the first repeat spectrum above the baseband signal will be much higher in frequency than it would have been if the signal were critically sampled. Compare, for instance, the top and bottom signals in Figure 7. The bottom signal can have a simpler and less-expensive low-pass filter that takes an entire octave to roll off above the highest component of the signal, whereas the top signal requires a filter that rolls off quickly before fs/2.

Class dismissed

There you have it; you’re now an expert in digital signal processing. The next time you have to deal with signals at different sample rates — or even video at different resolutions — you should have a better idea of what’s going on inside conversion equipment and what differentiates amateur components from professional gear.