For the most part, the mechanisms of the natural world around us, including sound, operate in the analog domain. And so the transducers used to convert sound into electrical signals (microphones) and vice versa (loudspeakers) are analog as well. A microphone produces an electrical signal with an infinite number of amplitudes which can be amplified to a suitable level for further processing such as mixing, recording, transmission and reproduction. The chain of devices that amplify, process and carry the electrical signal from the source (the microphone) to the destination (the loudspeaker), along with the wires that connect them, can be thought of as a medium. Unfortunately, the devices in this medium introduce inherent, undesirable impairments (linear distortions, nonlinear distortions and noise) that degrade the quality of the signal. The impairments contributed by each device are additive; that is, they accumulate. Thus, the number and individual performance of the devices in the medium determine its overall performance. This puts a limit on the number of devices through which an analog audio signal can pass before the impairments become unacceptable.
However, it is possible to eliminate many analog signal-handling difficulties by digitizing the electrical signal before sending it through the medium. Digital audio systems convert the original analog signal to a binary digital signal which has two well-defined states: zero and one. Undesirable electrical impairments affect the digital signal just as they affect the analog signal, but they have no effect on the information the digital signal carries as long as the device receiving the signal determines that the binary signal levels are within the threshold values for the “zero” and “one” states. Such systems restrict message distortion to the analog-to-digital (A/D) and digital-to-analog (D/A) conversion processes, thereby improving the transparency of the medium. The medium remains transparent as long as it maintains a minimum signal-to-noise ratio (SNR); below that threshold, the “cliff effect” occurs and reception fails abruptly. This article examines some of the basic audio analog-to-digital conversion concepts, emphasizing the sampling process.
Sampling is the first step towards digitizing audio signals. It consists of measuring the amplitude of the analog audio waveform at periodic intervals, T. The main concern is to represent the original analog values with adequate precision. The measurement accuracy depends on the frequency at which the audio signal is measured, or sampled. Here's where Nyquist and Shannon have their say. The sampling frequency must be at least twice (preferably more than twice) the highest audio frequency being sampled.
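The Nyquist criterion described above can be sketched as a small helper. This is an illustrative function, not from the article; the 10 percent guard factor is an assumption representing the margin a practical anti-aliasing filter needs.

```python
# Illustrative sketch of the Nyquist criterion: the sampling frequency Fs
# must be at least twice (preferably more than twice) the highest audio
# frequency Fmax. The guard_factor of 1.1 is a hypothetical design margin.
def min_sampling_rate(f_max_hz: float, guard_factor: float = 1.1) -> float:
    """Return a practical minimum sampling rate: 2 * Fmax times a guard factor."""
    return 2.0 * f_max_hz * guard_factor

# For 20 kHz audio, this suggests a rate of at least 44 kHz:
print(min_sampling_rate(20_000))  # 44000.0
```

With the guard factor set to 1.0 the function returns the theoretical Nyquist minimum, 2 × Fmax.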
The sampling process consists of multiplying the analog audio signal with a stream of repetitive pulses — a pulse amplitude modulation (PAM) process. Figure 1 represents this process in the time domain and Figure 2 represents it in the frequency domain. In this idealized case, the sampling frequency (Fs = 1/T) is considerably higher than the sampled frequency and the sampling pulse duration is close to zero.
Telephone audio signals with a maximum frequency of about 3.4 kHz are sampled at 8 kHz. Some early British approaches to carry digitized television audio on the tip of the horizontal sync pulses used a sampling frequency of 32 kHz with a maximum audio baseband of 15 kHz.
For historical reasons, CD recordings use a sampling frequency of 44.1 kHz. In the early 1980s, when CDs made their appearance on the market, the U-MATIC videotape format was adapted to record digital audio. A special adapter processed the digitized audio into a 3 MHz bandwidth analog video signal, which could be recorded on U-MATIC tape. For the process to work, it was necessary that there be a relationship between European 625/50 scanning frequencies, North American 525/59.94 scanning frequencies and the speed of the rotating record head. The adopted 44.1 kHz audio sampling frequency met this requirement so long as the North American playback VCR ran at 525/60 instead of 525/59.94. Sampling 20 kHz bandwidth audio signals at 44.1 kHz yielded a small frequency gap that permitted the use of a reasonably priced and reasonably performing anti-aliasing filter. Even so, many purists claim that a 20 kHz low-pass filter gives rise to overshoot, ringing and related audio distortions, which, to some, are unacceptable. For this and other reasons, studio audio operations are carried out at a 48 kHz sampling rate. Using both sampling frequencies in a studio environment requires sample-rate converters. In addition, in a television studio, the audio sampling frequencies have to be coherent (derived from the same master clock) with the video sampling frequencies.
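The relationship between 44.1 kHz and the two television scanning standards can be checked with simple arithmetic. The commonly cited derivation (an illustration here, not stated explicitly in the article) stores three audio samples per active television line:

```python
# Commonly cited derivation of the 44.1 kHz CD sampling frequency from the
# U-MATIC pseudo-video recording format: three samples per active TV line.
samples_per_line = 3

# 525/60 scanning: 245 active lines per field, 60 fields per second.
ntsc_rate = samples_per_line * 245 * 60

# 625/50 scanning: 294 active lines per field, 50 fields per second.
pal_rate = samples_per_line * 294 * 50

print(ntsc_rate, pal_rate)  # 44100 44100
```

Both scanning standards yield exactly 44,100 samples per second, which is why the North American recorder had to run at 525/60 rather than 525/59.94.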
The next step in analog-to-digital conversion is quantization. In this process, each sample is assigned a binary number approximating its sampled value. Quantizing divides the sampled voltage range into 2ⁿ − 1 quantizing intervals, where “n” is the number of bits per sample (the sampling resolution). For example, an 8-bit system can identify 2⁸ (256) discrete sampled signal values (255 quantizing intervals). The amplitude of such a signal can occupy the entire quantizing range. However, low-amplitude audio signals would be quantized with considerably fewer discrete levels, resulting in significant quantizing errors. These quantizing errors are correlated with the signal and perceived as distortion. With higher-level signals, the quantizing errors are uncorrelated with the signal and perceived as random noise. One can reduce quantizing errors by increasing the number of bits per sample, increasing the sampling frequency (oversampling), or both. Early digital audio equipment (e.g., CD technology) used 16 bits per sample (65,535 quantizing intervals). Current high-quality studio equipment uses 20 bits (1,048,575 quantizing intervals) or 24 bits (16,777,215 quantizing intervals) per sample. The formula expressing the SNR of a digital audio system is SNR (dB) = 6.02n + 1.76 + 10 log₁₀(Fs/2Fmax), where n is the number of bits per sample, Fs the sampling frequency in Hz and Fmax the maximum baseband frequency in Hz.
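The SNR formula above can be evaluated directly. This sketch simply codes the article's formula; the example figures for CD parameters follow from it:

```python
import math

# SNR of an ideal n-bit converter per the article's formula:
# SNR (dB) = 6.02*n + 1.76 + 10*log10(Fs / (2*Fmax)).
# The last term is the gain obtained by oversampling.
def digital_snr_db(n_bits: int, f_s: float, f_max: float) -> float:
    return 6.02 * n_bits + 1.76 + 10.0 * math.log10(f_s / (2.0 * f_max))

# 16-bit CD audio, sampled at 44.1 kHz with a 20 kHz baseband:
print(round(digital_snr_db(16, 44_100, 20_000), 1))  # 98.5

# Each additional bit buys roughly 6 dB:
print(round(digital_snr_db(24, 44_100, 20_000) - digital_snr_db(16, 44_100, 20_000), 2))
```

Note that when Fs equals exactly 2·Fmax the logarithmic term vanishes and the formula reduces to the familiar 6.02n + 1.76 dB.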
An audio signal is bipolar, which means that it deviates above and below a “midway level” of 0 V. The signal's amplitude indicates by how much the positive and negative excursions deviate from the midrange value, and a digital representation of the signal must take this into consideration. One method, called offset binary, shifts the range of the converter such that the positive and negative voltages of an audio signal are represented by binary numbers that are only positive. A similar approach is used in component digital video, where the bipolar B-Y and R-Y analog color-difference signals are shifted up by 350 mV. The offset binary approach creates difficulties in some digital audio processes, such as mixing, because it can lead to an overflow, which is unacceptable. An alternate approach, called the “two's complement,” allows for negative values. Figure 3 shows a 20-bit example of two's-complement binary coding. Here, zero starts at midrange and the maximum positive signal value is 7FFFF(HEX). The maximum negative value is 80000(HEX). The most significant bit (MSB) indicates the sign, “one” for negative and “zero” for positive.
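The two's-complement coding described above can be demonstrated with a short encoder. This is an illustrative sketch (the function name and range check are assumptions, not from the article), reproducing the 20-bit limits of 7FFFF(HEX) and 80000(HEX):

```python
# Encode a signed sample value into an n-bit two's-complement word.
# For n = 20 bits, the representable range is -524288 .. +524287.
def to_twos_complement(value: int, n_bits: int = 20) -> int:
    lo = -(1 << (n_bits - 1))        # most negative value, e.g. -524288
    hi = (1 << (n_bits - 1)) - 1     # most positive value, e.g. +524287
    if not lo <= value <= hi:
        raise ValueError("sample out of range for this word length")
    # Masking with 2^n - 1 maps negative values to their two's-complement form.
    return value & ((1 << n_bits) - 1)

print(hex(to_twos_complement(0)))         # 0x0     (midrange)
print(hex(to_twos_complement(524287)))    # 0x7ffff (maximum positive)
print(hex(to_twos_complement(-524288)))   # 0x80000 (maximum negative)
```

The most significant bit of each encoded word carries the sign, one for negative and zero for positive, exactly as Figure 3 shows.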
Michael Robin, former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw-Hill.
The Second Edition of Michael Robin's book may be ordered directly from the publisher by calling 800-262-4729. It is also available from several booksellers.
Send questions and comments to: email@example.com