Digitizing audio

The world around us is analog. Our perception mechanism is also analog. Standard audio transducers, such as the microphone and the loudspeaker, are analog devices. Human sound perception spans about 10 octaves, extending from 20Hz to 20kHz.

In an analog system, the continuously varying electrical signal that the microphone produces (the information) is amplified to a suitable level for further processing, such as mixing, recording, transmission and reproduction. The signal processing chain is essentially a transmission medium that carries the original signal from the source (the sound captured by the microphone) to the destination (the listener).

The medium inherently introduces undesirable electrical signal impairments (linear distortions, nonlinear distortions and noise), which have a direct effect on the reproduced audio quality. These impairments are additive, and the overall performance of a complete analog chain depends on the individual performance and number of discrete components assembled in a typical operational configuration (the medium). This puts a limit on the number of stages that an analog audio signal can pass through before it becomes too impaired to be acceptable.

Many analog signal handling difficulties can be eliminated if the analog signal is digitized prior to modulation and transmission. In a digital audio system, the original analog information is converted to a digital representation. The analog-to-digital conversion consists of two processes: sampling and quantizing. The resulting digital information is in binary form. Essentially, the digital electrical signal has two well-defined states: zero and one.

Undesirable medium-generated impairments affect the digital electrical signal much as they affect the analog signal. They have, however, no effect on the information as long as the receiver can distinctly recognize the two levels. The result is that the message distortion is restricted to the analog-to-digital (A/D) and digital-to-analog (D/A) tandem process, thereby improving the transparency. The transparency is maintained as long as the signal-to-noise ratio (SNR) stays within certain medium-related limits, beyond which the cliff effect occurs and the transmission shuts off.

The sampling process

Sampling is the first step towards digitizing audio signals. It consists of measuring the analog audio waveform amplitude at periodic intervals, represented by T in the formula that follows. The main concern is to represent the original analog values with adequate precision. The measurement accuracy depends on the sampling frequency. As stipulated by Nyquist, the sampling frequency has to be at least twice the maximum audio frequency, and preferably higher.

The sampling process amounts to multiplying the analog audio signal by a stream of repetitive pulses. The result is a pulse amplitude modulation (PAM) process. Figure 1 represents this process in the time domain, and Figure 2 represents it in the frequency domain. In this idealized case, the sampling frequency (Fs = 1/T) is considerably higher than the sampled frequency, and the sampling pulse duration is close to zero. Early digital audio, as used for telephone communications, assumed a 30Hz to 3.4kHz audio bandwidth and used a sampling frequency of 8kHz.
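As an illustration, here is a minimal numerical sketch of this idealized sampling process in Python. The tone frequency, the dense time grid standing in for continuous time, and the pulse train are this example's own assumptions, chosen to mimic the telephone-era 8kHz rate mentioned above.

```python
import numpy as np

# Ideal sampling as pulse amplitude modulation (PAM): the analog waveform is
# approximated on a dense time grid, then multiplied by a repetitive pulse
# train whose period is T = 1/Fs. All values here are illustrative.

F_AUDIO = 1_000.0       # 1kHz test tone (assumed)
FS = 8_000.0            # sampling frequency (telephone-era rate from the text)
DENSE_RATE = 1_000_000  # dense grid standing in for "continuous" time

t = np.arange(0, 0.002, 1 / DENSE_RATE)   # 2ms of "analog" signal
analog = np.sin(2 * np.pi * F_AUDIO * t)

# Pulse train: 1 at every sampling instant (pulse duration of one dense step,
# i.e., close to zero as in the idealized case above), 0 elsewhere.
pulses = np.zeros_like(t)
pulses[:: int(DENSE_RATE / FS)] = 1.0

pam = analog * pulses                      # the PAM signal
samples = pam[pulses == 1.0]               # the retained sample values
print(f"{samples.size} samples taken in 2 ms at Fs = {FS:.0f} Hz")
```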

For historical reasons, CD recordings use a sampling frequency of 44.1kHz. Sampling 20kHz bandwidth analog audio signals at 44.1kHz requires a 20kHz low-pass filter between the analog input and the A/D converter. Well-designed filters keep the baseband audio and the lower sideband of the sampled PAM spectrum (which reaches down to 44.1kHz - 20kHz = 24.1kHz) from overlapping, preventing aliasing. Even so, many purists claim that such a steep 20kHz low-pass filter gives rise to overshoot, ringing and related audio distortions, which, to some, are unacceptable. For this and other reasons, studio operations are carried out at a 48kHz sampling rate.
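A short numerical check, using assumed illustrative frequencies, of what happens without that filter: a tone above Fs/2 produces exactly the same samples as a tone folded back into the audio band, so the two are indistinguishable after sampling.

```python
import numpy as np

# Why the low-pass filter is needed: without it, a tone above Fs/2 is
# indistinguishable, after sampling, from a tone inside the audio band.

FS = 44_100.0
n = np.arange(1024)                              # sample indices

above = np.cos(2 * np.pi * 25_000.0 * n / FS)    # 25kHz, above Fs/2 = 22.05kHz
alias = np.cos(2 * np.pi * 19_100.0 * n / FS)    # Fs - 25kHz = 19.1kHz

# The two sample sequences are identical: 25kHz "aliases" to 19.1kHz.
print(np.allclose(above, alias))                 # True
```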

Using both sampling frequencies in a studio environment requires sample-rate converters. In a digital television studio, the audio sampling frequencies also must be coherent (derived from the same master clock) with the video sampling frequencies.
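As a side note, the awkward relationship between the two rates is easy to see numerically; the sketch below simply computes their exact rational ratio (nothing here comes from the article beyond the two frequencies).

```python
from fractions import Fraction

# 44.1kHz/48kHz interchange needs a true sample-rate converter rather than
# simple decimation: the two rates are related by an awkward rational ratio.

ratio = Fraction(48_000, 44_100)
print(ratio)   # 160/147 -> upsample by 160, then downsample by 147
```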

The quantizing process

The samples are further processed by assigning each one a binary number approximating its sampled value. This process is called quantizing. Quantizing divides the sampled voltage range into 2^n - 1 quantizing intervals, where n is the number of bits per sample (the sampling resolution). For example, an 8-bit system can identify 2^8 = 256 discrete sampled signal values (255 quantizing intervals). This assumes a signal whose amplitude occupies the whole quantizing range.

Low-amplitude audio signals would be quantized with considerably fewer discrete levels, resulting in significant quantizing errors. These quantizing errors are correlated with the signal and are perceived as distortion. With higher-level signals, the quantizing errors are uncorrelated with the signal and are perceived as random noise. The quantizing errors can be reduced by increasing the number of bits per sample as well as the sampling frequency (oversampling). Early digital audio equipment (e.g., CD technology) used 16 bits (65,535 quantizing intervals). Current high-quality studio equipment uses 20 bits (1,048,575 quantizing intervals) or 24 bits (16,777,215 quantizing intervals) per sample.
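To make the effect concrete, here is a minimal uniform quantizer in Python. The quantizer design (mid-tread, full scale at ±1.0) and the test tone are assumptions of this sketch, not details from the article.

```python
import numpy as np

# A full-scale signal quantized at n bits has small, noise-like error, while
# a low-level signal uses far fewer of the 2**n - 1 intervals and suffers a
# much worse error.

def quantize(x, n_bits):
    """Uniform mid-tread quantizer over the range [-1.0, +1.0] (assumed)."""
    intervals = 2 ** n_bits - 1          # number of quantizing intervals
    step = 2.0 / intervals
    return np.round(x / step) * step

def snr_db(signal, quantized):
    noise = signal - quantized
    return 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))

t = np.arange(48_000) / 48_000.0
tone = np.sin(2 * np.pi * 997.0 * t)     # full-scale 997Hz test tone

for bits in (8, 16, 20, 24):
    print(f"{bits:2d} bits, full scale: {snr_db(tone, quantize(tone, bits)):6.1f} dB")

# The same tone 60dB down spans only a handful of levels, so its SNR is
# far worse than the full-scale figure for the same bit depth:
quiet = 0.001 * tone
print(f"16 bits, -60 dBFS  : {snr_db(quiet, quantize(quiet, 16)):6.1f} dB")
```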

Overall performance

The formula expressing the SNR of a digital audio system is:

SNR (dB) = 6.02n + 1.76 + 10 log10 (Fs / 2Fmax)

where n is the number of bits per sample, Fs the sampling frequency in Hz, and Fmax the maximum (low-pass filtered) baseband frequency in Hz. It is evident that higher values of n and Fs ensure a better SNR. For example, if n = 24, Fs = 48kHz and Fmax = 20kHz, the SNR is about 147dB. A 6dB SNR improvement is obtained for every additional bit at a given Fs. Oversampling improves the A/D and D/A performance by reducing the quantizing errors and aliasing component amplitudes. At a given n, four-times oversampling increases the SNR by 6dB. By comparison, a typical analog audio console would not exceed an SNR of 60dB. An analog audio tape recorder would have difficulties even reaching an SNR of 60dB.
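The formula is easy to check numerically; the short sketch below evaluates it for the example above and confirms the per-bit and oversampling improvements.

```python
import math

# The SNR formula above, checked numerically. With n = 24, Fs = 48kHz and
# Fmax = 20kHz, the oversampling term 10*log10(48/40) adds only ~0.8dB to
# the familiar 6.02n + 1.76 figure.

def snr_db(n_bits, fs_hz, fmax_hz):
    return 6.02 * n_bits + 1.76 + 10 * math.log10(fs_hz / (2 * fmax_hz))

print(f"{snr_db(24, 48_000, 20_000):.2f} dB")   # ~147.03 dB
print(f"{snr_db(25, 48_000, 20_000) - snr_db(24, 48_000, 20_000):.2f} dB per extra bit")
print(f"{snr_db(24, 4 * 48_000, 20_000) - snr_db(24, 48_000, 20_000):.2f} dB for 4x oversampling")
```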

Carrying digital audio signals

Figure 3 shows a simplified block diagram of a stereophonic digital audio system consisting of an ADC, a DAC and a transport medium. In this basic diagram, the digital audio is in its bit-parallel native format. Assuming a 24-bit accuracy, each of the two signals would be transported by 24 pairs of wires (one pair per bit), plus an additional pair for the clock signal. This calls for a heavy cable and connector. Early equipment worked in this manner, which is suitable for simple operational environments but not for a large installation.

For large installations, the digital signals are distributed using the AES/EBU bit-serial digital audio signal distribution format. This is a self-clocking single-cable format, which is now universally used. The resulting bit rate for a dual-channel (left and right) AES/EBU signal is 6.144Mb/s. While distributing this signal in a studio environment is feasible, distribution and transmission in a restricted bandwidth requires compression. Digital audio signals can be efficiently compressed by using MPEG methods. The MPEG-2 compression system exploits certain human auditory system (HAS) characteristics to remove redundant data and considerably reduce the bit rate. This is an added advantage of digital audio.
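As a quick sanity check on that figure, the sketch below derives it from the standard AES3 frame structure (two 32-bit subframes per frame, one frame per sample period, biphase-mark channel coding); those frame-structure details are background knowledge, not spelled out in the article.

```python
# Back-of-envelope check of the AES/EBU (AES3) rate quoted above.

BITS_PER_SUBFRAME = 32   # one 32-bit subframe per channel per sample
CHANNELS = 2             # left and right
FS = 48_000              # studio sampling rate from the text

data_rate = BITS_PER_SUBFRAME * CHANNELS * FS   # 3.072 Mb/s of frame data
line_rate = 2 * data_rate                       # biphase-mark coding doubles it
print(f"{data_rate/1e6:.3f} Mb/s data, {line_rate/1e6:.3f} Mb/s on the line")
```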

Michael Robin, a fellow of the SMPTE and former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal. He is co-author of “Digital Television Fundamentals,” published by McGraw-Hill and translated into Chinese and Japanese.

Send questions and comments to: michael.robin@penton.com