Digitizing audio

The world around us is analog. Our perception mechanism is also analog. Standard audio transducers such as the microphone and the loudspeaker are analog devices. Human sound perception spans about 10 octaves, extending from 20Hz to 20kHz.

Why digital audio?

In an analog system, the continuously varying electrical signal that the microphone produces (the information) is amplified to a suitable level for further processing such as mixing, recording, transmission and reproduction. The signal-processing chain is essentially a transmission medium that carries the original signal from the source (the sound captured by the microphone) to the destination (the listener). The medium inherently introduces undesirable electrical signal impairments (linear distortions, nonlinear distortions and noise), which have a direct effect on the reproduced audio quality. These impairments are additive, and the overall performance of a complete analog chain depends on the individual performance and number of discrete components assembled in a typical operational configuration (the medium). This puts a limit on the number of stages that an analog audio signal can pass through before it becomes too impaired to be acceptable.


Figure 1. The sampling process consists of multiplying the analog audio signal with a stream of repetitive pulses. This is equivalent to a pulse amplitude modulation (PAM) process, which is represented here in the time domain.

Many analog signal-handling difficulties can be eliminated if the analog signal is digitized prior to modulation and transmission. In a digital audio system, the original analog information is converted to a digital representation in binary form. Essentially, the signal has two well-defined states: zero and one. Undesirable medium-generated impairments affect the digital electrical signal in a manner similar to the way they affect the analog signal. They have, however, no effect on the information as long as the receiver can distinctly recognize the two levels. The result is that the message distortion is restricted to the analog-to-digital (A/D) and digital-to-analog (D/A) tandem process, thereby improving the transparency. The transparency is maintained as long as the SNR of the medium stays above a certain threshold; below it, the “cliff effect” occurs and the transmission shuts off.

Sampling considerations

Sampling is the first step towards digitizing audio signals. It consists of measuring the analog audio waveform amplitude at periodic intervals, T. The main concern is to represent the original analog values with adequate precision. How faithfully this can be done depends on the sampling frequency, which has to be at least twice the maximum audio frequency, and preferably higher.

The sampling process consists of multiplying the analog audio signal with a stream of repetitive pulses. This is equivalent to a pulse amplitude modulation (PAM) process. Figure 1 represents this process in the time domain, and Figure 2 represents it in the frequency domain. In this idealized case, the sampling frequency (Fs=1/T) is considerably higher than the sampled frequency, and the sampling pulse duration is close to zero.
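As a rough illustration, the sketch below (in Python) models this idealized sampling as PAM: a densely computed “analog” waveform is multiplied by a repetitive unit-pulse train, leaving only the values at the sampling instants. The 1kHz tone, 48kHz sampling rate and other parameters are illustrative choices, not values taken from the article.

```python
# Minimal sketch of ideal sampling viewed as pulse amplitude modulation (PAM).
import numpy as np

fs = 48_000                 # sampling frequency Fs = 1/T (Hz)
oversample = 20             # dense grid points per sampling interval
fs_dense = fs * oversample  # dense time grid standing in for continuous time
f_tone = 1_000              # audio tone being sampled (Hz), well below Fs/2
duration = 0.002            # seconds of signal

t = np.arange(0, duration, 1 / fs_dense)
analog = np.sin(2 * np.pi * f_tone * t)

# Repetitive unit-pulse train: 1 at every sampling instant, 0 elsewhere.
pulses = np.zeros_like(t)
pulses[::oversample] = 1.0

pam = analog * pulses          # PAM signal: pulses carrying the sampled amplitudes
samples = pam[pulses == 1.0]   # the resulting discrete-time sample values

print(f"{len(samples)} samples taken over {duration * 1000:.1f} ms")
```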


Figure 2. The pulse amplitude modulation (PAM) process is represented here in the frequency domain. In this idealized case, the sampling frequency (Fs=1/T) is considerably higher than the sampled frequency, and the sampling pulse duration is close to zero.

For historical reasons, CD recordings use a sampling frequency of 44.1kHz. Sampling 20kHz-bandwidth analog audio signals at 44.1kHz requires a 20kHz low-pass filter between the analog input and the A/D converter. A well-designed filter prevents overlap between the baseband audio spectrum and the repeated PAM spectra, which would result in aliasing. Even so, many purists claim that a 20kHz low-pass filter gives rise to overshoot, ringing and related audio distortions, which to some are unacceptable. For this and other reasons, studio operations are carried out at a 48kHz sampling rate. Using both sampling frequencies in a studio environment requires sample-rate converters. In addition, in a digital television studio, the audio sampling frequencies have to be coherent (derived from the same master clock) with the video sampling frequencies.
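To make the filtering requirement concrete, the short sketch below (an illustration, not part of the article; the function name and test frequencies are arbitrary) computes where an unfiltered component above Fs/2 would reappear after sampling:

```python
def alias_frequency(f_in: float, fs: float) -> float:
    """Frequency at which an input tone f_in appears after sampling at fs."""
    f = f_in % fs                      # fold into one sampling-frequency period
    return fs - f if f > fs / 2 else f

fs = 44_100.0
for f_in in (20_000.0, 25_000.0, 30_000.0):
    print(f"{f_in / 1000:.1f} kHz in -> {alias_frequency(f_in, fs) / 1000:.3f} kHz after sampling")
# An unfiltered 25 kHz component would fold back to 19.1 kHz, inside the audio band.
```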

Quantizing considerations

The samples are further processed by assigning them a binary number approximating their sampled value. This process is called quantizing. Quantizing divides up the sampled voltage range into 2^n - 1 quantizing intervals, where n is the number of bits per sample (sampling resolution). For example, an 8-bit system can identify 2^8 = 256 discrete sampled signal values (255 quantizing intervals). This is the case of a signal with an amplitude occupying the whole quantizing range.

Low-amplitude audio signals would be quantized with considerably fewer discrete levels, resulting in significant quantizing errors. These quantizing errors are correlated with the signal and perceived as distortion. With higher level signals, the quantizing errors are uncorrelated with the signal and perceived as random noise. The quantizing errors can be reduced by increasing the number of bits per sample as well as the sampling frequency (oversampling).
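The sketch below is one way to see this effect; it uniformly quantizes a full-scale tone and a low-level tone at several bit depths and reports the resulting quantizing error. The signal levels, bit depths and the quantize helper are illustrative assumptions, not taken from the article.

```python
# Uniform quantization and the resulting quantizing error,
# following the article's convention of 2^n - 1 intervals across the full range.
import numpy as np

def quantize(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Round samples in [-1.0, 1.0] to the nearest quantizing level."""
    intervals = 2 ** n_bits - 1        # number of quantizing intervals
    return np.round((x + 1.0) / 2.0 * intervals) / intervals * 2.0 - 1.0

t = np.linspace(0.0, 1.0, 48_000, endpoint=False)
for amplitude in (1.0, 0.01):          # full-scale vs. low-level signal
    x = amplitude * np.sin(2 * np.pi * 1_000 * t)
    for n in (8, 16, 24):
        err = quantize(x, n) - x
        rms = np.sqrt(np.mean(err ** 2))
        print(f"amplitude={amplitude:<5} n={n:2d}  rms quantizing error={rms:.2e}")
```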


Figure 3. This simplified block diagram illustrates a stereophonic digital audio system consisting of an ADC, a DAC and a transport medium. In this diagram, the digital audio is in its bit-parallel native format.

Early digital audio equipment (e.g. CD technology) uses 16 bits (65,535 quantizing intervals). Current high-quality studio equipment uses 20 bits (1,048,575 quantizing intervals) or 24 bits (16,777,215 quantizing intervals) per sample.

Overall performance considerations

The formula expressing the SNR of a digital audio system is:

SNR (dB) = 6.02n + 1.76 + 10 log10 (Fs/2Fmax)

where n is the number of bits per sample, Fs the sampling frequency in Hz and Fmax the maximum (low-pass filtered) baseband frequency in Hz. It is evident that higher values of n and Fs ensure a better SNR. At n=24, Fs=48kHz and Fmax=20kHz, the SNR is approximately 147dB. A 6dB SNR improvement is obtained for every additional bit at a given Fs. Oversampling improves the A/D and D/A performance by reducing the quantizing errors and aliasing component amplitudes. At a given n, 4x oversampling increases the SNR by 6dB.
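For reference, the small sketch below simply evaluates the formula above; the function name is arbitrary and the printed figures are the formula applied to the stated parameters, not measured values.

```python
import math

def digital_audio_snr_db(n_bits: int, fs_hz: float, f_max_hz: float) -> float:
    """SNR (dB) = 6.02*n + 1.76 + 10*log10(Fs / (2*Fmax))."""
    return 6.02 * n_bits + 1.76 + 10 * math.log10(fs_hz / (2 * f_max_hz))

print(f"{digital_audio_snr_db(24, 48_000, 20_000):.2f} dB")   # roughly 147dB
print(f"{digital_audio_snr_db(16, 44_100, 20_000):.2f} dB")   # CD-style parameters, roughly 98.5dB
# Each additional bit adds about 6.02dB; 4x oversampling adds 10*log10(4), also about 6.02dB.
```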

Transporting digital audio signals

Figure 3 shows a simplified block diagram of a stereophonic digital audio system consisting of an ADC, a DAC and a transport medium. In this basic diagram, the digital audio is in its bit-parallel native format. Assuming a 24-bit accuracy, each of the two signals would be transported by 24 pairs of wires (one pair per bit) plus an additional pair for the clock signal. This calls for a heavy cable and connector. Early equipment worked in this manner.

This works well in simple operational environments but not in a large installation. For large installations, the digital signals are distributed using the AES/EBU bit-serial digital audio signal distribution format. This is a self-clocking single-cable format, now universally used. We will describe this format in a future article.

Michael Robin, a fellow of the SMPTE and former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw-Hill, and recently translated into Chinese and Japanese.

Send questions and comments to: michael_robin@primediabusiness.com
