Transition to Digital: Elements of psychoacoustics

Sound is defined as an oscillation in pressure, stress, particle displacement or particle velocity in an elastic medium. Sound also is the sensation these oscillations produce in the ear. Sound may be desirable (music or speech) or undesirable (noise). The oscillations that create sound can occur at any frequency. The widest range of frequencies audible to humans extends approximately from 20 Hz to 20 kHz, but few of us can hear frequencies at either extreme.

This article examines some basic characteristics of the human auditory system. Future articles will deal with amplification, distribution and monitoring of audio signals (April) and performance measurements of audio equipment and systems (May).

Sound-pressure level

Sound pressure can be measured in four absolute units: dynes per square centimeter (d/cm2), microbars, newtons per square meter (N/m2) or pascals (Pa). (The variety of measurement units surely does not make life easy for audio engineers.) These units are related to each other as follows:

1 d/cm2 = 1 microbar = 0.1 N/m2 = 0.1 Pa
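
The relationship above lends itself to a small conversion helper. The following Python sketch is only an illustration; the function name and the unit labels used as dictionary keys are assumptions made for this example.

```python
# Number of pascals in one of each unit, taken from the relationship
# 1 d/cm2 = 1 microbar = 0.1 N/m2 = 0.1 Pa.
# The dictionary keys are labels chosen for this example only.
PA_PER_UNIT = {
    "d/cm2": 0.1,
    "microbar": 0.1,
    "N/m2": 1.0,
    "Pa": 1.0,
}


def convert_pressure(value, from_unit, to_unit):
    """Convert a sound pressure between any two of the four units."""
    pascals = value * PA_PER_UNIT[from_unit]
    return pascals / PA_PER_UNIT[to_unit]


# One dyne per square centimeter expressed in pascals.
print(convert_pressure(1.0, "d/cm2", "Pa"))
# One pascal expressed in microbars.
print(convert_pressure(1.0, "Pa", "microbar"))
```

Converting through a single base unit (here the pascal) keeps the table to one entry per unit instead of one entry per pair of units.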

For an average person below age 30, the lowest sound pressure that can be heard at 1 kHz is 0.0002 d/cm2. This hearing threshold is the reference for measuring sound-pressure level (SPL), a relative measurement expressed in decibels (dB). The SPL of a sound relative to the reference sound pressure is given by the following formula:

SPL (dB) = 20 log10 (P/PREF)

where SPL (dB) is the number of decibels, P is the measured sound pressure in d/cm2, and PREF is 0.0002 d/cm2.
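
As a quick check of the formula, the following Python sketch computes the SPL for a few pressures. The function name and the sample pressures are chosen for illustration only.

```python
import math

P_REF = 0.0002  # reference sound pressure in d/cm2 (hearing threshold at 1 kHz)


def spl_db(pressure):
    """Return the SPL in dB for a sound pressure given in d/cm2."""
    if pressure <= 0:
        raise ValueError("sound pressure must be positive")
    return 20.0 * math.log10(pressure / P_REF)


print(round(spl_db(0.0002), 1))  # 0.0, the threshold of hearing
print(round(spl_db(1.0), 1))     # 74.0, i.e. 1 d/cm2
print(round(spl_db(200.0), 1))   # 120.0
```

Each tenfold increase in sound pressure adds 20 dB, which is why a pressure range of a million to one fits into a scale of 0 dB to 120 dB.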

Figure 1 on page 22 shows some SPLs that the human ear encounters in various environments, expressed in d/cm2 as well as in dB relative to the threshold of hearing.

In a broadcast environment, we can identify three typical SPLs:

120 dB SPL: The typical peak SPL of a symphonic orchestra or a rock concert.
74 dB SPL: The average SPL of typical voice programs. It is used as a reference level by microphone manufacturers.
30 dB SPL: The typical SPL of ambient noise.
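
To relate these three levels back to absolute sound pressure, the SPL formula can be inverted. The sketch below assumes the same 0.0002 d/cm2 reference; the function name is hypothetical.

```python
P_REF = 0.0002  # reference sound pressure in d/cm2


def pressure_from_spl(spl):
    """Invert SPL = 20 log10(P / PREF) to recover the pressure in d/cm2."""
    return P_REF * 10.0 ** (spl / 20.0)


typical_levels = [
    ("peak of an orchestra or rock concert", 120),
    ("average voice program", 74),
    ("ambient noise", 30),
]

for label, spl in typical_levels:
    print(f"{label}: {spl} dB SPL is about {pressure_from_spl(spl):.4g} d/cm2")
```

The 74 dB SPL reference level works out to very nearly 1 d/cm2, which makes it a convenient round figure for specifying microphone sensitivity.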

Loudness and loudness level

The loudness of a sound is subjective as well as relative: it depends on both the sound pressure and the frequency of the sound. It is an auditory sensation that describes sounds on an ascending scale from soft to loud. The unit of loudness is the sone. The calculated loudness of a steady sound, in sones, is related to the loudness level, in phons, by the equation:

ns = 2^((L - 40)/10)

where ns is the loudness in sones and L is the loudness level in phons.

The loudness level of a sound, expressed in phons, is numerically equal to the median SPL, expressed in decibels relative to 0.0002 d/cm2, of a 1 kHz reference tone that listeners judge to be equally loud. The calculated loudness level of a sound, in phons, is related to the loudness, in sones, by the equation:

L = 40 + 10 log2 ns

where L is the loudness level in phons and ns is the loudness in sones.
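
The two equations are inverses of each other, with 2 raised to the power (L - 40)/10 in the first. A short Python sketch, using function names invented for this example, shows that every 10-phon increase doubles the loudness in sones.

```python
import math


def phons_to_sones(level_phons):
    """Loudness in sones: 2 raised to the power (L - 40) / 10."""
    return 2.0 ** ((level_phons - 40.0) / 10.0)


def sones_to_phons(loudness_sones):
    """Loudness level in phons: 40 + 10 log2(ns)."""
    if loudness_sones <= 0:
        raise ValueError("loudness in sones must be positive")
    return 40.0 + 10.0 * math.log2(loudness_sones)


for phons in (40, 50, 60, 80):
    print(phons, "phons =", phons_to_sones(phons), "sones")

print(sones_to_phons(4.0), "phons")  # 60.0
```

By this definition, 40 phons corresponds to exactly 1 sone.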

At frequencies other than 1 kHz, the human auditory system requires different SPLs to perceive the same loudness. Figure 2 shows the normal equal-loudness contours for pure tones as per ISO standard 226. The SPL is expressed in dB with reference to 20 μPa, which equals 0.0002 d/cm2. These curves can be viewed as inverted frequency-response curves of the human ear at various loudness levels. A 1 kHz tone having an SPL of 40 dB has a loudness level of 40 phons. To give the same sensation of loudness at 63 Hz, the SPL must be increased by about 20 dB. This shows that the ear is considerably less sensitive at low frequencies than at 1 kHz. Equal-loudness contours have different shapes at other loudness levels. As the sound level increases, the ear's frequency response becomes flatter. At an SPL of 110 dB, it is reasonably flat, within ±10 dB.

Human auditory system

The dynamic range of the ear is bounded at the top by the threshold of pain and at the bottom by the threshold of hearing. The threshold of pain is typically 120 dB, but it varies from individual to individual. Sounds having SPLs of about 120 dB and above can cause pain as well as immediate and permanent loss of hearing. Regular exposure to sounds of about 90 dB SPL will eventually cause hearing loss. Sounds having SPLs below 0 dB are inaudible.
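
To put the 120 dB span between the two thresholds in perspective, the following Python sketch converts a level difference in decibels into linear ratios. The function names are illustrative, and the intensity relation (10 dB per decade) is the standard one for power quantities, which are proportional to pressure squared.

```python
def pressure_ratio(level_difference_db):
    """Sound-pressure ratio for a level difference in dB (20 dB per decade)."""
    return 10.0 ** (level_difference_db / 20.0)


def intensity_ratio(level_difference_db):
    """Sound-intensity ratio for a level difference in dB (10 dB per decade)."""
    return 10.0 ** (level_difference_db / 10.0)


dynamic_range_db = 120.0  # threshold of pain minus threshold of hearing
print(f"pressure ratio:  {pressure_ratio(dynamic_range_db):.0e}")
print(f"intensity ratio: {intensity_ratio(dynamic_range_db):.0e}")
```

In other words, the ear copes with sound pressures spanning a range of about a million to one.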

Above age 30, hearing normally deteriorates. The hearing threshold rises, and perception of the higher frequencies diminishes. But, for persons of any age, the threshold of hearing depends on the level of ambient noise, which has a masking effect. Noise masking is defined as the process by which the threshold of audibility of a wanted sound is raised by the presence of an unwanted sound — in this case, noise.

The human auditory system resolves sound in much the same way as an array of overlapping bandpass filters, resulting in so-called critical bands, shown in Table 1 on page 20. The filter-bank behavior of these critical bands accounts for the masking phenomenon. In the presence of a dominant sound at a given frequency, other lower-level sounds whose frequencies are inside the same critical band may become inaudible. This psychoacoustical characteristic is called frequency-domain masking.
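
As a rough illustration, independent of the values in Table 1, frequencies can be mapped onto a critical-band (Bark) scale with a published approximation. The Python sketch below assumes Traunmüller's formula without its end corrections, and the one-Bark test for "same critical band" is a simplification made for this example; the function names are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Traunmüller's formula,
    used here as an assumption and without its low/high-end corrections)."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def likely_same_critical_band(f1_hz, f2_hz):
    """Crude test: two tones less than one Bark apart are treated as
    falling inside the same critical band."""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) < 1.0


print(likely_same_critical_band(5000.0, 5200.0))  # tones close together
print(likely_same_critical_band(5000.0, 1000.0))  # tones far apart
```

Under these assumptions, a low-level 5.2 kHz tone is a candidate for masking by a dominant 5 kHz tone, whereas a 1 kHz tone lies many critical bands away.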

Figure 3 on page 24 shows the effect that a steady-frequency, steady-amplitude, 5 kHz tone has on a person's hearing threshold at adjacent frequencies. It creates a raised masking threshold that causes sounds at adjacent frequencies, whose levels are below the raised threshold, to become inaudible. Digital audio compression techniques take advantage of this characteristic by assigning fewer bits to audio signals in the raised masking region. The resulting increased quantizing noise is below the raised masking level and cannot be heard, so the bit-rate reduction has no audible effect. Multiple simultaneous signals of different frequencies, such as music, result in an overall raising of the threshold of hearing at all frequencies. This permits digital audio compression systems to dynamically reduce the bit rate without audibly affecting the reproduced sound quality. Compression systems based on this property are called perceptual encoders.
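
The bit-allocation idea can be sketched in a few lines of Python. This is a deliberately simplified model and not the algorithm of any particular encoder: it assumes each quantizer bit lowers the quantizing noise by about 6.02 dB relative to the signal, and the band names and levels are invented for illustration.

```python
import math

DB_PER_BIT = 6.02  # approximate reduction in quantizing noise per added bit


def bits_needed(signal_db, masking_threshold_db):
    """Smallest word length that keeps the quantizing noise at or below
    the masking threshold, under the simplified 6.02 dB-per-bit model."""
    signal_to_mask_db = signal_db - masking_threshold_db
    if signal_to_mask_db <= 0:
        return 0  # the signal itself is masked, so no bits are spent on it
    return math.ceil(signal_to_mask_db / DB_PER_BIT)


# Hypothetical bands: (name, signal level in dB, masking threshold in dB).
bands = [
    ("far from the masker", 60.0, 20.0),
    ("near the masker", 60.0, 45.0),
    ("below the raised threshold", 30.0, 40.0),
]

for name, signal_db, threshold_db in bands:
    print(f"{name}: {bits_needed(signal_db, threshold_db)} bits")
```

In this toy example the band near the masker needs 3 bits instead of 7, and the band lying below the raised threshold needs none, which is where the bit-rate saving comes from.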

In addition to the frequency masking effect, the human auditory system exhibits a temporal masking effect. Experiments have shown that temporal masking starts before a masking signal is applied to the human auditory system (pre-masking) and slowly decays after the masking signal is stopped (post-masking). Figure 4 shows an example of the temporal masking effect and reveals that the post-masking lasts longer than the pre-masking.

Michael Robin, former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw-Hill.