Electronically processed analog audio has been around since about 1876, when Alexander Graham Bell and Elisha Gray filed patents for “transmitting the human voice through a telegraphic circuit.” But while the evolution of digital technology some 100 years later added incredible sophistication to audio systems, the underlying signal processing principles remain fundamentally the same. This is due in part to the fact that the entire process is ultimately coupled to the human hearing mechanism, which evolves much more slowly.
Manual level control at the beginning
In the early days of radio and TV broadcasting, all audio level control was done by a skilled operator at the transmission point. Because of FCC rules regulating the modulation of analog RF carriers, stations had to be vigilant to ensure operation within legal limits, i.e., no overmodulation.
Soon we developed electronic systems that could do this automatically, resulting in automatic gain controls and peak limiters. But while these circuits maintained legal operating conditions, broadcasters (and advertisers) soon realized that there was an advantage to producing a certain sound quality, resulting in technology to automatically modify the sound. Among the tricks employed was the use of gain compression to make the program sound louder without exceeding regulatory limits.
Volume compressors work by dynamically altering a gain element in the audio path. By raising the volume at low levels and lowering it at high levels, the dynamic range of the material is reduced, and the overall volume of the material can be increased.
Compressors can work in myriad ways, and the input-output characteristic can be linear or considerably more complex. One of the challenges of automatic gain control is to avoid noticeable artifacts, especially when the processor must react to sudden changes in input dynamics.
Consider a situation where an aggressive amount of compression is taking place. If the input audio is riding at a low level, the compressor will work to increase the gain. Now, if a high-level passage suddenly appears, the compressor must quickly lower the gain so that clipping or overmodulation does not occur. These gain transitions must occur with an associated time constant so that the gain change is not noticeable. Too slow a time constant can result in signal overload; too fast will result in pumping or breathing, as the gain change is heard.
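The behavior described above can be sketched in a few lines of Python. This is only an illustrative feed-forward design, not any particular broadcast processor: the function names, the -20dBFS threshold, the 4:1 ratio and the attack and release times are all assumptions chosen for the example. The static curve decides how much gain reduction is wanted, and a one-pole smoother with separate attack and release time constants decides how quickly the gain is allowed to move.

```python
import math

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static curve (hypothetical settings): gain reduction in dB for a given input level."""
    over = level_db - threshold_db
    if over <= 0.0:
        return 0.0                      # below threshold: unity gain
    return -over * (1.0 - 1.0 / ratio)  # above threshold: reduce the overshoot

def compress(samples, sample_rate=48000, threshold_db=-20.0, ratio=4.0,
             attack_ms=5.0, release_ms=200.0, makeup_db=0.0):
    """Apply the static curve with attack/release smoothing of the gain."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    smoothed_db = 0.0
    out = []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-9))
        target_db = compressor_gain_db(level_db, threshold_db, ratio)
        # Gain falling (more reduction) uses the fast attack constant;
        # gain recovering uses the slower release constant.
        coeff = attack if target_db < smoothed_db else release
        smoothed_db = coeff * smoothed_db + (1.0 - coeff) * target_db
        out.append(x * 10.0 ** ((smoothed_db + makeup_db) / 20.0))
    return out
```

Raising `makeup_db` after compression is what makes the program sound louder: the peaks have been pulled down, so the whole signal can be lifted without exceeding the limit. Shortening `release_ms` too far is one way to provoke the pumping effect.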
Loudness is not an exact science
The human hearing response is anything but flat, even for the perfect listener, having a roughly bell-shaped response that peaks around 3300Hz. In addition, the shape of the hearing response changes with the overall intensity of the sound.
This gives rise to the notion of equal-loudness contours, which describe the sound pressures necessary at different frequencies to give an equal perceived sound level. Equal-loudness contours have been described by various researchers. (See Figure 1.)
Because the loudness contours describe perceived sound levels, they have been developed empirically, by gathering statistics from large numbers of experimental listeners. The curves are labeled in units called phons; a contour's phon value is its sound pressure level in decibels at 1kHz. The well-known A-weighting curve used to measure noise levels is an approximation of the inverse of the equal-loudness contour at the 40-phon level.
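For readers who want numbers, the A-weighting curve has a closed-form approximation. The sketch below uses the commonly published analog response from IEC 61672, which is not given in this article; the function name is an assumption for illustration.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz, per the IEC 61672 analog formula."""
    f2 = freq_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.00

for f in (100.0, 1000.0, 2500.0, 10000.0):
    print(f"{f:7.0f} Hz: {a_weighting_db(f):+.1f} dB")
```

The curve is close to 0dB at 1kHz, strongly attenuates low frequencies, and shows a slight boost in the region of a few kilohertz where hearing is most sensitive.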
The perception of loudness is also a function of other factors. Loudness increases when the sound power is spread across more than one critical band, each of which is generally about one-third octave wide. Loudness is also a function of the duration of the sound, with the human auditory system integrating the received power over a window of about 200ms to 1000ms.
In the 1960s, the FCC conducted a study of audience complaints regarding loud commercials on broadcast television. As a result, CBS Laboratories developed a series of loudness meters based on many of the loudness factors described above. (See Figure 2.)
By placing a loudness measuring circuit within the servo loop of an automatic gain controller, as in Figure 3, an automatic loudness controller was also developed. This kind of AGC-based analog loudness controller is not practical, however, because it will continuously force the program loudness to a constant level, resulting in an objectionable compression of the original program dynamics.
A better solution is available today with digital transmission: dialog normalization, or dialnorm. Dialnorm works by setting the ATSC receiver dialog level to match a predetermined loudness level for each source program. By measuring the long-term average program dialog level and then transmitting this as the dialnorm parameter in the bit stream, loudness consistency can be maintained from program to program and even across different broadcasters.
In the ATSC AC-3 bit stream, dialnorm is a 5-bit word, transmitted every 32ms, that indicates the average dialog level in 1dB increments from -1dBFS to -31dBFS (the code value 0 is reserved). At the receiver, dialnorm is then used to adjust the output level of the audio decoder, which applies between 30dB and 0dB of attenuation accordingly. (See Figure 4.) By equalizing to a common -31dBFS (full scale) level, the perceived loudness can be made uniform across programs and broadcasters. ATSC specifies the loudness measurement parameter as Leq(A), which stands for A-weighted equivalent loudness. It takes into account frequency dependence and loudness integration, both of which are needed to measure speech material.
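The integration half of an Leq measurement is simply the mean signal power over the measurement period, expressed in decibels. The sketch below shows only that step and deliberately omits the frequency weighting (A or RLB) that would be applied to the samples first; the function name and the small floor used to avoid taking the log of zero are assumptions for illustration.

```python
import math

def leq_dbfs(samples):
    """Unweighted equivalent level: mean-square power in dB re full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))

# One second of a full-scale 1 kHz sine at 48 kHz measures about -3 dBFS,
# because a sine wave's mean-square value is half its peak squared.
sine = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(48000)]
print(round(leq_dbfs(sine), 2))
```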
The CBS algorithm mentioned earlier also takes into account loudness addition from separate critical bands, which helps facilitate the measurement of wideband nonspeech material. The ITU-R standard, BS.1770, titled “Algorithms to measure audio programme loudness and true-peak audio level,” specifies a more recent loudness measurement, Leq(RLB), using revised low frequency B-weighting.
Although Leq(RLB) is simpler to realize than the other methods, it performs better in subjective testing. It is now being proposed to replace Leq(A) as the recommended loudness metric in the ATSC standard.
One of the challenges of using dialnorm correctly is getting different broadcasters to follow equivalent practices in setting up their systems. Broadcasters need not share the same dialnorm value; any setting can be used, as long as it correctly reflects the measured dialog level of the content. For example, audio carrying a -23dBFS setting will be attenuated in the receiver by 8dB, audio carrying -27dBFS by 4dB, and so forth.
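The receiver arithmetic in these examples is just the difference between the signaled dialog level and the common -31dBFS target. A minimal sketch, with a hypothetical function name and a range check that assumes valid dialnorm values run from -1dBFS to -31dBFS:

```python
def dialnorm_attenuation_db(dialnorm_dbfs, target_dbfs=-31):
    """Attenuation (dB) a decoder applies to bring dialog to the target level."""
    if not -31 <= dialnorm_dbfs <= -1:
        raise ValueError("dialnorm must be between -31 and -1 dBFS")
    return dialnorm_dbfs - target_dbfs

print(dialnorm_attenuation_db(-23))  # 8
print(dialnorm_attenuation_db(-27))  # 4
print(dialnorm_attenuation_db(-31))  # 0
```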
One way to provide consistency is to produce all internal content to a common standard and then transmit a fixed dialnorm level. For example, at NBC Universal (NBCU), network stereo and 5.1 deliveries are required to be mixed at -23dBFS by the provider. Most commercials deliver at a -22dBFS average, making a nice transition with programs when they meet the spec, according to Jim Starzynski, principal engineer and audio architect for NBCU. The network's dialnorm parameter is set to -23dBFS to match the programs and commercials. Targeting a fixed -23dBFS complies with the ATSC standard and yields exactly the same results as agile metadata, without the need for it. All contribution material is therefore produced to the -23dBFS level, and the network metadata is sent to the stations at this same fixed level. Of course, adjusting dialnorm for each program is equally appropriate, provided it's done correctly.
Internet chatter is rife with anecdotes about hot or weak audio across different broadcasters, despite the use of dialog normalization. This is not a direct consequence of different broadcasters using different values of dialnorm, because the system allows this.
Unfortunately, some broadcasters are still playing a loudness war by intentionally setting dialnorm to -31dBFS and then producing audio at a higher level. This of course distorts the intent of dialog normalization by reproducing a higher loudness level. A similar problem exists when broadcasters leave their Dolby encoders set to the factory -27dBFS level and then produce audio without regard to the actual loudness levels.
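The size of the resulting loudness error is easy to estimate: it is the gap between the level the audio was actually produced at and the level the metadata claims. The sketch below uses a hypothetical function name, and the -23dBFS and -20dBFS production levels are made-up figures chosen only to illustrate the two cases.

```python
def loudness_error_db(actual_dialog_dbfs, dialnorm_dbfs):
    """How much louder (+) or quieter (-) than intended the decoded program is."""
    return actual_dialog_dbfs - dialnorm_dbfs

# Dialnorm pinned at -31 dBFS while dialog actually sits at -23 dBFS:
print(loudness_error_db(-23, -31))  # 8

# Encoder left at a -27 dBFS default while dialog sits at -20 dBFS:
print(loudness_error_db(-20, -27))  # 7
```

An error of zero means the metadata matches the content, which is the only condition the system needs in order to work as designed.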
Establishing common procedures
The ATSC S6-3 “Ad Hoc Group on Loudness Issues” has been looking at this problem for more than a year. The group is in the process of establishing a recommended practices document, which will include guidance on contemporary measuring, monitoring, metadata and dynamics control techniques, as they relate to the loudness portion of DTV. Expect this work to be available in 2009.
Aldo Cugnini is a consultant in the digital television industry.