Using AutoNorm for dialogue normalization

Figure 1. Shown here is AutoNorm as it might be implemented at a typical DTV station. Note that this shows such options as local audio being upmixed, and splicing between network and local AC-3 (Dolby Digital) streams.

Reviewing the standard

Loudness issues have plagued television broadcast for decades, and digital television is turning out to be no different. Luckily, the audio format specified by the ATSC is AC-3, better known as Dolby Digital. And it has some built-in features that, when used properly, can help to alleviate the problems.

Unfortunately, these features often sit dormant and unused. The results are clearly audible in most television markets, and many broadcasters note a dramatic increase in complaints. Now is the time to bring this problem under control, and we present an innovative way to do just that.

Basically, the Dolby Digital signal is divided into two sets of data: compressed audio data and informational/control data called metadata. The metadata describes the audio signals it rides with, reporting such things as the number of audio channels present (audio coding mode), how to remix the audio if more channels are present than speakers to reproduce them (downmix coefficients), user-selectable dynamic range control, and the loudness of the program (dialogue level, or dialnorm).

When developing the Dolby Digital system, Dolby Laboratories determined that dialogue is the anchor of most programs and is what most viewers use to judge the loudness of that program. Yes, cannon shots or car crashes may be loud, but they are not sustained events.

Dialogue, on the other hand, occurs throughout most of a typical program and is, therefore, a logical choice for a loudness anchor. The dialogue level is obtained by making a long-term A-weighted measurement of typical dialogue (that is, sections without screaming or whispering) and referencing this value to full-scale digital.

How loud is it?

All decoders use the difference between the measured value and the internal reference level of Dolby Digital (-31dBFS) to directly control a 1dB per step attenuator. For example, if a typical motion picture is measured and found to have an average dialogue level of -31dBFS, then the difference between this measurement and the internal reference level is zero, and the decoders will apply no attenuation.

Automatic normalization

If a typical sitcom has a dialogue level of -21dBFS, then the decoder will apply 10dB of attenuation. It should be obvious that this can be a useful technique for matching the loudness of one program with the next. And if done correctly, it would result in all the channels matching each other. Imagine that!
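The attenuation rule described above is simple arithmetic, and a short sketch may make it concrete. The function and constant names below are illustrative only and are not taken from any Dolby or Linear Acoustic product. The sketch also assumes the dialogue level is carried in whole-decibel steps from -1dBFS to -31dBFS, which is how the parameter is defined in AC-3.

```python
# Minimal sketch of the decoder-side loudness normalization rule.
# Names are hypothetical; only the arithmetic comes from the article.

REFERENCE_LEVEL_DBFS = -31  # internal reference level of Dolby Digital


def decoder_attenuation_db(dialogue_level_dbfs: int) -> int:
    """Return the attenuation, in dB, a decoder would apply for a given
    dialogue level (dialnorm) value.

    Assumes the value lies in the range -31 to -1 dBFS in 1 dB steps.
    """
    if not -31 <= dialogue_level_dbfs <= -1:
        raise ValueError("dialogue level must be between -31 and -1 dBFS")
    # Attenuation is the distance of the program's dialogue level
    # above the -31 dBFS reference.
    return dialogue_level_dbfs - REFERENCE_LEVEL_DBFS


if __name__ == "__main__":
    for name, level in [("motion picture", -31), ("sitcom", -21)]:
        print(f"{name}: dialnorm {level} dBFS -> "
              f"{decoder_attenuation_db(level)} dB attenuation")
```

With the two examples from the text, the motion picture at -31dBFS receives no attenuation and the sitcom at -21dBFS receives 10dB, so both are reproduced with dialogue at the same level.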

Measuring dialogue loudness has always been a challenging process. Initially it involved expensive meters, cross-conversion charts, and plenty of patience and time to get a single correct value. This quickly led to the parameter being ignored in many cases. Dolby Laboratories developed the LM100 broadcast loudness meter to make this an easy and almost automatic procedure. While this meter does its job well, it still requires operator intervention to apply the results to the system.

Why is this unique?

Taking this pioneering work to the next logical step, Linear Acoustic developed AutoNorm, a real-time method to automatically apply these measurements to audio signals and compressed bit streams. The dialogue level parameter generated by the meter is further processed, and then inserted into precompressed Dolby Digital streams, metadata streams, and eventually Dolby E and HD-SDI streams. The initial release is with the StreamStacker platform, which already allows for multiplexing and splicing Dolby Digital streams and frame synchronizes all of these streams to a local plant reference.

The LA-5124 AutoNorm demultiplexer and Bitstream Splicer are the DTV station side of the StreamStacker system and have a serial connection that interfaces directly with the serial port of the LM100. As the meter is measuring the compressed Dolby Digital stream, the dialogue levels are reported to the StreamStacker, which processes them and inserts them into the outgoing bit stream. Several parameters can be adjusted, which help to weight the incoming dialogue level values if desired and protect the bit stream in case of loss of the values due to connection or measurement failure. (See Figure 1.)
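To illustrate the kind of processing this paragraph describes, the sketch below weights incoming meter readings and falls back to a safe value when readings stop arriving. It is not the AutoNorm algorithm, which is not detailed here. The class name, the exponential smoothing, the parameter names and the default values (including the -27dBFS fallback) are all assumptions chosen for illustration.

```python
# Hypothetical sketch: weight incoming dialogue level readings and
# protect the outgoing value if readings are lost. Not the actual
# AutoNorm processing; every name and default here is an assumption.
from typing import Optional


class DialnormTracker:
    def __init__(self, weight: float = 0.25,
                 fallback_dbfs: int = -27, max_missed: int = 5):
        self.weight = weight                # influence of each new reading (0..1)
        self.fallback_dbfs = fallback_dbfs  # value used when no data is trusted
        self.max_missed = max_missed        # missed readings tolerated before fallback
        self._smoothed: Optional[float] = None
        self._missed = 0

    def update(self, reading_dbfs: Optional[float]) -> int:
        """Accept one meter reading (None if lost) and return the
        dialogue level value to insert into the outgoing stream."""
        if reading_dbfs is None:
            self._missed += 1
            if self._smoothed is None or self._missed > self.max_missed:
                # Nothing trustworthy to hold: discard history, use fallback.
                self._smoothed = None
                return self.fallback_dbfs
            # Otherwise hold the last smoothed value (falls through below).
        else:
            self._missed = 0
            if self._smoothed is None:
                self._smoothed = reading_dbfs
            else:
                # Exponential smoothing toward the new reading.
                self._smoothed += self.weight * (reading_dbfs - self._smoothed)
        # Clamp to the legal -31..-1 dBFS range in 1 dB steps.
        return max(-31, min(-1, round(self._smoothed)))


if __name__ == "__main__":
    tracker = DialnormTracker(weight=0.5, max_missed=2)
    for reading in [-21, -25, None, None, None]:
        print(reading, "->", tracker.update(reading))
```

A smaller weight makes the inserted value respond more slowly to program changes, while the hold-then-fallback behavior keeps a brief serial dropout from disturbing the stream.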

Catch it in transmission

Classic attempts to control program loudness have all relied upon controlling the peak-to-average ratio of programs, and therefore reducing their dynamic range. While this works well in many cases, it has two drawbacks. First, the dynamic range of the program is sacrificed to protect against the occasional offending loud commercial or program. And second, it is not reversible.

The Dolby Digital system includes a dynamic range system that defaults to applying dynamic range control but allows users who desire more or a full dynamic range to scale their decoders. Loudness matching is tied more closely to the dialogue level parameter, and this is not adjustable by the consumer. The dynamic range processing system in Dolby Digital relies on a properly set dialogue level. Otherwise, inappropriate amounts of dynamic range control (i.e., too little or too much) will be generated and the system becomes ineffective.

In these somewhat early days of multichannel audio in DTV, transporting metadata is certainly becoming more commonplace, but by no means is it a risk-free process. Many things can cause metadata to go missing or be incorrect, with the result being inappropriate audio delivered to consumers. Since the final catch point remains the transmission area of the terrestrial, cable or satellite broadcaster, it seems to be the most logical place to check and, if necessary, regenerate proper dialogue level values.

Now that measurement of the dialogue level is a straightforward process, we bring the circle to a close by allowing the measured values to be inserted into pre-encoded Dolby Digital streams at the point of transmission. This will allow the long-term loudness of a television station to be automatically adjusted just before the signal is sent on to consumers.

While dynamic range control might still be necessary in certain situations, such as controlling the instantaneous loudness of locally inserted commercials, the original dynamic range of normal programming will be preserved and loudness kept even. Several television stations in major markets have these systems on the air, and the long-standing problems of loudness are on their way to a permanent resting place in history.

Tim Carroll is the president of Linear Acoustic and the former product manager for the Professional Audio Division at Dolby Laboratories.