Managing DTV loudness with dialnorm

The NTSC analog television system uses conventional compression and limiting at various stages of the signal chain to manage audio loudness for broadcasts. This practice compensates for the limited dynamic range of analog equipment (particularly the studio-to-transmitter link) and evens out the differing loudness levels of audio received from suppliers. It also helps smooth out program-to-spot transitions. Though effective, this practice permanently reduces dynamics and changes the audio before it ever reaches the audience, altering the sound from what the program provider intended so that it fits within the limitations of the analog system.

By contrast, the ATSC digital television system transmits metadata, or data about the data, to control loudness and other parameters more effectively. Dialog level, or dialnorm, is the metadata parameter that sets the loudness of all ATSC audio at the receiver. (See Figure 1.) The content provider or DTV station sets dialnorm at the origin, and it agilely adjusts the decoder in the home. This and 28 other settings are integral to the ATSC Dolby Digital (AC-3) audio bit stream, which also includes optional, user-selectable dynamic range control. The ATSC document A/53D refers to dialnorm as the practice that should be used by all broadcasters for managing loudness for DTV.

Program or commercial audio is best experienced when mixed at appropriate levels with varied dynamics. This mix is based on the choices made by the sound designer or the mixing engineer and is dictated by the production. At home, without some type of processing, varying volume and dynamics can be uncomfortable to listen to, because the sound jumps up and down when switching between programs, commercials and promos.

A broadcaster's goal is twofold: to transmit material accurately, maintaining the intended sonic characteristics of the production, and to provide an enjoyable viewer experience. Because it is not restricted by the analog system's limitations, dialnorm makes it possible to do both by providing unaltered but normalized sound. Dialnorm sets the audio to a comfortable level, much as a viewer uses a remote control to adjust the volume between TV show and commercial transitions. This is done automatically, without the viewer having to reach for the volume control, and it does not affect dynamics or compromise the soundtrack.

How it works

Studies have shown that viewers adjust the volume of a television based on the level of the spoken word within a broadcast. A dialnorm setting is obtained either by measuring a program's dialog level or by mixing a program's or spot's dialog to an already established dialnorm figure. Either way, the resulting figure is used by the DTV audio encoder.

With either method, the amplitude of the program dialog must match the dialnorm that's set in the audio encoder. This is required for loudness and for the optional dynamic range control to work properly. A Dolby LM100 broadcast loudness meter can take short- and long-term dialog measurements and display an easy-to-read, accurate number as the dialnorm figure. (See Figure 2.)

Audio and dialnorm are transmitted in the DTV bit stream simultaneously. The Dolby Digital system, which includes the decoder in the home, will level shift the volume by reducing the audio by the difference between the dialnorm figure and -31dBFS. (See Figure 3.) So, if each program or spot is stamped with the proper dialnorm matching the level of the audio, dialog will automatically be level shifted to -31dBFS by the DTV system. Because this engages only once for each program or spot, and all audio accompanying the dialog is level shifted equally, dynamic range is not compromised. Transitions become smooth, no matter where in the -1dBFS to -31dBFS window the dialog was mixed. When all broadcasters do this properly, channel changing on the DTV dial is normalized.
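The level shift is simple arithmetic and can be sketched in a few lines of Python. This is an illustration only, not decoder code: the function names are hypothetical, and the sketch assumes dialnorm is expressed as a whole number within the -31dBFS to -1dBFS window.

```python
# Illustrative sketch of the decoder's level shift; all names are hypothetical.
TARGET_DIALOG_DBFS = -31  # level to which the decoder shifts dialog


def attenuation_db(dialnorm_dbfs):
    """Attenuation the decoder applies: the dialnorm figure minus -31dBFS."""
    if not -31 <= dialnorm_dbfs <= -1:
        raise ValueError("dialnorm must be between -31 and -1 dBFS")
    return dialnorm_dbfs - TARGET_DIALOG_DBFS


def decoded_level_dbfs(level_dbfs, dialnorm_dbfs):
    """Level of any element of the mix after the decoder's level shift."""
    return level_dbfs - attenuation_db(dialnorm_dbfs)


# Dialog mixed at -24dBFS and stamped with a matching dialnorm of -24:
print(attenuation_db(-24))           # 7
print(decoded_level_dbfs(-24, -24))  # -31
# An effect 10dB above the dialog is still 10dB above it after the shift:
print(decoded_level_dbfs(-14, -24))  # -21
```

Because the same attenuation is applied to everything in the mix, the relationship between dialog and the rest of the soundtrack is preserved.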

A problem worth fixing

Unfortunately, due to a lack of awareness or understanding, not all DTV stations use dialnorm effectively. At many stations, dialnorm is left at the encoder's -27dBFS default or is effectively turned off by intentionally setting it to -31dBFS. Therefore, program audio recorded at a nominal -20dBFS and stamped with -27dBFS dialnorm is attenuated by only 4dB instead of the required 11dB, so it will be 7dB louder than a properly set ATSC bit stream. This same audio with dialnorm shut off (-31dBFS) receives no attenuation and will air 11dB louder!
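The figures in this example can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes only the decoder behavior described earlier: audio is reduced by the difference between the stamped dialnorm and -31dBFS.

```python
# Illustrative sketch; the function name is hypothetical.
def loudness_error_db(dialog_dbfs, stamped_dialnorm_dbfs):
    """How much louder (+) or softer (-) dialog airs than the -31dBFS target."""
    attenuation = stamped_dialnorm_dbfs + 31   # decoder reduces audio by this much
    decoded_dialog = dialog_dbfs - attenuation
    return decoded_dialog + 31                 # distance from -31dBFS


print(loudness_error_db(-20, -27))  # 7  (encoder default left in place)
print(loudness_error_db(-20, -31))  # 11 (dialnorm effectively off)
print(loudness_error_db(-20, -20))  # 0  (dialnorm matches the audio)
```

The result always reduces to the actual dialog level minus the stamped dialnorm, so any mismatch between the two shows up decibel for decibel at the receiver.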

Neglecting dialnorm is bothersome on a single channel and becomes a bigger nuisance for the audience when changing channels between properly set and incorrectly set stations in a market. Considering that the resulting level variance can easily exceed 10dB, it's easy to understand why properly adjusted dialnorm is critical for delivering the best listening experience for DTV viewers.

Setting dialnorm properly is an easy, straightforward process for a DTV station engineer. It can be divided into two categories: network content or station content.

Setting up dialnorm for network content

Networks can distribute program and commercial metadata with their audio and video, either discretely or as part of a Dolby E bit stream. In most cases, this is used to directly drive the required Dolby 569 audio encoder's external metadata port. Master control at the station engages the switch to the network signal, and the dialnorm metadata follows. The local station airs the network's content, which includes the program-matched dialnorm.

In situations where networks distribute an ATSC bit stream for stations to transmit directly, the dialnorm is already contained in the network's Dolby Digital stream that gets passed directly to the audience.

Setting up dialnorm for station content

For upconverts at two-channel-only stations and local HD origination audio, a station engineer needs to determine a dialnorm setting based on the average level of all programs, spots and local news. This process may be simplified by the existing, reduced dynamics used by programmers, by the news mixing engineer or perhaps by SD distribution that is already in place.

Once samples are evaluated, a number is determined and tested. Then the average figure can be entered as the station's unique dialnorm.
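As a rough illustration, the averaging step might look like the Python sketch below. It assumes a simple arithmetic mean of measured dialog levels, rounded to a whole number and kept within the -31dBFS to -1dBFS window. The averaging method, the function name and the sample readings are all assumptions made for the example, and the result would still need to be tested on air as described above.

```python
# Illustrative sketch; the averaging method and sample values are assumptions.
from statistics import mean


def station_dialnorm(sample_levels_dbfs):
    """Average measured dialog levels and round to a whole dialnorm figure."""
    average = mean(sample_levels_dbfs)
    # Keep the result inside the -31 to -1 window.
    return max(-31, min(-1, round(average)))


# Hypothetical dialog measurements (dBFS) for programs, spots and local news:
samples = [-22.0, -25.0, -24.0, -26.0]
print(station_dialnorm(samples))  # -24
```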

This process is a compromise that's necessary when SD programming is used and a metadata stream matched to the content isn't available to control the DTV audio encoder agilely. Viewers have an acceptable comfort zone of -5.4dB to +2.4dB; a level change must fall outside this range before it's likely that a viewer will try to adjust the volume. This finding indicates that there is some dialnorm latitude.
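One way to use this latitude is to check how far each source sits from the station's single dialnorm figure. The Python sketch below flags sources whose dialog falls outside the comfort zone. It assumes the -5.4dB to +2.4dB range applies to the difference between a program's measured dialog level and the station dialnorm, with positive values meaning louder. The function name and sample levels are hypothetical.

```python
# Illustrative sketch; names and sample levels are hypothetical.
COMFORT_LOW_DB = -5.4   # softer than the reference by up to this much is tolerated
COMFORT_HIGH_DB = 2.4   # louder than the reference by up to this much is tolerated


def outside_comfort_zone(program_levels_dbfs, station_dialnorm_dbfs):
    """Return sources whose offset from the station dialnorm leaves the comfort zone."""
    flagged = {}
    for name, level in program_levels_dbfs.items():
        offset = level - station_dialnorm_dbfs
        if not COMFORT_LOW_DB <= offset <= COMFORT_HIGH_DB:
            flagged[name] = offset
    return flagged


programs = {
    "local news": -23.0,
    "syndicated show": -27.0,
    "promo": -19.0,
    "infomercial": -31.0,
}
print(outside_comfort_zone(programs, -24))
# {'promo': 5.0, 'infomercial': -7.0}
```

Sources flagged this way are candidates for remixing or for separate treatment, since a single averaged dialnorm cannot keep them inside the comfort zone.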

Two-channel-only stations with audio encoders that do not support external metadata should include the network audio feed as part of the analysis to determine the station's average dialnorm setting. For HD local origination that's capable of distributing matched metadata with its audio, that metadata should drive the audio encoder's metadata port directly. It's also acceptable for a mix to be set to a predetermined loudness that matches a preset dialnorm figure for that source.

Conclusion

In a new DTV world, putting in the time and effort to understand the process and to implement an effective dialnorm plan simply makes sense. When done properly and universally, everyone benefits from the sonic experience this system is capable of providing. Supplying anything less is simply a disservice to a broadcaster's audience.

Jim Starzynski is principal engineer in advanced technology for NBC Universal.