ATSC Audio Seminar: Dialnorm Metadata

RANCHO MIRAGE, CALIF.: Dialogue normalization is the method for taming radical shifts in TV volume that come with some content jumps. While an ideal dialnorm value has been determined, it’s not always merely a matter of setting and forgetting. Twin-Cities Public TV’s Bruce Jacobs used Dolby E to illustrate the complexity of setting dialnorm, listing 28 associated metadata parameters. The default dialnorm value is not the correct value, he stressed.

“Literally, dialnorm does not have a minus sign in front of it,” said Jacobs, who delivered his presentation at the ATSC Audio Seminar Tuesday. The seminar was part of the HPA Technology Retreat near Palm Springs.
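Jacobs's point about the missing minus sign can be illustrated with a short sketch. As we understand the AC-3 specification (ATSC A/52), dialnorm is carried as an unsigned value from 1 to 31 that the decoder reads as -1 to -31 dB, and playback is attenuated by 31 minus that value. The function names below are hypothetical, and the rounding and clamping choices are assumptions made for illustration, not a procedure from the presentation.

```python
def dialnorm_code(loudness_lkfs: float) -> int:
    """Map a measured dialogue loudness (a negative LKFS figure) to the
    unsigned dialnorm value. Rounding and clamping are assumptions."""
    code = round(-loudness_lkfs)      # drop the minus sign
    return max(1, min(31, code))      # keep within the 1..31 range


def decoder_attenuation_db(code: int) -> int:
    """Attenuation a decoder applies for a given dialnorm value,
    assuming the 31-minus-dialnorm rule described above."""
    if not 1 <= code <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    return 31 - code


code = dialnorm_code(-24.0)
print(code, decoder_attenuation_db(code))
```

Under these assumptions, a program measured at -24 LKFS is carried as 24, and a decoder would turn it down by 7 dB.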

Attention must be paid to channel mode, since audio directed to the center channel of 5.1 surround sound can get lost in stereo, he said. Bitrate can be “set at your convenience,” he said, while pre-emphasis protection should not be turned on “under any circumstances.” Various types of filters--DC, low pass and LFE low pass--can be turned on, or left off if better filters are available.

Surround 3 dB attenuation is an option, though most applicable to film mixes, he said.

“Broadcasters typically don’t have film mixes, but a network might end up with one,” Jacobs said. “I presume PBS is never going to send me a film mix.”
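For a sense of scale, the 3 dB surround attenuation mentioned above works out to multiplying the surround samples by roughly 0.71. The sketch below shows only that arithmetic; the function names and the sample values are hypothetical, and it is not a description of how any particular encoder implements the option.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def attenuate_surrounds(left_surround, right_surround, db=-3.0):
    """Scale both surround channels by the same gain (default -3 dB)."""
    gain = db_to_gain(db)
    return ([s * gain for s in left_surround],
            [s * gain for s in right_surround])


# Hypothetical sample values, for illustration only.
ls, rs = attenuate_surrounds([1.0, 0.5], [-1.0, 0.25])
print(round(db_to_gain(-3.0), 3), ls, rs)
```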

Surround Phase Shift is another parameter that shouldn’t be turned off “unless you have a good reason,” Jacobs said. SPS works with Dolby Pro Logic, sending control words to apply a 90-degree phase shift in the encoder. Jacobs said one side effect is that the center channel of 5.1 could be lost in a downmix to stereo.

Tom Holman of the University of Southern California noted that pans don’t work properly without Surround Phase Shift, which dates back to “Apocalypse Now,” he said. Center is typically protected in a downmix to a Pro Logic decoder, but Jim DeFillippis of Fox said center could be lost in situations where the 90-degree shift is inadvertently applied twice.
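The concern about a doubled shift comes down to simple arithmetic: two 90-degree shifts add up to 180 degrees, which is a polarity inversion, and an inverted signal cancels an unshifted copy of itself when the two are summed. The sketch below models a single tone as a complex phasor to show that. It is an illustration of the arithmetic only; the source does not say which signal paths are summed in a real chain, so the final summing step is an assumption.

```python
def shift_90(phasor: complex) -> complex:
    """Advance a tone, represented as a complex phasor, by 90 degrees."""
    return phasor * 1j


tone = 1 + 0j                      # reference tone at 0 degrees
once = shift_90(tone)              # 90 degrees
twice = shift_90(once)             # 180 degrees: polarity inverted

# Assumed scenario: the twice-shifted copy is summed with the original.
summed = tone + twice
print(twice, abs(summed))
```

A single shift leaves the tone at full level when summed with the original in this model; only the doubled shift produces full cancellation.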

Jacobs said there are three ways of managing dialnorm metadata, and that each is equally valid if managed properly. He described them as “fixed,” “preset” and “agile.”

Fixed is simply that--placing a fixed value on audio throughout the entire broadcast plant. It requires the use of a target value, such as the ATSC’s RP value of -24 LKFS. The benefits of the fixed mode of management are that no additional metadata channel is necessary, all AC-3 encoders work with it and it carries low risk.

However, Jacobs noted, it can’t be customized, and broadcasters may have to get producer/distributor agreements on the applied value. The preset scheme allows some degree of flexibility in that it accommodates shifting from one set of fixed parameters to another, he said. It’s a simple way to manage channel changing, though a frame-synch is required downstream, he said.
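The difference between the fixed and preset schemes can be sketched as a lookup table: fixed management amounts to a single entry that never changes, while preset management switches between a small number of stored entries. Everything in the sketch below is hypothetical, including the class names, the preset names and the parameter values; it is meant only to show the shape of the idea, not any vendor's encoder control interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataPreset:
    dialnorm: int        # unsigned value, 1..31
    channel_mode: str    # e.g. "2/0" for stereo, "3/2" for five channels


# Hypothetical presets for illustration only.
PRESETS = {
    "stereo": MetadataPreset(dialnorm=24, channel_mode="2/0"),
    "surround": MetadataPreset(dialnorm=24, channel_mode="3/2"),
}


class EncoderControl:
    """Holds one active preset and allows switching to another by name."""

    def __init__(self, presets, initial):
        self._presets = dict(presets)
        self.active_name = initial
        self.active = self._presets[initial]

    def select(self, name):
        if name not in self._presets:
            raise KeyError(f"unknown preset: {name}")
        self.active_name = name
        self.active = self._presets[name]
        return self.active


control = EncoderControl(PRESETS, "stereo")
print(control.select("surround"))
```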

Jacobs said agile dialnorm metadata management is considered by some to be the holy grail of schemes, “though I can’t imagine trying to go here,” he said. “Good luck to you!”

Trying to “design a metadata system that carries through the system and the plant that doesn’t fail anywhere” is exceedingly difficult, and delay could be introduced, Jacobs said.

Another approach to agile management involves using the vertical ancillary space, or VANC. Use of VANC requires compatibility in all devices, however, and Jacobs noted that some services don’t pass VANC data.

“Tape formats have limited capacity, there’s competition from captioning and AFD,” he said. “I’m a systems guy and I’ve got to be practical. I’ve got to have a system that’s manageable. A lot of our systems are still analog, and when you try to get metadata through analog systems, it’s impossible.”

Jacobs noted that using a combination of management modes is possible. HBO, for example, mandates that producers provide agile metadata, but uses fixed parameters for authoring. The application of dialnorm, in whatever fashion, is merely one element in the delivery of audio volume. Among the litany of encoder commands embedded in the metadata are those that determine dynamic range control. (“How Loud is Too Loud? Determining dynamic range control”)
-- Deborah D. McAdams

(Image by Andrew Back)