How is the most important audio parameter for ATSC transmission so often maligned and misused? A little byte of audio metadata in the DTV AC-3 stream — dialnorm — was made into a standard with good intentions but from the beginning, it has been in a state of disrepair.
The analog curse
Analog TV audio levels are maximized by the marketplace and capped to avoid exceeding FCC modulation limits. This results in a limited dynamic range — a squashed sound. There's no opportunity for a dramatic moment. Movies sound lame. Symphonies sound anemic.
With the upper limit determined by a peak-reading meter, the loudness for consumers is inconsistent. Complex waveforms are made softer than simpler ones in order to avoid an FCC fine, to the disadvantage of the listener.
To make matters worse, high frequencies are compressed even more, causing the audio to sound dull. This is necessary to avoid overmodulation from the pre-emphasis that was included in the FCC transmitter rules back when audio didn't have much high-frequency content. There was a time when the resulting reduction in noise from a matching receiver de-emphasis seemed like a good idea.
Right off the bat, digital audio is better than FM audio, because there is no need for pre-emphasis. This eliminates dull-sounding audio!
But how do we manage levels in the digital age? A bad solution would have been to let the marketplace decide — as was done with the compact disc and MP3 files. The upper limit would be simple; the highest digital number is the highest peak value, where anything higher is clipped. The lower limit is 96dB down, leaving plenty of dynamic range available for the producer to keep average levels low and avoid clipping. But this approach results in a lose-lose loudness war just like with analog broadcasting and increasingly with digital audio files. Everybody tries to be the loudest. Everyone loses dynamic range. For digital television, there must be a better way.
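The 96dB figure follows from the word length of the audio samples. A minimal sketch of the arithmetic, assuming 16-bit linear PCM as used on the compact disc (the function name is illustrative only):

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB.

    Assumes linear PCM with the given word length; 16 bits is an
    assumption based on the compact disc, not a figure from the article.
    """
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # 96.3
```

Each added bit contributes roughly 6dB, which is why 16 bits give a floor about 96dB below full scale.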
The good idea
In developing the AC-3 compression system for movies, Dolby's engineers rightfully wanted to give the home listener the same benefit enjoyed in the theater — a consistent dialog level and a wide dynamic range.
The consistent dialog level is achieved by the use of a long-term averaging meter that is A-weighted to favor the frequencies to which our ears are most sensitive at low levels. Movie audio levels are adjusted so the average weighted dialog level remains consistent, pleasing both the listeners (who can better hear the dialog) and the theater owners (who get fewer complaints about trailers being too loud).
Understandably, a movie is often mixed so that the levels of explosions and music crescendos exceed the average dialog level by a significant amount. This helps make a movie exciting!
Dolby could have picked a fixed average dialog level for AC-3. The specified dialog level could have been chosen a safe number of decibels below 0dBFS, leaving room for dramatic peaks. This level could have been adopted by the FCC, along with AC-3, within the ATSC standard.
Broadcasters could have adjusted their dialog levels using the appropriate metering to the level specified. New meters meeting the standard would become available. Hopefully, legacy content would have audio levels close to the chosen value. If not, processors could keep levels within bounds.
The idea of giving the consumer consistent dialog level and a wider dynamic range would have been achieved. Life for the broadcaster would have been simple. Life for the consumer would have been improved. But this is not what Dolby did.
The trouble begins
Nobody likes limits. Who chooses the limit? Should Dolby have designed AC-3 to suit the film industry or the broadcast industry?
Rather than specify a fixed amount of dynamic range, Dolby made it adjustable from 1dB to 31dB. This was accomplished by including a special data parameter that remotely controls the output gain of all final AC-3 decoders. (See Figure 1.) Every consumer decoder must apply this adjustment under terms of the Dolby license. This is the parameter dialnorm.
If the mix engineer wants the largest possible dynamic range, the dialog is mixed to a level of -31dBFS, and the dialnorm is logically set to -31. This results in unity gain at the decoder. If the mix engineer wants less dynamic range, a higher dialog level is chosen along with a dialnorm of the same numeric value, resulting in a decoder gain reduction of the appropriate amount. For example, dialog mixed at -24dBFS with a dialnorm of -24 is attenuated by 7dB in the decoder. This approach keeps dialog levels consistent from movie to movie, from show to show and from channel to channel. If dialnorm is set properly, the average dialog level from the decoder will be -31dBFS when measured with the averaging A-weighted meter.
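The relationship between dialnorm and decoder gain described above can be written as a short sketch. The function names are hypothetical and this is not Dolby's decoder code; it only restates the arithmetic in the text, with levels in dBFS and gains in dB.

```python
TARGET_DIALOG_DBFS = -31  # decoder output dialog level, per the text

def decoder_gain_db(dialnorm: int) -> int:
    """Gain the decoder applies for a given dialnorm (-31 to -1)."""
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    # A dialnorm of -31 gives unity gain; each step toward -1
    # adds 1dB of attenuation.
    return TARGET_DIALOG_DBFS - dialnorm

def decoded_dialog_dbfs(input_dialog_dbfs: float, dialnorm: int) -> float:
    """Dialog level leaving the decoder for a given encoder input level."""
    return input_dialog_dbfs + decoder_gain_db(dialnorm)

# When dialnorm matches the measured dialog level, every program
# comes out of the decoder at -31dBFS.
for level in (-31, -27, -24):
    print(level, decoder_gain_db(level), decoded_dialog_dbfs(level, level))
```

When dialnorm is wrong, the error passes straight through: dialog at -20dBFS sent with a dialnorm of -27 leaves the decoder at -24dBFS, 7dB louder than a correctly tagged program.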
There is no one right dialnorm value. It depends. Just because Dolby ships encoders set to -27 doesn't mean this is the correct value for your station. The correct value is the average dialog level on the input of your AC-3 encoder. This is where the good idea starts to go bad.
Nobody likes complexity or confusion. Here's what happened after the ATSC adopted AC-3:
Encoders came with an obscure knob that could be set between -1 and -31, where increasing the value makes the audio in every home illogically softer!
The shipped encoders had the knob set to -27 without saying why, leading many broadcasters to think this is the correct value. This left other broadcasters with the impression that they need to set dialnorm individually for every show in their library — a literally impossible task.
Broadcasters were told they needed to build systems to carry the metadata through their SD plant and storage system when no equipment existed to make it practical to do so.
This technology was introduced in an environment with no enforced standards for the analog portion of consumer equipment. A proper dialnorm setting can result in digital dialog levels that are below analog dialog levels.
All this complexity and confusion has caused listeners in many markets to report DTV audio levels much less consistent than those of the analog counterpart channels. Stations serving some of the major networks routinely transmit dialnorm values far from the actual dialog level, resulting in their programming appearing significantly louder than other networks. Stations offering multicast channels sometimes deliver widely varying audio levels even on their own channels. “With DTV, consumers have better pictures but more annoying audio. Varying audio level is the number one complaint. The public deserves better,” according to Jim Kutzner, chief engineer at PBS. It is a sorry state of affairs. Wasn't digital supposed to make things better?
If the broadcast industry is going to achieve a consistent dialog level and wide dynamic range, it has work to do.
“Dialnorm doesn't have to be complicated,” says Jim Starzynski, principal audio architect with NBC Universal and chair of the ATSC S6-3 audio working group on DTV loudness.
Currently, the NBC Universal DTV network performs ongoing tests of legacy and commercial content mixed using traditional meters. The Leq(A) average dialog level ranges from -20dBFS to -24dBFS. By simply setting station AC-3 encoder dialnorm remotely from the network to the average value of -22, reasonably consistent dialog levels in the home are achieved, matching other dialnormed sources. This averaging method is effective when specific content-matched metadata is not available and can work for local station content as well. It is not necessary to remix all the old shows. In the case of local commercials with extremely varied levels, a simple, properly adjusted audio compressor and limiter can be successful in reaching a targeted dialog level. PBS has adopted a similar approach, using a value of -24.
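To see why a single fixed value is good enough, consider the worst cases in the measured range. The sketch below assumes a fixed dialnorm of -22 (the midpoint of the -20dBFS to -24dBFS range) and the -31dBFS decoder target described earlier; the names are illustrative only.

```python
TARGET_DIALOG_DBFS = -31  # decoder output target for correctly tagged dialog
FIXED_DIALNORM = -22      # assumed midpoint of the measured range

def level_in_home_dbfs(dialog_dbfs: float, dialnorm: int = FIXED_DIALNORM) -> float:
    """Dialog level after the decoder applies the dialnorm gain."""
    return dialog_dbfs + (TARGET_DIALOG_DBFS - dialnorm)

def error_db(dialog_dbfs: float, dialnorm: int = FIXED_DIALNORM) -> float:
    """How far the decoded dialog lands from the target (positive = louder)."""
    return level_in_home_dbfs(dialog_dbfs, dialnorm) - TARGET_DIALOG_DBFS

for measured in (-20, -22, -24):
    print(measured, level_in_home_dbfs(measured), error_db(measured))
```

Under these assumptions the loudest and softest content land within 2dB of the target, far closer than a default value that ignores the actual dialog level.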
The 2007 edition of the PBS Technical Operating Specifications requires all new content to be delivered with an Leq(A) dialog level of -24dBFS. “This specification is key to our effort to improve audio for the public,” Kutzner said. (The specifications can be found at www.pbs.org/producers/redbook.) NBC Universal uses a similar value of -23dBFS. Currently, there is no need for a complex metadata system for every production, only a new approach to metering the final mix and layback.
Knowledge of this fixed dialnorm approach allows stations to consider simple steps to fix the dialnorm problem. (For more, read “Four simple steps to successful dialnorm.”) As awareness and technology progress, an agile metadata system that tags matched dialnorm to content can be a goal, fulfilling the original intent of the DTV audio designers.
What about cable?
As the cable industry continues to reduce the number of analog channels and increase the digital ones, properly set dialnorm becomes even more important. In the old analog days, many cable systems ran all audio channels through their own compression equipment to reduce consumer complaints, making actions of a station irrelevant.
As cable converts to digital and passes supplier-provided dialnorm to analog and digital services, it is vital that the value in our broadcast stream is correct. Cable providers prefer to pass the signals without processing them.
The ATSC is leading an industry-wide effort to improve the situation. The organization's subgroup is considering a recommended practice to help the industry properly implement dialnorm. According to Starzynski, the subgroup is evaluating adoption of the new ITU LKFS loudness measuring standard, which is similar to the current ATSC Leq(A) specification but better suited to the measurement of music and other content without dialog.
By adopting ATSC A/53, the FCC applied the force of law to the proper value of dialnorm: “The value of the dialnorm parameter in the AC-3 elementary bit stream shall indicate the level of average spoken dialogue within the encoded audio program.” If the industry doesn't obey the law, count on consumer complaints and congressional hearings.
As consumers increasingly rely on AC-3 audio delivery through digital broadcasting, cable and DBS, it is imperative that the production, network, station and cable industries work together to make dialnorm effective.
Bruce Jacobs is chief technologist with Twin Cities Public Television and a long-standing member of the PBS Enterprise Technology Advisory Committee.
Four simple steps to successful dialnorm
Measure the long-term average A-weighted dialog level for different types of content and ensure that the dialnorm parameter is set to the measured value.
If the correct meter is not available, set the dialnorm value 14dB lower than the typical peak value and compare with other stations. If your channel seems too loud, move the dialnorm value toward the value of -1. If your channel seems too soft, move the dialnorm value toward the value of -31.
If the average dialog level going into the encoder varies by more than 4dB from show to show, install a gentle audio automatic gain control.
Encourage your peers to follow these same rules.
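The fallback rule in the second step and the 4dB check in the third can be sketched as follows. This is a rough illustration, not a measurement tool: the function names are hypothetical, and reading "varies by more than 4dB" as the spread between the loudest and softest show is an assumption.

```python
def estimate_dialnorm_from_peak(typical_peak_dbfs: float) -> int:
    """Starting dialnorm when no Leq(A) meter is available.

    Places dialnorm 14dB below the typical peak, then clamps the
    result to the range the encoder accepts (-31 to -1).
    """
    estimate = round(typical_peak_dbfs - 14)
    return max(-31, min(-1, estimate))

def needs_agc(show_dialog_levels_dbfs, max_spread_db: float = 4.0) -> bool:
    """True if dialog levels differ by more than max_spread_db.

    Assumption: "varies" is taken as loudest show minus softest show.
    """
    levels = list(show_dialog_levels_dbfs)
    return max(levels) - min(levels) > max_spread_db

print(estimate_dialnorm_from_peak(-10))  # -24
print(needs_agc([-20, -22, -24]))        # False
print(needs_agc([-18, -22, -26]))        # True
```

The estimate is only a starting point; the second step still calls for comparing the result with other stations and nudging the value toward -1 or -31 by ear.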