Audio quality vs. loudness compliance

So, here we are, just past the start of enforcement of the FCC rules on the loudness of commercials in digital television. Today, loudness issues are much better understood, and significant progress has been made in resolving disparities. Things are noticeably better than before, except for one thing: Audio quality largely pays the price for compliance.

Compliance is understood to mean a lack of regular complaints to the FCC by viewers. One technique is Automatic Gain Control (AGC), which raises or lowers gain on a sample-over-sample basis. This keeps audio loudness centered on a given static target. It is a relatively easy answer for compliance and is fully supported by both A/85 and the FCC Report and Order. However, AGC achieves its goal by more or less treating every shift similarly, correcting the bad — and the good. Everything gets a little something, whether it needs it or not. Sounds like a recipe for mediocrity.
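To make the "everything gets a little something" behavior concrete, here is a minimal sketch of a gain-riding AGC. It is illustrative only: the function names, the block-based RMS level estimate, the -24 dB target and the 0.5 dB step limit are all assumptions chosen for brevity, not a description of any commercial processor or of the loudness measurement referenced by A/85.

```python
import math

def tone(rms_db, n=48000, fs=48000, freq=1000.0):
    """Hypothetical test signal: a sine wave with the requested RMS level in dBFS."""
    amp = math.sqrt(2.0) * 10.0 ** (rms_db / 20.0)
    return [amp * math.sin(2.0 * math.pi * freq * k / fs) for k in range(n)]

def simple_agc(samples, target_db=-24.0, block=480, max_step_db=0.5):
    """Toy AGC: after each block, nudge the gain toward a static target level.

    Assumption: plain RMS per block stands in for a real loudness measure.
    Returns the processed samples and the final gain in dB.
    """
    out = []
    gain_db = 0.0
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level_db = 20.0 * math.log10(max(rms, 1e-9))
        # Distance between the target and the level at the AGC output.
        error_db = target_db - (level_db + gain_db)
        # Limit how far the gain may move per block.
        gain_db += max(-max_step_db, min(max_step_db, error_db))
        g = 10.0 ** (gain_db / 20.0)
        out.extend(s * g for s in chunk)
    return out, gain_db

# A quiet passage and a loud passage are both pulled to the same target.
_, gain_quiet = simple_agc(tone(-40.0))
_, gain_loud = simple_agc(tone(-10.0))
print(round(gain_quiet, 2), round(gain_loud, 2))
```

Nothing in the loop knows whether the quiet passage was a mistake or a deliberate whisper; both inputs are steered to the same place, which is exactly the limitation described above.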

Although there are sophisticated (and some unsophisticated) ways to accomplish AGC, no machine — regardless of manufacturer, topology or promise of magic outcome — can in real time know the difference between a good, intentional loudness shift and a bad, annoying loudness shift. Certainly, human-generated commands or automation can be used to change or bypass processing for content that is believed to be good, but thus far, this involves a great deal of effort and is uncommon.

Mix engineers have long resigned themselves to the fact that what was transmitted over analog television would be different from what they created. That was just the way it was. In today’s digital world, there is no technical reason why before and after cannot match. In general, this occurs successfully when films mixed for the big screen are transferred to DVD. Helpfully, the same audio coding system, Dolby Digital, is used for digital television.

Beyond average loudness

Looking more broadly at the program stream, loudness problems have to be considered both an issue of matching the average loudness of commercials and programs and one of inner-program consistency, also known as artistic dynamic range. Matching average loudness of different pieces of content does not solve the problem of jarring transitions. These occur at program boundaries and can be the result of a mismatch of dynamic range and/or the short-term loudness at the junction between the end of one piece of content and the beginning of the next.

Think of a program whose average loudness measures at one level and a commercial that measures at another level. If their averages are matched by scaling to a target, on average they will sound equally loud. However, if a dramatic program is ending with a quiet death scene (as they often do) and is quiet compared to the average of the 60 seconds leading up to the commercial, guess what is about to happen? Yep, the commercial will seem too loud. Guess what else? Meters will be totally happy, because they are looking at long-term averages. Viewers will not be so happy.
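The scenario can be worked through with numbers. The sketch below is only an illustration: the short-term loudness values are invented, the helper names are hypothetical, and the averaging is a simple ungated power mean rather than a full standards-based measurement.

```python
import math

def mean_loudness_db(levels_db):
    """Average a list of short-term levels (dB) in the power domain."""
    power = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(power)

def match_to_target(levels_db, target_db):
    """Apply one static offset so the long-term average lands on the target."""
    offset = target_db - mean_loudness_db(levels_db)
    return [level + offset for level in levels_db]

def junction_jump_db(outgoing, incoming, tail=3):
    """Loudness step a viewer hears across the boundary (last vs. first few readings)."""
    return mean_loudness_db(incoming[:tail]) - mean_loudness_db(outgoing[-tail:])

# Invented data: a drama that ends on a quiet scene, then a steady commercial.
program = [-22.0] * 20 + [-40.0] * 3
commercial = [-20.0] * 10

target = -24.0
program_m = match_to_target(program, target)
commercial_m = match_to_target(commercial, target)

print(round(mean_loudness_db(program_m), 2))     # long-term average of the program
print(round(mean_loudness_db(commercial_m), 2))  # long-term average of the commercial
print(round(junction_jump_db(program_m, commercial_m), 1))  # step at the boundary
```

With these made-up figures, both averages sit on the target, yet the step at the boundary is roughly 17 dB. The long-term meter has nothing to complain about; the viewer does.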

This is subtly different from intentional dynamic range variation, such as a loud train crash or gunfight in an action adventure movie, as it is an unexpected variation caused by automating disparate elements together, and it just sounds wrong. In fact, the difference in loudness in this case may be much less than the gunfight, but it is perceived as much worse.

Permanent AGC techniques can probably keep the passive couch potato happy, but the programming will be irreparably changed, and the producers probably will not be thrilled with the results.

The best producers know they must make programs dynamic and engaging for those viewing with surround systems, without going too far and alienating the core audience listening in stereo (estimated to be in excess of 70 percent of viewers), and all while remembering that there might be a dynamic commercial randomly inserted.

The “M” word

It turns out that the designers of transmission audio coding systems such as Dolby Digital (AC-3) and Dolby Digital Plus (E-AC-3) considered this issue long before it entered the consciousness of the industry. The thought was that by including additional data about the audio data, or metadata, the codecs could know more about the audio they carry and provide some features to help consumers better match the audio to their individual environments.

One such feature is a standardized loudness target referenced to what is arguably the most common feature in television programming: dialogue. Called dialnorm, it is intended to represent the average loudness of dialogue within a given program. The idea is that with dialogue as the “center point,” variations above and below this average should be the dynamics of the audio — sometimes softer, such as background sounds or music, and sometimes louder, such as crashes.

For programs without dialogue, any anchor element can be chosen, and the dialnorm parameter provides simple scaling of the audio to match surrounding content. This process can also work the other way around, where dialnorm is set at a specific target, and the average loudness of all content is pre-scaled to match it.
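Both directions of this scaling reduce to simple arithmetic, sketched below. The function names are hypothetical. The sketch assumes the usual AC-3 convention in which dialnorm runs from -1 to -31 and a decoder in its line-level mode attenuates so that dialogue lands at -31 dBFS; other decoder modes use a different reference and are not modeled here.

```python
def decoder_attenuation_db(dialnorm):
    """Attenuation a decoder applies so that dialogue ends up at -31 dBFS.

    Assumption: dialnorm is expressed as a negative integer from -31 to -1.
    """
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    return 31 + dialnorm

def prescale_gain_db(measured_loudness_db, fixed_dialnorm):
    """Gain to apply upstream when dialnorm is fixed and content is scaled to match it."""
    return fixed_dialnorm - measured_loudness_db

# Direction 1: metadata follows the content.
print(decoder_attenuation_db(-24))   # content measured at -24: 7 dB of attenuation
print(decoder_attenuation_db(-31))   # content measured at -31: no attenuation

# Direction 2: content follows a fixed metadata value.
print(prescale_gain_db(-20.0, -24))  # content 4 dB hot: turn it down by 4 dB
```

Either way, the goal is the same: the dialnorm value in the stream and the actual loudness of the content must agree.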

Of course, the dynamics of programs may exceed a comfortable range, and so metadata also provides a rather sophisticated “distributed” dynamic range control (DRC) system. Audio is analyzed during encoding, and based on the reference loudness target (set by dialnorm), dynamic range control metadata values are generated. Instead of being applied to the audio, these are passed along as part of the metadata. At the decode side, the application of dynamic range control metadata is enabled by default, and programs can be delivered with a range more appropriate for a typical viewing environment.

However, and this is critically important, because the audio is untouched, dynamic range control metadata can also be ignored, and the original program will be delivered with no modifications for those who have the desire and the environment to support it.
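The split between "compute at the encoder" and "apply, or not, at the decoder" can be sketched as follows. This is a toy model under loud assumptions: the compression curve (a 5 dB null band either side of dialnorm, 2:1 slopes, and the boost and cut limits) is invented for illustration and is not one of the actual encoder profiles, the stream is a plain dictionary, and the dialnorm level shift itself is left out to keep the focus on DRC.

```python
def drc_gain_db(level_db, dialnorm, null_half_width=5.0, ratio=2.0,
                max_boost=6.0, max_cut=12.0):
    """Hypothetical static DRC curve centered on dialnorm."""
    offset = level_db - dialnorm
    if abs(offset) <= null_half_width:
        return 0.0  # near dialogue level: leave alone
    slope = 1.0 - 1.0 / ratio
    if offset > 0:
        return -min(max_cut, (offset - null_half_width) * slope)  # cut loud parts
    return min(max_boost, (-offset - null_half_width) * slope)    # boost quiet parts

def encode(levels_db, dialnorm):
    """Audio levels are carried untouched; DRC gains travel alongside as metadata."""
    return {
        "audio_levels_db": list(levels_db),
        "dialnorm": dialnorm,
        "drc_db": [drc_gain_db(level, dialnorm) for level in levels_db],
    }

def decode(stream, apply_drc=True):
    """Apply the carried gains by default, or ignore them for the original dynamics."""
    levels = stream["audio_levels_db"]
    gains = stream["drc_db"] if apply_drc else [0.0] * len(levels)
    return [level + gain for level, gain in zip(levels, gains)]

stream = encode([-24.0, -10.0, -44.0], dialnorm=-24)  # dialogue, crash, whisper
print(decode(stream))                    # reduced range for a typical living room
print(decode(stream, apply_drc=False))   # original mix, untouched
```

The point of the design is visible in the last two lines: one stream serves both the casual viewer and the home theater owner, because the range reduction is a suggestion carried with the audio rather than a permanent change to it.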

But there is currently a downside. By the very nature of how the content is preserved (the separation of the audio and the control data), things can easily go wrong with no warning.

Encoders are blissfully unaware of incorrect dialnorm values, yet an incorrect value can cause incorrect generation of the dynamic range control metadata words. Because these are applied by default in decoders, the audio will be impacted. Sometimes there will not be enough DRC, and sometimes it will be far too much. This makes upstream measurement and scaling critically important.
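A small numeric sketch shows how a wrong dialnorm value propagates. As before, this rests on stated assumptions: the DRC curve is an invented one (5 dB null band, 2:1 slopes), the function names are hypothetical, and the decoder is assumed to normalize dialogue to -31 dBFS.

```python
def drc_gain_db(level_db, dialnorm, null_half_width=5.0, ratio=2.0,
                max_boost=6.0, max_cut=12.0):
    """Hypothetical static DRC curve centered on the signaled dialnorm."""
    offset = level_db - dialnorm
    if abs(offset) <= null_half_width:
        return 0.0
    slope = 1.0 - 1.0 / ratio
    if offset > 0:
        return -min(max_cut, (offset - null_half_width) * slope)
    return min(max_boost, (-offset - null_half_width) * slope)

def dialogue_after_decode(true_dialogue_db, signaled_dialnorm):
    """Return (dialogue level at the decoder output, DRC gain applied to dialogue)."""
    attenuation = 31 + signaled_dialnorm            # decoder level shift
    drc = drc_gain_db(true_dialogue_db, signaled_dialnorm)
    return true_dialogue_db - attenuation + drc, drc

# Dialogue really sits at -24 in all three cases; only the metadata changes.
print(dialogue_after_decode(-24.0, -24))  # correct value
print(dialogue_after_decode(-24.0, -31))  # metadata claims the content is quieter
print(dialogue_after_decode(-24.0, -14))  # metadata claims the content is louder
```

In this model, the correct value puts dialogue at -31 with no DRC acting on it. The two wrong values leave dialogue several decibels too loud or too soft, and in both cases ordinary speech now falls outside the null band, so the decoder compresses or boosts material that should have been left alone.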

Dynamic range issues can step over regulatory targets, satisfy meters and still cause viewer complaints. The alternative of irreversible dynamic range control processing will lead to content that, although consistent, may be overly so. Like gravy without the occasional lump, the excitement of variance will be gone.

Can this be solved? Like using a hammer to install a screw, permanent dynamic range control gets the job done. However, there are better answers that are less destructive and less burdensome, and that will preserve content (and your fingers). Stay tuned.

—Tim Carroll is president and founder of Linear Acoustic.