Managing loudness for TV

Loudness is arguably the hottest topic in television these days. The subject has been beaten to death ever since legislation was seriously introduced to mandate explicitly what was already, albeit confusingly, on the books as law.

Broadcasters are in a panic because as license holders, they are ultimately responsible for obeying the law and could face fines or worse if not compliant. Rubbing the other side of this issue raw is the desire to preserve program integrity. We could take the route used in the NTSC days and just blindly process the audio to prevent this modern digital version of “over-modulation.” This would stem complaints of loudness shifts, but at the expense of changing the content.

So, the industry finds itself between a rock and a loud place. The ATSC document “A/85: ATSC Recommended Practice: Techniques for Establishing and Maintaining Audio Loudness for Digital Television” (available at www.atsc.org under Standards) lays out in detail some suggested methods for approaching the problem.

Loudness

Loudness is a frequency-weighted measurement integrated over time. The ITU-R standard BS.1770 describes a measurement method in which audio is first filtered to remove the extreme low frequencies while tipping up the higher frequencies (to emulate the acoustic effect of our heads). The filtered samples are then squared and averaged over the measurement period to produce a result. This is why the notation looks somewhat strange: -24LKFS means Loudness (integrated over time), K-weighted (filtered), with respect to Full Scale digital.
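The filter-then-average idea can be sketched in a few lines of Python. This is a minimal illustration, not a compliant meter: it assumes a single mono channel sampled at 48kHz, uses the K-weighting filter coefficients tabulated in BS.1770 for that sample rate, and leaves out the channel summing and gating found in later revisions of the standard. The function names are made up for this example.

```python
import math

# K-weighting biquad coefficients for 48 kHz audio, as tabulated in ITU-R BS.1770.
# Each entry is ((b0, b1, b2), (a1, a2)).
SHELF = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
         (-1.69065929318241, 0.73248077421585))   # tips up the high frequencies
HIGHPASS = ((1.0, -2.0, 1.0),
            (-1.99004745483398, 0.99007225036621))  # removes the extreme lows


def biquad(samples, coeffs):
    """Apply one second-order IIR filter section (direct form I)."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def ungated_loudness_lkfs(samples):
    """Ungated loudness of one mono channel (assumed 48 kHz, full scale = 1.0)."""
    weighted = biquad(biquad(samples, SHELF), HIGHPASS)
    mean_square = sum(v * v for v in weighted) / len(weighted)
    return -0.691 + 10.0 * math.log10(mean_square)


def sine(freq_hz, seconds, rate=48000):
    """Full-scale test tone, used here only to exercise the meter."""
    count = int(seconds * rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / rate) for i in range(count)]


tone = sine(997.0, 1.0)
print(ungated_loudness_lkfs(tone))
```

A full-scale 997Hz sine is the usual sanity check; fed to this sketch it should read close to -3LKFS. Real program material needs the complete multichannel procedure from the standard.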

What exactly gets measured to determine loudness? This is one of the most debated topics within standards organizations around the world right now. The ATSC long ago specified that it should be the average level of spoken dialog, because speech is the most common feature in the majority of television programming. This is the so-called anchor element. The new A/85 recommended practice extends this to encompass any anchor element so that music programs or music-only commercials are not left out. There is wisdom to this idea of an anchor element that might not be readily apparent. Ignoring the anchor and just measuring the overall loudness of all channels all the time can produce results that do not match perception with some programming.

Metadata

The golden and simple rule is this: The loudness of transmitted audio and the transmitted loudness metadata indicator (a.k.a. dialnorm) must match. A summary of the four common techniques of metadata control is useful as no single method is universally applicable:

Dynamic: Measure each piece of content (programs, commercials, interstitials, etc.) either during production or during ingest. Store the audio with a valid loudness reference indication (dialnorm metadata), and make sure the audio and the metadata make it to the consumer. While not for everyone, this method has proven useful for broadcasters with straightforward and reliable distribution paths, such as movie channels, and has worked well for nearly a decade. The content is not changed and can remain intact to the consumer.
Static: Pick a loudness target, and set the facility's dialnorm value at this number. Then, measure the overall average loudness of each piece of content, and adjust or scale the content if necessary to have it match the target. ATSC A/85 recommends a target level of -24LKFS ±1 or 2dB (over time, hopefully) for content that does not have or cannot have metadata, as it was found by terrestrial broadcasters that most of their legacy content fell within this range and had no metadata. It may not be completely appropriate for all broadcasters, especially movie channels where content is likely quieter on average, so some are using -27LKFS. Remember though, as long as the content loudness matches the metadata value, it does not really matter what that actual metadata value is. Scaling is an overall gain adjustment performed once to realign each piece of audio to the target. Compared with the original, the result may be louder or softer overall, but it has not been otherwise changed.
Traditional audio processing: Similar in many ways to the devices we used in analog television, a traditional processor employs wideband and/or multiband automatic gain controls to constantly adjust the audio signal, effectively reducing the peak-to-average ratio. The loudest sections are decreased, and the softest sections are increased toward some center target. This target can be aligned with a static dialnorm metadata value. Managing loudness automatically but not wanting to modify the content is like swimming without getting wet. The former necessitates the latter, and until time machines become more practical, it is what it is. While some techniques are better than others at preserving a consistent balance, all of these systems change the content compared with the original, although sometimes this is a good thing.
Hybrid metadata audio processing: This consists of a combination of metadata and traditional audio processing, where any metadata that is present is used to guide processing so that it is applied only when and if necessary, and new metadata is generated based on the measurement of incoming audio. The degree to which the original content is changed is adjustable, from purely traditional with fixed metadata to only protection limiting with dynamic metadata that can be bypassed by the savvy viewer or program producer.
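The arithmetic behind the static approach and the golden rule is simple enough to sketch in Python. The function names and the 2dB tolerance below are assumptions chosen for illustration, and the dialnorm handling assumes the AC-3 convention in which a value of 24 stands for -24.

```python
def scaling_gain_db(measured_lkfs, target_lkfs=-24.0):
    """One-time gain (in dB) that moves measured content onto the target."""
    return target_lkfs - measured_lkfs


def db_to_linear(gain_db):
    """Convert a gain in dB to the factor applied to every sample."""
    return 10.0 ** (gain_db / 20.0)


def metadata_matches(measured_lkfs, dialnorm, tolerance_db=2.0):
    """Check the golden rule: measured loudness must agree with dialnorm.

    Assumes dialnorm is the positive integer 1..31 that stands for -1..-31.
    The tolerance is an illustrative choice, not a mandated figure.
    """
    return abs(measured_lkfs - (-float(dialnorm))) <= tolerance_db


# A commercial measured at -19 LKFS in a facility targeting -24 LKFS:
gain = scaling_gain_db(-19.0)            # -5.0 dB
factor = db_to_linear(gain)              # multiply every sample by this
print(gain, round(factor, 3))
print(metadata_matches(-19.0, 24))       # before scaling: mismatch
print(metadata_matches(-19.0 + gain, 24))  # after scaling: match
```

Because the gain is a single constant applied to the whole piece, the internal balance of the mix is untouched, which is the point of scaling as opposed to processing.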


Loudness range

The approach that the ATSC has taken logically matches a single point of focus, the anchor element of a program, but this essentially ignores the loudest and softest portions of a program. It is left up to the program producer to decide what range the rest of the elements will have around the anchor.

Dolby published the comfortable and acceptable listening range based on listening tests with actual viewers. (See Figure 1.) The scale is asymmetrical. Sound can get softer over about a 10dB range and still be acceptable, while sound that gets louder can only go about half as far and still remain comfortable.

At first glance, it would seem that a 15dB range is not much, but remember that this is a weighted loudness measurement and is content-dependent. Crickets are probably fine 30dB quieter than the reference, but a critical line of dialog is not. A brief explosion that is 10dB higher than the reference is probably OK, but a line of dialog that loud is probably not. This is where the creativity of the program producer and the expertise of the mixer come into play. The best ones recognize that a 15dB range for dialog or other anchor-like elements might in fact be too much for a typical viewing environment and mix accordingly. Of course, not all content is so lucky.
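As a rough illustration of the asymmetric window, the Python sketch below flags anchor-like elements that stray more than about 5dB above or 10dB below the anchor. Those two limits are approximations taken from the description above, the function name is hypothetical, and a flag is only a prompt for a human to listen, since crickets and explosions play by different rules than dialog.

```python
def within_comfort_window(element_lkfs, anchor_lkfs,
                          louder_db=5.0, softer_db=10.0):
    """True if an element sits inside the asymmetric range around the anchor.

    The default limits are approximate values for illustration only.
    """
    offset = element_lkfs - anchor_lkfs
    return -softer_db <= offset <= louder_db


anchor = -24.0
print(within_comfort_window(-30.0, anchor))  # 6 dB softer: inside
print(within_comfort_window(-16.0, anchor))  # 8 dB louder: outside
print(within_comfort_window(-36.0, anchor))  # 12 dB softer: outside
```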

What to do?

The best solution is to follow the four techniques above in order, to the extent that each is applicable. Try to get it right from the start if possible. If your facility does not yet support dynamic metadata, the static target approach works well, and there are a myriad of feature-packed meters and scaling tools available today in both hardware and software form. Don't be afraid to install safety protection processing at the end to catch any rogue content.

Modern television audio processing can help to manage the loudness range, and it can work much more effectively if average loudness is already consistent. Smart processors will use metadata to their advantage, and hybrid processors will use it to the benefit of the program producer.

Getting this right will take a continuing industrywide effort that should not be driven by regulations and regulators alone. However, it is understandable why both might be inevitable.

Tim Carroll is the president and founder of Linear Acoustic.