Getting loudness under control

Television viewers have long complained about the variation in audio level between channels and from segment to segment. It has become the norm for viewers to keep their remotes close to hand to fine-tune the volume. Unfortunately, the transition to HD and 5.1 audio appears to have made this loudness problem worse. The main cause is not the increase in channel count, from two channels for stereo to six for 5.1, but all of the additional technology required to receive, process and deliver 5.1 audio throughout the broadcast chain.

Thankfully, broadcast engineers have a range of practical options for addressing this loudness problem, and these are becoming easier to adopt thanks to the recent introduction of simpler, more integrated signal processing technology. Until recently, for instance, an HD signal processing path typically comprised multiple discrete processing modules and dedicated devices to handle video and audio conversion, muxing, upmixing and encoding. This chain of equipment can now be replaced with a single card, considerably reducing cost and complexity: all the necessary 5.1 processing can be integrated into one module without compromising functionality or quality. (See Figure 1.)

Before looking at the different approaches available for controlling loudness, it's worth examining some of the underlying issues behind the problem.

Managing audio levels

From the outset, the Dolby Digital delivery system included a built-in loudness control mechanism. This feature, commonly known as dialog normalization, was incorporated to allow broadcasters to operate at different loudness levels, with the differences in loudness managed in the home by the Dolby Digital decoder.

The general idea was that all program content would carry a metadata parameter, called dialnorm, indicating the nominal level of the dialog in the program. The dialnorm value is presented to the Dolby Digital encoder, which carries it in the bitstream to the home decoder, where it is used to adjust the level of the decoded program. As the viewer changes from channel to channel, or as the broadcaster switches from segment to segment, the decoder uses the associated dialnorm value to adjust the overall audio level dynamically so that consistent loudness is maintained. (See Figure 2.)
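As a rough illustration of the decoder side, the sketch below assumes the behaviour specified for Dolby Digital (AC-3) in ATSC A/52: dialnorm expresses the dialog level as a value between -1 and -31 dB, and the decoder attenuates the program so that dialog is reproduced at -31 dB. The function names and example levels are hypothetical, and a real decoder applies this gain alongside dynamic range control.

```python
# Sketch of Dolby Digital dialog normalization on the decoder side.
# Assumption: ATSC A/52 behaviour, where dialnorm is -1..-31 dB and the
# decoder attenuates the program so dialog plays back at -31 dB.
REFERENCE_DIALOG_LEVEL_DB = -31


def decoder_attenuation_db(dialnorm: int) -> int:
    """Attenuation (in dB) the decoder applies for a given dialnorm value."""
    if not -31 <= dialnorm <= -1:
        raise ValueError("dialnorm must lie between -1 and -31 dB")
    return dialnorm - REFERENCE_DIALOG_LEVEL_DB  # e.g. -24 -> 7 dB of attenuation


def reproduced_dialog_level_db(dialog_level_db: float, dialnorm: int) -> float:
    """Dialog level heard in the home once the decoder has applied dialnorm."""
    return dialog_level_db - decoder_attenuation_db(dialnorm)


# Hypothetical channels mixed at different dialog levels, each tagged correctly.
for channel, level in [("channel A", -24), ("channel B", -18)]:
    print(channel, reproduced_dialog_level_db(level, dialnorm=level))
```

Because each channel's dialnorm matches its actual dialog level, both end up at the same reproduced level even though they were mixed 6 dB apart.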

However, instead of helping to iron out interchannel and intersegment loudness differences, dialog normalization has often made things worse. This is generally the result of inconsistent dialnorm settings introduced either when the program is created or when it is received and processed by a facility. The value may not be set properly when the content is first created, or, during incoming feed processing or ingest, the audio metadata may be stripped by equipment or the wrong value applied.

Similarly, during production or in master control, dialnorm values may be left at their defaults, or the audio may be modified without the dialnorm values being readjusted. The net effect of all these possibilities is that the audio levels heard by the viewer are often adjusted using incorrect dialnorm values, so loudness remains unstable and inappropriate. (See Figure 3.)

The three most common solutions to this loudness problem are dynamic metadata, static metadata, and loudness measurement with on-the-fly audio level control.

Dynamic metadata is the original solution. It relies on a combination of good production practices, which ensure that all content created or ingested carries the proper dialnorm value, and good equipment design, which ensures that metadata is preserved throughout the facility and delivered to the emission encoder.

Although this model has recently become more effective thanks to the soon-to-be-ratified SMPTE-2020 standard, which specifies a way of transporting Dolby metadata in the ancillary data space of HD- and SD-SDI video, it is still not practical for many established facilities. This is because broadcasters must ensure that all the audio processing equipment in their facility passes audio metadata through and adjusts it whenever the audio content is modified. Unfortunately, many HDTV facilities use a large number of devices that were designed and deployed before SMPTE-2020 existed, making it impossible to guarantee that metadata survives throughout the chain.
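The sketch below models why a single legacy device breaks the dynamic metadata model. It is a hypothetical simulation, not any vendor's behaviour: Segment, metadata_aware_gain, legacy_device and dialnorm_error_db are invented names, and the encoder fallback of -27 is an illustrative assumption (the real fallback depends on encoder configuration). A positive error means dialog plays that many dB louder than the decoder's reference level.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional

# Assumption: illustrative value the emission encoder falls back to when no
# metadata arrives; the real fallback depends on how the encoder is configured.
ENCODER_FALLBACK_DIALNORM = -27


@dataclass(frozen=True)
class Segment:
    dialog_level_db: float   # actual dialog loudness of the audio
    dialnorm: Optional[int]  # metadata travelling with the audio; None = lost


Device = Callable[[Segment], Segment]


def metadata_aware_gain(gain_db: float) -> Device:
    """Hypothetical device that changes level and readjusts dialnorm to match."""
    def process(seg: Segment) -> Segment:
        new_dialnorm = None if seg.dialnorm is None else round(seg.dialnorm + gain_db)
        return Segment(seg.dialog_level_db + gain_db, new_dialnorm)
    return process


def legacy_device(seg: Segment) -> Segment:
    """Hypothetical older device: passes the audio but drops the metadata."""
    return replace(seg, dialnorm=None)


def dialnorm_error_db(source: Segment, chain: Iterable[Device]) -> float:
    """Mismatch between the true dialog level and the dialnorm the encoder sends."""
    seg = source
    for device in chain:
        seg = device(seg)
    sent = seg.dialnorm if seg.dialnorm is not None else ENCODER_FALLBACK_DIALNORM
    return seg.dialog_level_db - sent


source = Segment(dialog_level_db=-20.0, dialnorm=-20)
print("metadata kept:", dialnorm_error_db(source, [metadata_aware_gain(3.0)]))
print("metadata lost:", dialnorm_error_db(source, [metadata_aware_gain(3.0), legacy_device]))
```

With every device metadata-aware the error stays at zero; inserting one legacy device anywhere in the chain leaves the encoder sending its fallback value, and the mismatch reaches the viewer.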

Static metadata sits at the other end of the range of possibilities. This model involves producing all content to a known dialog level and setting the corresponding dialnorm value statically at the Dolby Digital emission encoder. It requires close collaboration with external content providers to make sure all the content they supply is mixed to the target loudness level. For non-live content delivered on tape or as a file, it is possible to measure the loudness of the entire program and reprocess the audio so that it meets the target dialog level. For live content, or content delivered in real time as a stream, this cannot be done reliably: dialog level is a function of the entire program or segment, so the whole program or segment must be received before the correct dialnorm value can be known.
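For file-based content, the measure-then-reprocess step can be sketched as follows. This is a simplified illustration under stated assumptions: per-block loudness readings are taken as already available, they are combined by energy averaging, and the K-weighting and gating that ITU-R BS.1770 specifies are deliberately left out. The -24 dB target and the sample readings are hypothetical.

```python
import math
from typing import Sequence

# Simplified integrated loudness: an energy (power) average of per-block
# loudness readings. ITU-R BS.1770 also applies K-weighting and gating,
# which this sketch deliberately omits.


def integrated_loudness_db(block_loudness_db: Sequence[float]) -> float:
    """Energy-average loudness over every block of a complete program."""
    if not block_loudness_db:
        raise ValueError("the whole program must be available")
    total_power = sum(10 ** (level / 10) for level in block_loudness_db)
    return 10 * math.log10(total_power / len(block_loudness_db))


def correction_gain_db(block_loudness_db: Sequence[float], target_db: float) -> float:
    """Gain needed to bring the program to the facility's target level."""
    return target_db - integrated_loudness_db(block_loudness_db)


# Hypothetical file-based program: per-block readings in dB.
program = [-20.0, -22.0, -30.0, -21.0]
print(f"integrated: {integrated_loudness_db(program):.1f} dB")
print(f"gain to reach -24 dB: {correction_gain_db(program, -24.0):+.1f} dB")
```

Both functions need the complete list of blocks before returning a value, which is the practical reason this approach cannot be applied reliably to live or streamed content.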

Loudness measurement and on-the-fly audio level control is the third possibility. This process ignores incoming metadata and adds a device at the end of the chain that measures the program loudness, and either sets the dialnorm value accordingly or processes the audio to meet the target dialnorm value. (See Figure 4.) This scheme is roughly equivalent to an automatic gain control (AGC) on the output of a facility, using loudness as the control signal. However, because loudness is a measurement that is supposed to be integrated over a long period of time (ideally the entire duration of the program or segment), this method can be problematic.

For example, in a quiet scene the loudness level, even the dialog loudness level, could be temporarily low. If this low level is used as a cue to raise the gain, and the quiet scene is followed by a louder one, the louder scene will arrive with the extra gain still applied, producing an abnormal and undesirable rise in level. Applying loudness control too quickly therefore causes the audio levels to pump, which defeats the original intent of the dialog normalization and dynamic range control built into the Dolby Digital format.
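The pumping effect can be reproduced with a toy model. The sketch below is a hypothetical loudness-driven AGC, not a description of any product: the gain for each block is derived from a smoothed estimate of the previous blocks' loudness, the smoothing is done in the dB domain for simplicity, and the alpha values, block levels and -24 dB target are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical loudness-driven AGC at the facility output. Each block's gain
# comes from a smoothed loudness estimate of the *previous* blocks, so the AGC
# always lags the program. alpha near 1 = fast AGC, near 0 = slow AGC.


def loudness_agc(
    levels_db: Sequence[float], target_db: float, alpha: float
) -> Tuple[List[float], List[float]]:
    """Return (output levels, applied gains) in dB for each block."""
    estimate = target_db  # assume the AGC starts settled on the target
    outputs: List[float] = []
    gains: List[float] = []
    for level in levels_db:
        gain = target_db - estimate
        gains.append(gain)
        outputs.append(level + gain)
        estimate += alpha * (level - estimate)  # one-pole smoothing
    return outputs, gains


# Hypothetical program: normal dialog, a quiet scene, then a louder scene.
program = [-24.0] * 10 + [-34.0] * 10 + [-18.0] * 10
for name, alpha in [("fast", 0.5), ("slow", 0.02)]:
    out, gain = loudness_agc(program, target_db=-24.0, alpha=alpha)
    print(f"{name}: peak output {max(out):.1f} dB, "
          f"gain swing {max(gain) - min(gain):.1f} dB")
```

With the fast setting the quiet scene is boosted by nearly 10 dB, and the louder scene that follows reaches the output well above its own mixed level before the gain recovers; the slow setting barely moves the gain, which is closer to the long-term integration that loudness measurement assumes.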

Double checking with loudness monitoring

Ultimately, the best solution to loudness problems will depend on the individual facility, and especially on how much control a broadcaster has over incoming content, particularly live or streamed content. Whichever of the three solutions is adopted, it is recommended that dialog loudness be continuously measured and compared with the outgoing dialnorm value, whether that value is static or dynamic.
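The recommended comparison can be sketched as a small monitor that is fed successive loudness measurements. Everything here is a hypothetical illustration: DialnormMonitor is an invented name, and the 2 dB tolerance and three-reading hold are assumptions rather than figures from any standard. The hold is there so that a single quiet or loud passage does not raise an alarm on its own.

```python
from dataclasses import dataclass

# Hypothetical monitor comparing measured dialog loudness with the outgoing
# dialnorm value. The tolerance and hold count are illustrative assumptions.


@dataclass
class DialnormMonitor:
    tolerance_db: float = 2.0
    hold_readings: int = 3
    _consecutive: int = 0

    def update(self, measured_dialog_db: float, outgoing_dialnorm: int) -> bool:
        """Feed one measurement; return True while a mismatch alarm is active."""
        deviation = measured_dialog_db - outgoing_dialnorm
        if abs(deviation) > self.tolerance_db:
            self._consecutive += 1
        else:
            self._consecutive = 0  # back within tolerance: reset the hold
        return self._consecutive >= self.hold_readings


monitor = DialnormMonitor()
readings = [(-24.0, -24), (-17.0, -27), (-17.0, -27), (-17.0, -27)]
print([monitor.update(level, dialnorm) for level, dialnorm in readings])
```

The same check works whether the outgoing dialnorm is fixed at the encoder (static metadata) or extracted from the stream (dynamic metadata).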

Dedicated loudness measurement devices have long been available, although they add to the complexity of the channel chain. Several broadcast equipment manufacturers have therefore started integrating loudness measurement into other devices that are commonly present in the output chain. These products can continuously measure audio loudness and extract the dialnorm metadata value.

Conclusion

Industry standards such as Dolby Digital and SMPTE-2020 are clearly helping to improve interconnection and operation across audio devices from multiple vendors. However, a more effective solution will come from simpler video and audio processing infrastructures based on fewer devices. As facilities start to deploy more streamlined audio processing, there are likely to be fewer loudness control issues. (See Figure 5.) Ultimately, this should result in fewer complaints about audio levels.

Gilbert Besnard is director of infrastructure at Miranda Technologies.