Audio for HD: Common problems, simple cures

Time and time again, multichannel audio, rather than video, causes the most problems when transitioning to HD. Audio is a common cause of both audience complaints and technical difficulties at television facilities. Two of the most common complaints from HD viewers are:

audio level variance between channels, and variation from segment to segment; and
inconsistent 5.1 presence, with surround sound appearing and disappearing between channels and between program segments.

Let's look at the causes of these issues and then review several solutions.

Managing audio levels

A promising feature of the Dolby Digital (AC-3) delivery system is the built-in loudness control mechanism. This feature, commonly known as dialog normalization, was created to allow broadcasters to operate at different loudness levels, with the differences in loudness managed in the home by the Dolby Digital (AC-3) decoder. The general idea was that all program content would include a metadata parameter called dialnorm, indicating the nominal level of the dialog in the program. The dialnorm value is presented to the AC-3 encoder, which then sends it to the home decoder. There, it is received and applied to the decoded audio signal in order to modify the level of the program. As the viewer changes from channel to channel, or as the broadcaster switches from segment to segment, the associated dialnorm value is used by the audio decoder to dynamically adjust the overall audio level so that consistent loudness is maintained. (See Figure 1.)

However, it seems that instead of helping to iron out interchannel and intersegment loudness issues, the dialog normalization feature has often made things worse. This is generally the result of inconsistencies in the dialnorm setting during program creation or when video is received and processed by a facility. This may happen when the content is first created, if the value is not set properly, or during incoming feed processing or ingest, when audio metadata may be removed by equipment or the wrong value applied. Similarly, during production or in master control, the dialnorm values may not be set properly (left at default), or the audio may be modified but the dialnorm values are not properly readjusted. The net effect of these different possibilities is that the audio levels heard by the viewer are often adjusted using incorrect dialnorm values and will therefore be unstable and inappropriate. (See Figure 2.)
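The effect of a wrong dialnorm value can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes the usual AC-3 convention that dialnorm is expressed as a level between -1 and -31 dB and that the decoder attenuates the audio so dialog lands at -31 dBFS. The function names are hypothetical and are not part of any Dolby API.

```python
# Assumption: the decoder normalizes dialog to -31 dBFS (usual AC-3 convention).
REFERENCE_LEVEL_DB = -31


def decoder_gain_db(dialnorm_db: int) -> int:
    """Gain in dB (zero or negative) the decoder applies for a dialnorm value."""
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("dialnorm is expected to lie between -31 and -1 dB")
    return REFERENCE_LEVEL_DB - dialnorm_db


def reproduced_dialog_level_db(actual_dialog_db: float, dialnorm_db: int) -> float:
    """Dialog level heard at home, given the real level and the signaled value."""
    return actual_dialog_db + decoder_gain_db(dialnorm_db)


# Correct metadata: dialog mixed at -24 dB and signaled as -24 lands on -31.
correct = reproduced_dialog_level_db(-24, -24)

# Wrong metadata: dialog mixed at -20 dB but signaled as -27 (for example, a
# default left untouched) is reproduced at -24, which is 7 dB too loud.
wrong = reproduced_dialog_level_db(-20, -27)
```

In this model, every decibel of error in the signaled dialnorm value becomes a decibel of loudness error in the home.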

The three most common solutions to this audio loudness problem are dynamic metadata, static metadata, and loudness measurement and on-the-fly audio level control.

Dynamic metadata

This solution relies on a combination of good production practices, which make sure that all content created or ingested has the proper dialnorm value, and good equipment design, which ensures that metadata is maintained throughout the facility and delivered to the emission encoder. This model has recently become more effective due to the soon-to-be-ratified SMPTE 2020 standard, which specifies a way of transporting Dolby metadata in the ancillary space of HD-SDI and SD-SDI video.

However, it is still not practical for many facilities. This is because broadcasters must ensure that every audio processing device in their facility passes audio metadata and readjusts it whenever the audio content is modified. Unfortunately, the reality is that many HDTV facilities use a large number of devices that were designed and deployed before SMPTE 2020 existed, making it impossible to ensure metadata survival throughout the chain.

Static metadata

At the other end of the range of possibilities, this model involves producing all content to a known dialog level and setting the corresponding dialnorm value statically at the Dolby Digital (AC-3) emission encoder. This requires close collaboration with external content providers to make sure all the content provided is mixed to the target loudness level. For non-live content delivered on tape or file, it is possible to measure the loudness of the entire program and reprocess the audio to make sure it meets the target dialog level. But for live content, or content that is delivered in real-time as a stream, it is not possible to reliably perform this task. This is because the dialog level is a function of the entire program or segment, and the entire program or segment must be received before the dialnorm value can be accurately known.
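For file-based content, the reprocessing step amounts to a fixed gain offset once the program's dialog level is known. The following is a minimal sketch under that assumption. It takes the measured dialog level as an input rather than measuring it (a real measurement would come from a loudness meter), and all function names are hypothetical.

```python
def correction_gain_db(measured_dialog_db: float, target_dialog_db: float) -> float:
    """Gain in dB that moves the measured dialog level onto the target level."""
    return target_dialog_db - measured_dialog_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale every audio sample by the same gain, leaving dynamics untouched."""
    factor = db_to_linear(gain_db)
    return [sample * factor for sample in samples]


# A program measured at -20 dB against a house target of -27 dB needs -7 dB.
gain = correction_gain_db(-20.0, -27.0)
```

Because a single gain is applied to the whole program, the mix itself is unchanged. This is only possible when the entire program is available for measurement beforehand, which is why the approach breaks down for live feeds.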

Loudness measurement and on-the-fly audio level control

The third possibility is to ignore incoming metadata and add a device at the end of the chain, which measures the program loudness and either sets the dialnorm value accordingly or processes the audio to meet the target dialnorm value. (See Figure 3.) This scheme is roughly equivalent to an automatic gain control (AGC) on the output of a facility, using loudness as the control signal.

However, because loudness is a measurement that is supposed to be integrated over a long period of time, ideally the entire duration of the program or segment, this method can be problematic. For example, in a quiet scene, the loudness level and even the dialog loudness level could be temporarily low. If this low level is used as a cue to increase the audio level, and if the quiet scene is followed by a louder scene, then the audio loudness increase will be amplified. This will likely result in an abnormal, undesirable rise in the level of audio.

Hence, applying the loudness control too quickly will create a pumping of the audio levels. All of this goes against the original intent of the dialog normalization and dynamic range control of the Dolby Digital format.
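The pumping effect can be illustrated with a toy simulation. The sketch below is not any vendor's algorithm: it assumes a simple controller that tracks loudness with a one-pole smoother and sets its gain from the running estimate. The block levels, the -27 dB target and the smoothing factors are arbitrary values chosen for illustration.

```python
def agc_output_levels(levels_db, target_db, smoothing):
    """Output level per block for a simple loudness-driven gain control.

    smoothing is the fraction of each new measurement blended into the running
    loudness estimate: near 1 reacts quickly, near 0 integrates over a long time.
    """
    estimate = target_db  # assume the program was on target before we started
    outputs = []
    for level in levels_db:
        gain = target_db - estimate        # gain based on loudness seen so far
        outputs.append(level + gain)
        estimate += smoothing * (level - estimate)
    return outputs


# 20 normal blocks, 20 blocks of a quiet scene, then 20 blocks of a loud scene.
program = [-27.0] * 20 + [-40.0] * 20 + [-20.0] * 20

fast = agc_output_levels(program, target_db=-27.0, smoothing=0.5)
slow = agc_output_levels(program, target_db=-27.0, smoothing=0.01)

first_loud_block = 40
fast_overshoot = fast[first_loud_block]  # about -7 dB: 20 dB above target
slow_overshoot = slow[first_loud_block]  # between -18 and -17 dB
```

With these values, the fast controller has boosted the quiet scene by about 13 dB by the time the loud scene arrives, so the first loud block comes out roughly 20 dB above target instead of the 7 dB it would have been with no processing at all. The slow controller overshoots far less, at the cost of correcting genuine level errors more slowly.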

Double-checking with loudness monitoring

Ultimately, the best solution to loudness problems will depend on the individual facility and especially on the broadcaster's level of control over incoming content, particularly live or streamed content. With all three of these solutions, it is recommended that audio loudness be continuously measured and compared to the outgoing metadata value, be it static or dynamic. (See Figure 4.)

There have long been dedicated devices for measuring loudness, although they add to the complexity of the channel chain. Hence, some broadcast equipment manufacturers now integrate loudness measurement inside other devices that are commonly present in the output chain. These products provide continuous measurement of audio loudness and extraction of the dialnorm metadata value. The measured loudness and the extracted dialnorm value are compared against each other and against configured operational targets. Then an alarm or warning is reported if the values diverge too much for a specified time duration, thereby offering improved audio control.
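The comparison logic described here can be sketched in a few lines. This is a hypothetical illustration, not the behavior of any particular product: the class name, the tolerance and the hold time are all assumptions, and the measured loudness and extracted dialnorm value are supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class LoudnessMonitor:
    """Raises an alarm when loudness and dialnorm disagree for too long."""

    target_db: float = -24.0     # configured operational target (assumed value)
    tolerance_db: float = 2.0    # allowed deviation before counting starts
    hold_seconds: float = 10.0   # how long a deviation must persist
    _bad_for: float = field(default=0.0, init=False)

    def update(self, measured_db: float, dialnorm_db: float, elapsed_seconds: float) -> bool:
        """Feed one measurement interval; return True while the alarm is active."""
        mismatch = abs(measured_db - dialnorm_db) > self.tolerance_db
        off_target = abs(dialnorm_db - self.target_db) > self.tolerance_db
        if mismatch or off_target:
            self._bad_for += elapsed_seconds
        else:
            self._bad_for = 0.0  # any good interval resets the timer
        return self._bad_for >= self.hold_seconds


monitor = LoudnessMonitor(target_db=-24.0, tolerance_db=2.0, hold_seconds=3.0)
readings = [(-24.0, -24.0), (-18.0, -24.0), (-18.0, -24.0), (-18.0, -24.0), (-24.0, -24.0)]
alarms = [monitor.update(measured, dialnorm, 1.0) for measured, dialnorm in readings]
```

Requiring the deviation to persist for a configured time keeps short loud or quiet passages from triggering false alarms, in line with the long integration time that loudness measurement calls for.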

Maintaining 5.1 channel continuity in master control

Let's now consider the second issue of integrating legacy stereo material and 5.1 surround-sound material. Early HDTV broadcasters initially elected to air a mixture of stereo and 5.1, depending on the original material. This resulted in viewer confusion and dissatisfaction as the program switched in and out of 5.1 surround, depending on the show segment. (See Figure 5.)

As an alternative, the broadcaster can send 5.1 surround sound to viewers at all times by employing a technique often referred to as upmixing. This allows stereo material to be converted to 5.1 by synthesizing the center, surround and LFE channels. (See Figure 6.)

The criteria for a good upmixer include:

Dynamic adaptation: The upmixer automatically adjusts the synthesis algorithm depending on the program content (dialog, action and music).
Auto-sensing: It automatically detects whether the input is 5.1 or stereo, and it seamlessly and silently switches the synthesis in and out as required.
Downmix-compatible: The upmixing is designed for homes with surround-sound listening configurations, as well as those with stereo televisions. In homes with stereo audio, the incoming 5.1 signal is converted back to stereo, or downmixed, by the decoder or receiver. Hence, it is imperative that the synthesized 5.1 is compatible with an eventual downmix in stereo homes.
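Downmix compatibility can be checked numerically. The sketch below assumes a common stereo downmix in which the center and surround channels are folded in at -3 dB and the LFE channel is discarded. Actual coefficients are signaled in the bitstream and may differ, and both upmixers shown are deliberately naive, hypothetical examples rather than real products.

```python
import math

MINUS_3_DB = 1 / math.sqrt(2)  # assumed center and surround mix level


def downmix_to_stereo(left, right, center, left_sur, right_sur, lfe=0.0):
    """Fold one 5.1 sample down to stereo; the LFE is dropped (assumption)."""
    lo = left + MINUS_3_DB * center + MINUS_3_DB * left_sur
    ro = right + MINUS_3_DB * center + MINUS_3_DB * right_sur
    return lo, ro


def passthrough_upmix(left, right):
    """Trivial upmix: stereo goes to L/R only, other channels stay silent."""
    return left, right, 0.0, 0.0, 0.0, 0.0


def naive_center_upmix(left, right):
    """Naive upmix: adds a synthesized center without reducing L/R."""
    return left, right, 0.5 * (left + right), 0.0, 0.0, 0.0


original = (1.0, 1.0)
clean = downmix_to_stereo(*passthrough_upmix(*original))
boosted = downmix_to_stereo(*naive_center_upmix(*original))
```

Here the pass-through case returns the original stereo sample exactly, while the naive center synthesis comes back from the downmix about 1.7 times larger. A downmix-compatible upmixer has to compensate for this so that stereo homes hear essentially the original mix.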

Conclusion

The main reason that multichannel audio has proved so demanding for facilities is not necessarily because of the increased channel count from two channels for stereo to six channels for 5.1. It is mainly because of all the additional technology required to receive, process and deliver 5.1 audio throughout the broadcast chain. Using industry standards, such as Dolby Digital (AC-3) and SMPTE 2020, has helped to improve interconnection and operation across multiple audio devices from multiple vendors. However, a more effective solution is to use fewer devices, thus reducing costs and simplifying systems. (See Figure 7.)

Looking ahead, we can be optimistic that there will be fewer troublesome HD audio issues, as facilities start to deploy more advanced audio processing technologies. It should all add up to fewer headaches for engineers and fewer complaints. Now that's progress!

Michel Proulx is the chief technology officer for Miranda Technologies.