When you ask station engineers about the technical issues they face day to day, the same handful of audio problems keeps cropping up. The top three are typically lip-sync errors, maintaining the continuity of 5.1 and stereo audio, and excessively variable loudness levels. The good news is that these annoying, recurrent audio issues can now be tamed using a mix of practical new technologies that are readily deployed.

Measuring and addressing lip-sync issues

Lip-sync issues are common in broadcast and have their roots in the different processing times required for video and audio content. This difference is even more pronounced with the move to HD and 3Gb/s. Although video equipment is designed to manage the different video and audio delays, lip-sync problems can still emerge down the playout chain as signals pass through various devices from different vendors.

Traditionally, it has been difficult to trace the emergence of lip-sync errors during TV playout, and subsequently at the set-top box, while a channel is on-air. Digital fingerprinting, however, now offers an elegant way to identify, measure and trace lip-sync errors. The technology is based on comparing the video and audio at a reference source, which has no lip-sync problems, with the video and audio at other points in the playout chain, where lip-sync problems may emerge due to processing delays. For example, lip-sync testing points may be established at an incoming feed, after branding and closed-captioning/VBI insertion, at the exit of master control and when checking off-air feeds.

Typically, the process is performed using a probing module, which analyzes signals at both points using a nonintrusive fingerprint generator engine. This operates on a field-by-field basis to generate a number that is unique to the video or audio content for that field. With this numeric data, the probe can then make sure that the content is the same at the source and destination. This allows the system to check for content mismatches, such as video and track swaps, as well as pure lip-sync errors. A probe can check all 16 audio channels and report any lip-sync errors within plus or minus 1ms. Every channel will get its own fingerprint to allow measurement of any phase shift between the audio channels. (See Figure 1.)
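To make the comparison step concrete, here is a minimal sketch of how two fingerprint streams might be aligned to measure a lip-sync error. It is not any vendor's algorithm: the function names, the exact-match rule and the field-level resolution are all assumptions made for illustration, and a real probe would tolerate small fingerprint differences and resolve audio delay far more finely than one field.

```python
from typing import Optional, Sequence


def find_shift(reference: Sequence[int], probe: Sequence[int],
               max_shift: int) -> Optional[int]:
    """Return the shift, in fields, at which probe lines up with reference.

    A positive result means the probe point is delayed relative to the
    reference. None means no shift lines the two streams up, which points
    to a content mismatch rather than a pure delay.
    """
    # Hypothetical rule: require at least half the reference to overlap.
    min_overlap = max(1, len(reference) // 2)
    for shift in range(-max_shift, max_shift + 1):
        pairs = [
            (reference[i], probe[i + shift])
            for i in range(len(reference))
            if 0 <= i + shift < len(probe)
        ]
        if len(pairs) >= min_overlap and all(a == b for a, b in pairs):
            return shift
    return None


def lip_sync_error_fields(video_ref: Sequence[int], video_probe: Sequence[int],
                          audio_ref: Sequence[int], audio_probe: Sequence[int],
                          max_shift: int = 8) -> Optional[int]:
    """Audio delay minus video delay, in fields (positive: audio lags video)."""
    video_shift = find_shift(video_ref, video_probe, max_shift)
    audio_shift = find_shift(audio_ref, audio_probe, max_shift)
    if video_shift is None or audio_shift is None:
        return None  # treat as a content mismatch, not a lip-sync value
    return audio_shift - video_shift
```

Multiplying the result by the field duration gives the error in milliseconds. Measuring video and audio delay separately and then subtracting is what lets the same comparison flag both a pure lip-sync error and a content mismatch.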

A key advantage of this digital fingerprinting technology is that it allows content comparison across different video and audio formats. For example, it can compare an HD 1080i or 720p signal with audio from a broadcast facility’s master control room to a signal received in the home using an SD set-top box. (See Figure 2.)

These lip-sync measurements can be monitored over IP, using a WAN or LAN, with an SNMP-based facility monitoring system, and any errors can be immediately flagged for remedial action. This would typically involve the operator making a delay adjustment directly from the facility monitoring desktop interface using the channel’s signal processor. This highly flexible, end-to-end lip-sync monitoring process can be used for multiple TV channels across multiple sites and also is well-suited for applications such as TV network affiliate monitoring. (See Figure 3.)

Digital fingerprinting is still in the early roll-out stage and is currently based on proprietary solutions. However, SMPTE has taken note of the technology's considerable potential and is investigating the possibility of producing a SMPTE standard for the fingerprint signal and the methods of metadata carriage. The review is being performed by the SMPTE TC-22TV-01 AHG Lip Sync Committee.

Maintaining the continuity of 5.1 and stereo

Traditionally, delivering both 5.1 and stereo programming simultaneously has proven problematic. A typical viewer complaint is inconsistent 5.1 delivery to the surround speakers, which is often caused by ineffective upmixing when playout moves from 5.1 content to 2.0 content. This can happen when broadcasters are playing out a mix of newer 5.1 content and legacy 2.0 content. While this type of problem may not be a full-on broadcast emergency, it’s certainly not the high-quality acoustic experience broadcasters strive for.

Fortunately, this can now be addressed with a simple, and relatively low-cost, set-and-forget modification to the playout chain. The latest generation of signal processor modules is equipped with integrated up- and downmixing capabilities, and these can be configured to automatically respond to the incoming signal. Whenever a 2.0 signal is received, it can be passed and also upmixed to 5.1. Similarly, when a 5.1 signal is received, it can be passed and also downmixed to 2.0. These automatic responses prevent problems like inconsistent 5.1 and 2.0 audio.
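The automatic response described above can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular signal processor: the channel order (L, R, C, LFE, Ls, Rs), the function names and the upmix are all hypothetical. The downmix uses the -3dB centre and surround coefficients commonly used for an Lo/Ro downmix and drops the LFE channel; the upmix is a deliberately naive placeholder, since commercial upmixers are far more sophisticated.

```python
import math
from typing import Dict, List

ATT = 1 / math.sqrt(2)  # about -3dB, applied to centre and surrounds


def downmix_51_to_20(frame: List[float]) -> List[float]:
    """Lo/Ro downmix of one 5.1 sample frame (assumed order L, R, C, LFE, Ls, Rs)."""
    left, right, centre, _lfe, ls, rs = frame  # LFE is discarded
    lo = left + ATT * centre + ATT * ls
    ro = right + ATT * centre + ATT * rs
    return [lo, ro]


def naive_upmix_20_to_51(frame: List[float]) -> List[float]:
    """Placeholder upmix: sum signal to centre, difference signal to surrounds."""
    left, right = frame
    centre = 0.5 * (left + right)
    ls = 0.5 * (left - right)
    rs = 0.5 * (right - left)
    return [left, right, centre, 0.0, ls, rs]


def process(frame: List[float]) -> Dict[str, List[float]]:
    """Pass the incoming format untouched and derive the other one from it."""
    if len(frame) == 2:
        return {"stereo": list(frame), "surround": naive_upmix_20_to_51(frame)}
    if len(frame) == 6:
        return {"stereo": downmix_51_to_20(frame), "surround": list(frame)}
    raise ValueError("expected a 2.0 or 5.1 frame")
```

The sketch applies no level normalization after the downmix, so a real implementation would also need headroom management to avoid clipping.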

Loudness control

The last of the three audio problems, controlling excessive loudness variation between channels and between program segments, is perhaps the most high-profile issue at the moment. Loudness variation is especially evident during commercials and promos and quickly gets tiresome for viewers.

This situation has been around for a long time, and you may recall that dialnorm was created to prevent this problem. Dialnorm metadata is designed to allow content to be mixed to different loudness levels and for audio receivers to compensate for the differences by applying a normalization based on the metadata. Unfortunately, the dialnorm route to loudness control is not practical for many broadcasters because they cannot reliably pass the dialnorm metadata to their AC-3 encoders. This sometimes arises because metadata is missing in the content they receive, or because their playout chains incorporate many different generations of equipment and metadata transmission issues lead to missing or incorrect values. These metadata errors can make loudness jumps in the home even worse than they would otherwise have been.
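A little arithmetic shows why a wrong dialnorm value makes things worse. In an AC-3 decoder, dialnorm takes a value from 1 to 31, standing for -1 to -31 relative to full scale, and the decoder attenuates by 31 minus that value so that dialog lands at a common -31 reference. The sketch below applies that rule; the function name is hypothetical, and the loudness figures in the usage note are made-up examples.

```python
def decoder_output_loudness(content_loudness: float, dialnorm: int) -> float:
    """Dialog loudness after AC-3 dialnorm normalization.

    dialnorm is given as its magnitude, 1 to 31 (so 24 means -24).
    The decoder attenuates by (31 - dialnorm) dB.
    """
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    attenuation_db = 31 - dialnorm
    return content_loudness - attenuation_db
```

For example, a program mixed at -24 and correctly flagged with a dialnorm of 24 comes out at -31. A commercial mixed at -18 but carrying the same dialnorm of 24 comes out at -25, which is 6dB louder than the program around it. Had the commercial carried the correct value of 18, it too would have landed at -31.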

There’s real impetus now to fix this problem. The ATSC has published the A/85 recommended practice, which proposes, as an alternative to agile metadata, the use of a fixed dialnorm value set at the encoder, with all content then required to match that target loudness. However, while this is great for content produced in house, broadcasters tend to receive a lot of their content from third parties, and they need new tools to ensure it matches their target loudness.

To meet this requirement, a number of equipment vendors are now offering loudness control processors, with multiple approaches available based on multiband and wideband audio processing. This technology can be delivered as dedicated boxes or as space-efficient options for signal-processing modules. The best solutions now offer very smooth loudness transitions, without the sudden dips in level or pumping effects that have traditionally been prevalent with automatic gain control.

Signal processors with loudness control include features like loudness measurement using ITU-R BS.1770 to assess any deviation from the target loudness, as well as a dynamic processor that can adjust levels on the fly, and a controller that is much more sophisticated than traditional gain control. (See Figure 4.)
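As a rough illustration of the measurement step, the sketch below implements only the channel-summation formula from ITU-R BS.1770 and the resulting deviation from target. It is heavily simplified: it assumes the samples have already been through the standard's K-weighting pre-filter, it omits gating entirely, and the function names are hypothetical. The -24 default follows the A/85 target but should be treated as a configurable assumption.

```python
import math
from typing import Dict, Sequence

# BS.1770 channel weights: 1.0 for L, R and C, 1.41 for the surrounds.
# The LFE channel is excluded from the measurement.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def ungated_loudness(channels: Dict[str, Sequence[float]]) -> float:
    """Loudness of already K-weighted samples, without gating."""
    total = 0.0
    for name, samples in channels.items():
        weight = CHANNEL_WEIGHTS.get(name)
        if weight is None or not samples:
            continue  # skip LFE, unknown channels and empty buffers
        mean_square = sum(s * s for s in samples) / len(samples)
        total += weight * mean_square
    if total <= 0.0:
        return float("-inf")  # digital silence
    return -0.691 + 10.0 * math.log10(total)


def correction_db(measured: float, target: float = -24.0) -> float:
    """Gain change, in dB, that would bring measured loudness to target."""
    return target - measured
```

A real loudness controller would not apply this correction in one step. It would smooth the gain change over time, which is what separates it from a traditional gain control.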

The most common way to implement loudness processing has been to use a set-and-forget mode of loudness control, with the loudness processor maintaining a target loudness without any ongoing operator involvement. To optimize the processing, facilities can choose a processing profile that’s the best match for their type of content. Profiles are available for music, talk and many other types of programming.

Set-and-forget operation can deliver great results, and it’s easy to install and use on a daily basis. However, for some specific types of channels, a more active style of control can work even better. This is especially true for channels that air movies and drama programs, where large and rapid loudness changes contribute to the dramatic effect. With set-and-forget operation, the loudness processor has no way of knowing the difference between a sudden audio transition within a program and an audio transition caused by a change of segment, such as a commercial break.

In these situations, it can be beneficial to use segment-aware processing with the loudness processing profile controlled by a simple segment change cue from playout automation. For known segments with the correct loudness level, the loudness processing can act in a bypass or light processing mode, which can help protect against downstream clipping. For commercials, live segments and feeds from outside the facility, the loudness processing can act in a faster-reacting correction mode.
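The segment-aware logic amounts to a lookup from the automation cue to a processing mode. The sketch below is one hypothetical way to express it: the segment and mode names are invented for illustration, and the choice to fall back to fast correction for an unrecognized cue is an assumption, not something a given automation system or processor mandates.

```python
from enum import Enum


class Segment(Enum):
    KNOWN_GOOD_PROGRAM = "known_good_program"  # loudness already correct
    COMMERCIAL = "commercial"
    LIVE = "live"
    EXTERNAL_FEED = "external_feed"


class Mode(Enum):
    LIGHT = "light"            # bypass or light processing
    FAST_CORRECTION = "fast"   # faster-reacting correction


# Hypothetical mapping from segment type to loudness processing mode.
MODE_BY_SEGMENT = {
    Segment.KNOWN_GOOD_PROGRAM: Mode.LIGHT,
    Segment.COMMERCIAL: Mode.FAST_CORRECTION,
    Segment.LIVE: Mode.FAST_CORRECTION,
    Segment.EXTERNAL_FEED: Mode.FAST_CORRECTION,
}


def mode_for_cue(cue: str) -> Mode:
    """Pick a processing mode from a segment-change cue sent by automation."""
    try:
        segment = Segment(cue)
    except ValueError:
        # Assumption: content of unknown origin gets the protective mode.
        return Mode.FAST_CORRECTION
    return MODE_BY_SEGMENT[segment]
```

Defaulting to the corrective mode means a missing or malformed cue costs some dynamic range on a movie rather than letting an uncontrolled commercial through.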

Playout automation-driven loudness processing can also be advantageous for network affiliates, which need to pass preprocessed network content as well as locally created news content and commercials. In this case, the loudness processing can be bypassed when the network feed is passed to avoid any changes to the content, and it can be engaged for the local content with uncontrolled loudness levels. The net result can be a natural, high-quality audio experience free from excessive loudness variation. (See Figure 5.)


All those pesky lip-sync, loudness and 5.1 continuity issues can now be addressed effectively using relatively simple fixes and without too much investment. With these tools in place, they should become a thing of the past before too long.

Guy Marquis is infrastructure senior product manager for Miranda Technologies.