Managing HDTV sound

Proper management of 5.1 surround-sound audio is the latest challenge facing broadcasters as they continue to enhance the overall HDTV experience in the home. In many ways, managing the rollout of 5.1 audio is proving to be more difficult for engineers than the HDTV rollout itself.

Defining audio

To help define the terms used today in audio processing, Figure 1 lists typical audio types (including a future 7.1 surround-sound scenario), and Figure 2 lists the typical audio conversion types (encoding, decoding, upmixing and downmixing).

The process of compressing audio into a non-PCM signal is typically called encoding; decompressing it back to PCM is called decoding. A downmixed signal may or may not carry additional inaudible information that can be used for a downstream upmix. Downmixed signals are monitored in stereo unless the signal needs to be upmixed for monitoring purposes (Pro Logic I and II, DTS Neural Surround). Non-PCM signals sound like white noise when monitored without decoding.

Monitoring audio

The majority of “true” 5.1 audio is provided by tapes or ingested via satellite in today's HDTV systems. One of the first challenges is the monitoring of the audio. Options include downmixing the 5.1 surround sound or setting up a 5.1 amplifier and speakers. There is also the issue of how to monitor a mono mix of the stereo signal, as well as Dolby E, Dolby Digital AC-3 and Pro Logic II signals as they are encountered. All of these scenarios have to be carefully considered.
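The downmix and mono-sum monitoring options can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: the function names are hypothetical, and the -3 dB gains for the center and surround channels are an assumption based on commonly used ITU-R BS.775-style coefficients. A real facility may use different coefficients or take them from audio metadata.

```python
import math

# Assumed gain of -3 dB for center and surround channels; actual
# coefficients vary by facility and may be driven by metadata.
MINUS_3_DB = 1.0 / math.sqrt(2.0)


def downmix_51_to_stereo(l, r, c, lfe, ls, rs):
    """Fold one 5.1 sample frame down to a (left, right) stereo pair.

    The LFE channel is accepted but discarded, which is a common
    (assumed) choice for a monitoring downmix.
    """
    left_out = l + MINUS_3_DB * c + MINUS_3_DB * ls
    right_out = r + MINUS_3_DB * c + MINUS_3_DB * rs
    return left_out, right_out


def mono_sum(left, right):
    """Average a stereo pair into one mono sample for a mono check."""
    return 0.5 * (left + right)
```

Listening to the mono sum is a quick way to catch out-of-phase stereo content, because identical signals of opposite polarity on left and right cancel to silence.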

Audio provided as three PCM pairs (six channels), either as AES or embedded into the SDI signal, can be downmixed for monitoring in stereo or monitored in surround sound. A downmix is provided in most cases (5.1 + 2.0) and is required in the monitoring chain. Audio that has been compressed using Dolby E technology must be decompressed before it can be monitored. However, this can lead to lip-sync issues because the compressed non-PCM audio may or may not have been pre-compensated for the one frame of delay that compression and decompression each incur. The picture monitor also can impart its own delay, which may or may not help the lip-sync timing.
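To put a number on the delay, the arithmetic can be sketched as follows. The only figure taken from the text is the one frame apiece for compression and decompression; the function name and the frame rates in the usage lines are illustrative assumptions.

```python
def codec_delay_ms(frame_rate, encode_passes=1, decode_passes=1):
    """Audio delay in milliseconds, assuming one video frame of delay
    for each compression pass and each decompression pass."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    frames = encode_passes + decode_passes
    return frames / frame_rate * 1000.0


# One encode plus one decode at two common frame rates.
delay_2997 = codec_delay_ms(30000 / 1001)  # roughly 66.7 ms
delay_25 = codec_delay_ms(25.0)            # 80 ms
```

If the video has not been delayed by the same amount, this is the size of the lip-sync error a single encode/decode cycle would introduce before any monitor delay is taken into account.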

If the audio was compressed into Dolby Digital AC-3, decompression is not recommended, because this signal is meant to be decompressed only once, in the home environment. If the audio is decoded and then recompressed, the resulting artifacts will lower its quality.

Pro Logic also may need to be upmixed for monitoring. Figure 3 shows the various types of monitoring required.

Lip sync

Lip sync is proving to be another issue in 5.1 audio delivery. It has been said that today's larger screen sizes make lip-sync errors easier to see. If this is true, lip-sync errors become even more apparent with HDTV and surround sound. A broadcaster or cable/satellite distributor can control lip sync up to the point where the signal is decoded in the home environment, and many lip-sync errors can be predicted, measured and repaired using today's technology. One method is to insert a known video and audio test signal, pass it through the system, measure the resulting offset and then apply a compensating delay.
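The test-signal method reduces to simple arithmetic once the two events have been located at the output of the system. The sketch below assumes a test pattern in which a video flash and an audio beep are generated at the same instant, and that some other tool has already reported the frame number of the flash and the sample number of the beep. The function names, the sign convention (positive means audio is late) and the default rates are all assumptions for illustration.

```python
def av_offset_ms(flash_frame, beep_sample,
                 frame_rate=30000 / 1001, sample_rate=48000):
    """Return the audio-minus-video offset in milliseconds.

    Positive: audio arrives after the picture (audio late).
    Negative: audio arrives before the picture (audio early).
    """
    video_time = flash_frame / frame_rate
    audio_time = beep_sample / sample_rate
    return (audio_time - video_time) * 1000.0


def correction(offset_ms):
    """Say which path to delay, and by how much, to null the offset."""
    if offset_ms > 0:
        return ("delay video", offset_ms)
    if offset_ms < 0:
        return ("delay audio", -offset_ms)
    return ("none", 0.0)
```

For example, a flash found at frame 30 of a 30 fps test clip and a beep found at sample 49,600 of 48 kHz audio would indicate audio roughly one frame late, so the video path would need about 33 ms of added delay.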

Processing audio

Another pressing issue is what to do if 5.1 surround sound is not provided within the content. There are two methods with which to address this issue. The purist's method is to continue to move stereo audio into the home environment unaltered. However, this relies on dynamic audio metadata to signal the receiver in the home whether the content is 2.0 stereo or 5.1 surround sound. This is known to cause clicks, pops, muting and other issues in the home, because not all receivers react to changes in audio metadata in an inaudible fashion.

The other method is to upmix the audio from stereo to 5.1 surround sound. It is important to perform listening tests to determine whether the upmix sounds natural in the surround-sound domain. Also, listen for any audible artifacts when the input transitions between 2.0 and 5.1. Upmixing provides a constant surround-sound experience in the home environment through the use of static audio metadata.

Moving audio in today's facilities can be simple if the audio is embedded. It becomes more complicated if AES interfaces are used. Today, 75Ω unbalanced interfaces are mostly used, with occasional use of balanced 110Ω interfaces. Existing embedded audio equipment handles all four groups of embedded audio (16 total channels, four per group), but interfacing to older embedded audio equipment can cause problems because all four groups may not be handled.
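The group arithmetic is worth making explicit when checking whether a given channel plan will survive a trip through older equipment. The sketch below uses only the figures given above (four groups of four channels, 16 in total); the function names and the idea of describing a device by the set of groups it supports are illustrative assumptions.

```python
CHANNELS_PER_GROUP = 4
GROUP_COUNT = 4
TOTAL_CHANNELS = CHANNELS_PER_GROUP * GROUP_COUNT  # 16


def group_and_slot(channel):
    """Map embedded channel 1-16 to (group 1-4, position in group 1-4)."""
    if not 1 <= channel <= TOTAL_CHANNELS:
        raise ValueError("channel must be between 1 and 16")
    index = channel - 1
    return index // CHANNELS_PER_GROUP + 1, index % CHANNELS_PER_GROUP + 1


def unhandled_channels(required_channels, supported_groups):
    """List the required channels that fall outside a device's groups."""
    return [ch for ch in required_channels
            if group_and_slot(ch)[0] not in supported_groups]


# A 5.1 + 2.0 plan on channels 1-8 through a device that only
# handles group 1 would leave channels 5-8 unhandled.
lost = unhandled_channels(range(1, 9), {1})
```

A check like this, run against every embedder and de-embedder in a signal path, shows up front where a 5.1 + 2.0 channel plan would run into trouble.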

Even if an older-generation embedder/de-embedder handles two groups, the other two may not be identified properly. This can cause issues with newer embedders/de-embedders that can handle four groups of audio. A guard band for embedded Dolby E must be implemented so that the Dolby E header stays aligned with the video, allowing proper de-embedding downstream. Also be aware of old de-embedders in a system, because they may not keep the audio phase aligned across the three PCM pairs used for surround sound. Any such misalignment will affect the surround-sound audio.

If interconnection is considered a minor annoyance, then the mapping of the audio across many channels can be downright problematic, because an industrywide standard does not exist. TV networks have different ways of processing and mapping audio. Equipment that can easily be routed for audio processing is necessary whether the interface is AES or embedded, and using detection can assist with the different mappings. PCM and non-PCM can be detected, but identifying Dolby E versus Dolby Digital can be difficult.
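As a rough illustration of PCM versus non-PCM detection, the sketch below scans a block of 16-bit words from an AES pair for the two preamble sync words that, as far as the author of this sketch is aware, SMPTE 337 uses to mark the start of a non-PCM data burst in 16-bit mode. The sync values should be verified against the standard, the function names are hypothetical, and the 20-bit and 24-bit modes are deliberately not covered. The sketch makes no attempt to tell Dolby E from Dolby Digital.

```python
# Assumed SMPTE 337 preamble sync words (16-bit mode); verify against
# the standard before relying on them.
SYNC_PA_16 = 0xF872
SYNC_PB_16 = 0x4E1F


def find_non_pcm_preamble(words):
    """Return the index of the first Pa/Pb sync pair, or None.

    `words` is a sequence of 16-bit unsigned integers taken in order
    from the interleaved AES pair.
    """
    for i in range(len(words) - 1):
        if words[i] == SYNC_PA_16 and words[i + 1] == SYNC_PB_16:
            return i
    return None


def looks_non_pcm(words):
    """True if the block contains at least one burst preamble."""
    return find_non_pcm_preamble(words) is not None
```

Because ordinary PCM audio could contain the same two values by chance, a practical detector would normally require the preamble to recur at a regular interval before declaring the signal non-PCM.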

Other than the 5.1 + 2.0 scenario that exists for stereo and surround sound in today's content, there is also secondary audio information that must be considered, which may be another language or descriptive video. Stereo or an upmix can be provided in cases where 5.1 is not present; however, handling the secondary audio program requires additional audio processing. If no descriptive video audio content exists, it is typically substituted by the stereo signal or a sum of the stereo signal.
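The substitution rule for the secondary audio program can be written as a small selection function. This is only a sketch of the logic described above: the function name, the use of `None` to mean that no descriptive video content exists, and the equal-weight mono sum are all assumptions.

```python
def secondary_audio_frame(descriptive, stereo_left, stereo_right,
                          substitute_with_sum=False):
    """Choose one (left, right) sample frame for the secondary program.

    descriptive: a (left, right) tuple, or None when no descriptive
    video (or second-language) content exists for this program.
    """
    if descriptive is not None:
        return descriptive
    if substitute_with_sum:
        # Substitute a sum of the stereo signal, scaled to avoid overload.
        mono = 0.5 * (stereo_left + stereo_right)
        return mono, mono
    # Otherwise substitute the main stereo signal unchanged.
    return stereo_left, stereo_right
```

Whichever substitute is chosen, the point is that the secondary program is never left silent when the dedicated content is absent.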

5.1 in the home environment

Even if the audio is provided as 5.1 in the home environment, proper setup of the home receiver is necessary to ensure the best possible sound. Speaker placement and level alignment are important, as is the setting of the Dynamic Range Control (DRC). A correctly set DRC reduces the need to reach for the volume button on the remote control every time the audio is too loud or too soft.

Conclusions

There are many considerations for audio processing in today's 5.1 systems. Accommodations for monitoring the various forms of audio must be made. Lip sync will continue to rear its ugly head as systems become more complex, so a solid understanding of where lip sync can go wrong is critical.

The procedures for mapping the various channels of audio at the input and output of a system must also be well understood, and it is critical that processing be in place to handle every scenario. Audio can be transported as AES or embedded, PCM or non-PCM, and stereo or surround sound, and there may be other content that needs to be considered such as a secondary language or descriptive video.

The complications that may arise from either the purist's or nonpurist's method of handling stereo content must also be understood, because audio metadata transitions and upmixing can each create audible problems or unnatural sound. And, of course, correct home receiver setup is critical for the best possible surround-sound experience.

Randy Conrod is product manager of digital products for Harris Broadcast Communications.