Audio processing for HDTV, Part 1

One of the challenges facing broadcasters today is the difficulty of presenting consistent audio in the home, especially with the increase in the number of home theatre systems with surround sound. The issues are many and varied — clicks and pops during transitions between programming, loudness variations between commercials (too loud) and programming (not loud enough), proper signaling of the home audio receiver/amplifier with audio metadata, video-to-audio timing or lip sync issues, and stereo versus surround sound imaging.

Today, several audio processing techniques exist that broadcasters can employ to improve the viewers' audio experience. They include upmixing, downmixing, audio compression, audio metadata, loudness measurement and correction, and lip sync measurement and correction.

This article consists of two parts. Part 1 will examine overriding issues and introduce two key ways to overcome them. The second article will appear in the May issue and continue the discussion about solutions.

The goal is to examine the range of audio processing techniques currently available and see how they can best be used to enhance your viewers' experience. The articles offer tips for addressing problems that might develop when adding new audio processes to an existing system, such as handling embedded or separate audio and audio paths, and explain how you can use video delays to maintain lip sync.

Definitions

Television audio originally consisted of only one channel, known as monaural or mono for short. In the '80s, television began transmitting two-channel or stereo audio. Two-channel audio contains little depth information and primarily limits audio images to a left-to-right placement. Since the late '90s, 5.1 surround sound, with its six channels, has been steadily rolling out. If six channels aren't enough to get your ears spinning, 7.1 audio with eight channels of audio is poised to be the next evolution in surround sound technology.

The audio process

In an ideal system, audio is processed in a transparent manner. Levels at the input and output match, and channels are not swapped (i.e., left for right in a stereo signal) or inverted electrically. There is no lip sync issue, as all video paths and audio paths match in time. Voice-over is the only extra audio processing required.
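As a rough illustration of what "transparent" means in practice, the sketch below compares a stereo pair at the input and output of a processing chain and reports channel swaps, polarity inversion and level change. The function names, the zero-lag correlation test and the assumption that the signals are already time-aligned and not silent are all illustrative choices, not a description of any particular product.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def correlation(a, b):
    """Normalized zero-lag correlation, between -1 and 1."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def check_stereo_path(in_l, in_r, out_l, out_r):
    """Hypothetical transparency check for a time-aligned stereo path.

    Assumes all four channels are the same length and none is silent.
    """
    straight = abs(correlation(in_l, out_l)) + abs(correlation(in_r, out_r))
    crossed = abs(correlation(in_l, out_r)) + abs(correlation(in_r, out_l))
    swapped = crossed > straight
    # Pair each input with the output channel it actually arrived on.
    match_l, match_r = (out_r, out_l) if swapped else (out_l, out_r)
    report = {"swapped": swapped}
    for name, src, dst in (("left", in_l, match_l), ("right", in_r, match_r)):
        # A negative correlation means the channel is electrically inverted.
        report[name + "_inverted"] = correlation(src, dst) < 0
        # Level change through the path, in dB (0 dB means levels match).
        report[name + "_gain_db"] = 20 * math.log10(rms(dst) / rms(src))
    return report
```

A path that passes this check reports no swap, no inversion and a gain close to 0 dB on both channels. A real test would also need to account for delay through the chain, which this sketch ignores.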

In this perfect world, audio metadata is passed from the source to the final-destination Dolby Digital AC-3 encoder without being lost or altered in any way. This ensures that the original audio mix and level move from the recording studio into the home environment as the audio engineer intended.

Even if this ideal model existed, variations in audio mixes and dynamic range would still occur. News, weather and sports, episodic programming and drama, situational comedy, live performances, and movies are all mixed in different ways. The combination of sources and program types can result in audio chaos if the audio engineer isn't careful.

When it comes to loudness, two audio programs set to exactly the same VU meter level can still differ in perceived loudness. Add commercials, with their variations in mixes and compression tricks, and it becomes a challenge to maintain consistent audio quality from the TV station to viewers' homes.
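To see why matching meter readings do not guarantee matching loudness, consider two steady tones with the same unweighted level but different frequencies. The sketch below applies the standard A-weighting curve as a simple stand-in for the ear's frequency sensitivity. That choice is an assumption made for illustration (broadcast loudness meters generally follow ITU-R BS.1770 rather than A-weighting), and the function names are hypothetical.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at a given frequency (about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00


def weighted_level_db(level_db, freq_hz):
    """Level of a steady tone after frequency weighting (illustrative only)."""
    return level_db + a_weighting_db(freq_hz)


# Two tones that read the same on an unweighted meter.
low_tone = weighted_level_db(-20.0, 100.0)
mid_tone = weighted_level_db(-20.0, 1000.0)
print(f"100 Hz tone: {low_tone:.1f} dB weighted")
print(f"1 kHz tone:  {mid_tone:.1f} dB weighted")
```

Under this weighting, the 100 Hz tone comes out roughly 19 dB below the 1 kHz tone even though both started at the same level, which is the kind of gap a simple level meter cannot reveal.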

Even if a broadcaster passes along the program and commercial audio the way the producers intended, the audio that finally reaches the home may be inconsistent.

Any stereo-to-surround and surround-to-stereo transitions will be disconcerting to viewers because their receivers will repeatedly switch between two-channel and 5.1-channel reproduction according to the source material. In addition, this switch is often accompanied by a relay click and an interruption of audio.

Add to this the common concerns about loudness, or volume changes, between programs and commercials, and a TV audio engineer is faced with a daunting challenge.

Fortunately, television engineers have a range of new techniques and technology upon which to draw. Let's examine some of them more closely.

Compression

For contribution among production facilities and for tape and server transports, compression techniques such as Dolby E (which carries up to eight channels on a single two-channel AES path) are often used to move the increasing number of audio channels around a facility. Dolby Digital, which carries six audio channels (5.1) on a two-channel AES path, is used for surround sound in the home. In both cases, the resulting two-channel audio signal is non-PCM and therefore cannot be monitored on a speaker system without first being decoded.

Downmix and upmix

Surround sound mixes can be carried on stereo channels. Solutions include Dolby ProLogic II, Neural-THX Surround and SRS Circle Surround. With these systems, the two-channel audio signal can be carried, monitored, stored, edited and played back as a stereo audio signal. However, stereo audio signals played on a surround sound speaker system appear only in the left front and right front speakers.

Although purists could maintain that the program producer intended only a stereo mix, the stereo mix does contain some depth information. That information can be used to recreate a surround sound image through upmixing. Upmixing is the process in which a stereo audio signal is reprocessed to create a surround sound signal.

With upmixing, the center channel is used primarily to carry dialog, while some of the dialog remains in the left and right channels. When transitioning from a surround mix to an upmixed stereo mix and back, there should be no audible changes, as all of the speakers in the surround sound listening environment are being used. If the perceived resolution in the home environment is enhanced, then upmixing presents an improved experience for the viewer.
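The basic idea behind upmixing can be sketched with a simple passive matrix: material common to both stereo channels (typically dialog) feeds the center, and the difference between the channels feeds the surrounds. This is only a minimal illustration under assumed gains; commercial upmixers use far more sophisticated, proprietary steering, and the function name and channel labels here are hypothetical.

```python
def upmix_passive(left, right):
    """Derive six channels from a stereo pair with a passive sum/difference matrix.

    left and right are equal-length lists of samples. Returns a dict keyed by
    channel name. The LFE channel is left silent in this sketch.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    # Content common to both channels (for example, centered dialog).
    center = [0.5 * (l + r) for l, r in zip(left, right)]
    # Content that differs between channels is treated as ambience.
    left_surround = [0.5 * (l - r) for l, r in zip(left, right)]
    right_surround = [-s for s in left_surround]
    return {
        "L": list(left),    # original signals stay in the front pair,
        "R": list(right),   # so some dialog remains in left and right
        "C": center,
        "LFE": [0.0] * len(left),
        "Ls": left_surround,
        "Rs": right_surround,
    }
```

With a centered mono source (identical left and right samples), the center carries the full signal and the surrounds stay silent, while wide or out-of-phase material moves toward the surrounds.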

Downmixing is the process in which a surround sound audio signal is converted into a stereo signal. This might be performed to save audio data space, monitor the signal or move a 5.1 signal through a stereo audio infrastructure.
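A downmix can be sketched as a weighted sum of the surround channels into left and right. The example below assumes the commonly used -3 dB (about 0.707) gains for the center and surround channels and drops the LFE channel. In a real system these coefficients would normally come from the audio metadata, so treat the defaults and the function name as illustrative assumptions.

```python
import math

MINUS_3_DB = 1.0 / math.sqrt(2.0)  # about 0.707


def downmix_to_stereo(l, r, c, lfe, ls, rs,
                      center_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Fold one 5.1 sample frame down to a stereo (left, right) pair.

    The LFE input is accepted for completeness but discarded in this sketch.
    """
    left_out = l + center_gain * c + surround_gain * ls
    right_out = r + center_gain * c + surround_gain * rs
    return left_out, right_out


# A frame containing only center-channel dialog lands equally in both outputs.
print(downmix_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))
```

Because several channels are summed, the result can exceed full scale when all inputs are loud at once, so a practical downmix would also scale or limit the output.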

Downmixing is often used to send a stereo mix to the receiver/amplifier while providing an upmix for surround sound in home theatre environments. (See Figure 1 on page 33.)

In part 2

You've now been introduced to some of the key issues to consider when handling TV audio. Next month's article will continue the discussion of audio processing with an examination of lip sync, including the causes and solutions.

Randy Conrod is product manager of digital products for Harris.