The 5.1 Challenge

The roadmap to surround sound has many detours

NEW YORK

5.1 surround sound is an important element of the HDTV viewing experience, adding excitement to a movie or immersing viewers in a sporting event.

Discrete 5.1 audio refers to three front channels--left, center, right (L, C, R); two rear (surround) channels--left and right (Ls, Rs); and one lower-bandwidth (in terms of bits used for coding) low-frequency effects channel (LFE). A consumer with an AC-3 decoder can reproduce all of these signals from a full 5.1 source.

DOWNMIXING

Unfortunately for audio mixers, only a small percentage of viewers are actually listening in discrete 5.1. Many may be listening to surround sound, but it is surround sound that has been matrixed to two channels and then decoded back to surround.

The type of surround in these circumstances varies with the encoder/decoder used. Until recently, it was common to create a mono, band-limited surround channel that fed one or two surround loudspeakers. Newer encoder/decoders provide separate full-bandwidth left and right surround, along with the usual front left, center and right signals. But none, so far, produces the low-frequency effects channel.

Even with surround encoding/decoding technology readily available, many, if not most, viewers hear 5.1 not as surround sound at all, but downmixed to either stereo or mono.

The 5.1 audio mixer for TV, then, can't just create a discrete channel mix and leave it at that. Just as stereo mixes need to be auditioned in mono to ensure compatibility (no out-of-phase or out-of-polarity surprises), 5.1 mixes need to be checked not only in discrete, but also in mono, stereo, matrixed left and right (also referred to as left-total (Lt) and right-total (Rt)), and matrix-decoded.
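The mono check described above can be approximated numerically. What follows is a minimal Python sketch, not a tool named in this article: the function names `correlation` and `mono_sum` are hypothetical, and the test signal is a made-up sine wave. It shows why an out-of-polarity pair is a problem: the two channels correlate at about -1 and cancel when summed to mono.

```python
import math

def correlation(left, right):
    """Normalized correlation of two equal-length sample lists, from -1.0 to +1.0."""
    dot = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return dot / energy if energy else 0.0

def mono_sum(left, right):
    """Simple mono fold-down: the average of the two channels."""
    return [0.5 * (l + r) for l, r in zip(left, right)]

# Illustrative test signal (an assumption, not program audio): a sine wave
# and a polarity-inverted copy of it.
tone = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
inverted = [-s for s in tone]

in_phase = correlation(tone, tone)        # close to +1: mono-safe
out_of_pol = correlation(tone, inverted)  # close to -1: cancels in mono
residue = max(abs(s) for s in mono_sum(tone, inverted))
print(in_phase, out_of_pol, residue)
```

A hardware phase scope does far more than this, but the idea is the same: readings that sit near -1 for long stretches are a warning that the mono listener will hear a hole where that source should be.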

"The problem is that a lot of things that sound artistically acceptable in discrete 5.1 will totally trash the surround sound decoder, and even worse will trash the stereo and mono," said Eric Small, chief technical officer of Somerset, N.J.-based Modulation Sciences, Inc.

Here are some tips to help avoid these compatibility problems.

Be careful about positioning something in full surround as that could make the downmixed stereo sound strange. Crowd noise needs to be listened to carefully. Out-of-phase components can create unexpected and possibly unwanted surround signals.

Treat the Lt and Rt signals with the same good engineering practices as stereo. The two signal paths should be identical, with the same high-frequency response, low distortion and no clipping.

Any compression on the Lt and Rt channels should be cross-coupled. If not, the stereo will be degraded and, as Small said, the surround will make you seasick.
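To illustrate what cross-coupling means, here is a rough Python sketch of a linked compressor. It is a simplification under stated assumptions: the gain curve works on linear sample values with no attack or release timing, and the names `linked_gain` and `compress_linked`, the threshold and the ratio are all hypothetical. The point is only that one gain, derived from the louder channel, is applied to both, so the level relationship between Lt and Rt is preserved.

```python
def linked_gain(lt, rt, threshold=0.5, ratio=4.0):
    """Return a single gain for both channels, driven by the louder one."""
    peak = max(abs(lt), abs(rt))
    if peak <= threshold:
        return 1.0  # below threshold: leave the signal alone
    compressed = threshold + (peak - threshold) / ratio
    return compressed / peak

def compress_linked(lt_samples, rt_samples, threshold=0.5, ratio=4.0):
    """Apply the same gain to Lt and Rt, sample by sample."""
    out_lt, out_rt = [], []
    for lt, rt in zip(lt_samples, rt_samples):
        gain = linked_gain(lt, rt, threshold, ratio)
        out_lt.append(lt * gain)
        out_rt.append(rt * gain)
    return out_lt, out_rt

# A loud Lt sample next to a quiet Rt sample: both are turned down together,
# so the Lt-to-Rt ratio the matrix decoder steers on is unchanged.
out_lt, out_rt = compress_linked([1.0], [0.25])
print(out_lt[0], out_rt[0], out_lt[0] / out_rt[0])
```

With two independent compressors, only the loud channel would be reduced, the Lt/Rt balance would shift with every peak, and a matrix decoder would steer the image accordingly.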

Subtle moves in a discrete mix can end up being heard as unsubtle changes when decoded. Many of the new smart decoders tend to snap a signal to a particular channel.

Remember that the Lt/Rt downmix doesn't contain any of the LFE channel. So if you want enough bass to be reproduced on a matrixed, stereo or mono system, make sure to add it into those channels, and not just reserve it for the LFE channel.

Of course, sometimes rules are meant to be broken by artists who understand what they are doing, backed by their years of experience in mixing 5.1 for TV.

Take, for example, Emmy-award winning audio consultant Fred Aldous, who is the senior sound mixer for Fox Sports.

Aldous tends to reserve the LFE for those with discrete 5.1 systems, giving them that something extra.

Generally, stereo synthesis should be avoided because the comb filtering, phase shifting and other processing needed to produce a more spacious two-channel soundfield can wreak havoc with surround decoders. What may sound fine in stereo can be an aural imaging nightmare in surround.

But Aldous judiciously adds in a stereo synthesizer on the handheld camera mic mix (each handheld has only one mic) to widen the sound a bit and prevent the mix from collapsing to the center channel. He assigns the output from the stereo synthesizer to the front left and right channels only, and doesn't synthesize them so wide as to affect the surround.

Aldous offered a couple of other tips: "Make sure announcers are in the center and mix everything else around it," he said.

Make sure that the center channel doesn't get lost in the Lt/Rt mix. The most common setting for matrixed downmixing is to attenuate the center channel by 3 dB. If there's too much going on in the other channels, the center channel can easily be masked.
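The arithmetic behind these downmix warnings can be sketched in a few lines of Python. This is a deliberately simplified model, not the behavior of any particular encoder: real matrix encoders phase-shift the surround channels and may use different surround coefficients, and the surround gain and sign convention below are assumptions. The function names are hypothetical. The sketch keeps only what the article states: the center is folded into both sides at -3 dB and the LFE is left out.

```python
CENTER_GAIN = 10 ** (-3 / 20)    # -3 dB as a linear gain, roughly 0.708
SURROUND_GAIN = 10 ** (-3 / 20)  # assumed value; actual encoders differ

def downmix_lt_rt(l, c, r, ls, rs, lfe=0.0):
    """Simplified Lt/Rt fold-down of one 5.1 sample frame (no phase shift)."""
    s = SURROUND_GAIN * (ls + rs)
    lt = l + CENTER_GAIN * c - s  # surround goes in with opposite polarity
    rt = r + CENTER_GAIN * c + s  # on the two sides (assumed sign convention)
    return lt, rt                 # lfe is deliberately unused

def to_mono(lt, rt):
    """Mono fold-down of the Lt/Rt pair."""
    return 0.5 * (lt + rt)

dialogue_only = downmix_lt_rt(0.0, 1.0, 0.0, 0.0, 0.0)
lfe_only = downmix_lt_rt(0.0, 0.0, 0.0, 0.0, 0.0, lfe=1.0)
surround_only = downmix_lt_rt(0.0, 0.0, 0.0, 1.0, 0.0)
print(dialogue_only, lfe_only, surround_only, to_mono(*surround_only))
```

In this model, bass that lives only in the LFE never reaches Lt/Rt, and a source panned fully to the surrounds lands out of polarity between Lt and Rt, so it drops out of the mono sum. Both outcomes match the cautions above.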

Aldous also upmixes stereo sources, like videotaped highlight reels, to synthesized 5.1 with a Dolby DP564. The goal is to keep the soundfield consistent.

George Craig, audio specialist for KTVU in Oakland, Calif., said his station will use a Linear Acoustic upMAX 2251 for upmixing the station's stereo programs and commercials to be compatible with Fox Sports 5.1 baseball games.

Is upmixing Lt/Rt compatible? Aldous said yes, but he only upmixes once to avoid artifacts.

Tim Carroll, president of Morris Plains, N.J.-based Linear Acoustic, said, "It is completely compatible, and sounds reasonably good. Do a good stereo mix that is also mono-compatible and it will upmix fine. Upmixing is a necessary evil for now.

"The world would be great if everything was 5.1 channels, or if switching the AC-3 encoder between 5.1-channel and two-channel modes didn't make glitches in millions of decoders, but for now this is not true."

The worst-case scenario, and something to generally avoid, is to upmix a program, then downmix and upmix (matrix decode) again.

Different audio mixers have developed their own routines for monitoring the different audio formats for compatibility.

Aldous said that for the first 10 to 15 minutes of a show, he listens to the Lt/Rt downmix created by a Dolby Pro Logic II encoder on the truck. "That's the mix that most of the world will hear," he said.

After that he switches to discrete 5.1 monitoring "to make sure the 5.1 mix is where it should be." With 5.1 and various downmix audio formats available on the monitor section of the audio console, Aldous is able to check among them for compatibility.

With all that goes on behind the scenes during live television, switching from format to format is not as easy to do as in post production.

TOOLS OF THE TRADE

A number of tools of the trade make life easier. There are 5.1 phase scopes and level meters from companies like DK Audio, Leader Instruments and Wohler Technologies, which indicate if the discrete soundfield is within proper parameters.

SpiderVision from Modulation Sciences, Inc. (and developed with Neil Muncy) does something different in that it takes in the Lt/Rt feed, produces a visual display of a matrix-decoded soundfield, and shows where compatibility problems occur.

Aldous is a big fan of SpiderVision. Even when he's mixing in 5.1 discrete, he is able to keep an eye on the matrixed feed. Aldous also uses SpiderVision in his home listening room, where he often critiques his mixes from a recording off his cable system.

Getting audio metadata right means that the listener will hear the sound as it was meant to be. Make sure metadata follows the program all the way through. If the Dolby Digital (AC-3) encoder is set to external, and external metadata disappears, the encoder could revert to internal presets, which may or may not match the program. If a 5.1 program is suddenly re-coded as a 2/0 program, only the front left and right channels will be present, with all center channel dialogue lost.
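The failure described above can be modeled in a short Python sketch. Everything here is illustrative: the function names, the dictionary layout and the preset are hypothetical and do not represent any encoder's actual interface. The channel table covers only the two coding modes mentioned, and the LFE is left out for brevity.

```python
# Channels carried in each coding mode (LFE omitted for brevity).
CHANNELS_BY_MODE = {
    "3/2": ("L", "C", "R", "Ls", "Rs"),
    "2/0": ("L", "R"),
}

def select_metadata(external, internal_preset):
    """Use external metadata when it is present, else fall back to the preset."""
    return external if external is not None else internal_preset

def dropped_channels(program_mode, coded_mode):
    """List the program channels that the coded mode has no place for."""
    coded = CHANNELS_BY_MODE[coded_mode]
    return [ch for ch in CHANNELS_BY_MODE[program_mode] if ch not in coded]

# Hypothetical situation: the program is 3/2, the external metadata feed has
# vanished (None), and the encoder's internal preset happens to be 2/0.
preset = {"mode": "2/0"}
active = select_metadata(None, preset)
print(active["mode"], dropped_channels("3/2", active["mode"]))
```

The sketch makes the risk concrete: once the fallback preset disagrees with the program, the center channel, and the dialogue in it, has nowhere to go.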

When in doubt, use the defaults. As Carroll said, "You have a better chance of getting it right leaving the AC-3 encoder in default than passing actual metadata."

Generating correct metadata isn't as daunting as it may seem, although ensuring that it follows the program can be. Dolby E helps in this respect, as the metadata is carried along with the bit-rate-reduced, multiplexed eight-channel audio on an AES pair.

The Dolby DP570 multichannel audio tool determines, among other parameters, the all-important dialogue normalization (dialnorm) value. The DP570 outputs a datastream, and the metadata values derived from the unit can also be manually entered into an AC-3 encoder.

A new version of the Linear Acoustic LA-5124 StreamStacker De-Multiplexer/AC-3 Splicer includes AutoNorm, which measures long-term dialogue loudness per ATSC standards and inserts the correct value into one or more pre-compressed Dolby Digital (AC-3) streams. With this system, there is no need to decode and re-encode, or to set audio metadata parameters manually. The first version of AutoNorm uses the Dolby LM100 loudness meter.
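As a rough illustration of what a dialnorm value represents, the Python sketch below turns a measured long-term dialogue loudness into a whole-number dialnorm and shows the attenuation a decoder would apply as a result. It is not how the DP570, LM100 or AutoNorm work internally; the function names are hypothetical and the loudness figures are made up. The sketch assumes the usual AC-3 convention that dialnorm runs from -1 to -31 and that decoders normalize dialogue to the -31 level.

```python
def dialnorm_from_loudness(loudness_db):
    """Round a measured dialogue loudness and clamp it to the -31..-1 range."""
    value = round(loudness_db)
    return max(-31, min(-1, value))

def decoder_attenuation_db(dialnorm):
    """Attenuation (in dB) a decoder applies to bring dialogue to -31."""
    return 31 + dialnorm

# Made-up measurement for illustration only.
measured = -23.6
dialnorm = dialnorm_from_loudness(measured)
print(dialnorm, decoder_attenuation_db(dialnorm))
```

The practical consequence is that a wrong value is audible at home: if the dialnorm says the program is louder than it really is, the decoder turns it down too far, and the next program or commercial jumps out by comparison.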

Creating a compelling 5.1 audio broadcast is a challenging but rewarding endeavor. If you are new to this, allow time to listen critically (in a well-designed listening environment) to various programs in all their formats, then experiment, listen again, use the tools and do more listening. And have fun.