Surround sound in the digital era

Surveys show that good surround audio makes even SD images look “better.” Today’s viewers expect their audio to be as good as the video.

In 2001, Tokyo-based Half HP Studio opened its first surround sound facility and digital video editing room. The sound production company employs a Solid State Logic C200 digital production console in its main audio facility, Stage 1.

It's important from the outset to stress that much of what is said about surround sound is difficult to understand. Look into why, and the reason that often emerges is that someone, somewhere, is trying to violate the laws of physics. With surround sound, we are today where the industry was with stereo in the 1960s. Back then, we had the same situation inasmuch as much of what was said didn't make sense, and many of the products were disappointing, but eventually the industry figured out how to do it.

Quadraphonics fell flat on its face, primarily because the technology didn't exist to deliver four channels through analog media without quality loss. Dolby subsequently developed a surround system that worked well with two-channel analog media to create ambience at the rear.

But for true surround sound, with an arbitrary number of totally independent channels, the only real solution is digital audio. Once audio is digitized, any number of channels can be multiplexed into a bitstream without crosstalk. That bitstream can be an interface in a production environment, or it can be an MPEG transport stream for program delivery. In a workstation, it's not much harder to manipulate multiple channels than it is to handle stereo. The same is true of file servers. All disk-based audio recorders lock the audio data to time code, so handling more channels is no harder. All that is needed is more storage capacity and more processing power in proportion, which is exactly what computers give us every year. Thus, the actual nuts and bolts of producing and delivering multichannel sound is relatively simple. What is not so simple is defining what the channels should do.
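As a rough illustration of how simple the multiplexing itself is, here is a minimal sketch (an illustration only, not any real broadcast interface format) that interleaves any number of channels into one sample stream and recovers them without crosstalk:

```python
def multiplex(channels):
    """Interleave equal-length channel sample lists into one frame-ordered stream."""
    assert len({len(c) for c in channels}) == 1, "channels must be equal length"
    stream = []
    for frame in zip(*channels):       # one sample from each channel per frame
        stream.extend(frame)
    return stream

def demultiplex(stream, n_channels):
    """Recover the original channels from the interleaved stream."""
    return [list(stream[i::n_channels]) for i in range(n_channels)]
```

Because each channel occupies its own slots in every frame, adding channels costs only proportionately more data, exactly as the argument above requires.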

Cinema vs. television

When high-definition television began to emerge as a concept, it became clear that if the picture was going to be better, then the sound should also improve. It was a natural step for HDTV to adopt surround sound. As HD requires more video bandwidth, a bit more audio bandwidth could readily be found.

So far, the reasoning can't be faulted. However, what happened next wasn't reasoning. Someone somewhere simply assumed that the way the movie industry did surround sound was how broadcasters should do it. As it turns out, it was a false assumption.

When the cinema started to use multichannel sound, it wasn't stereo as we know it. Rather, it was just more channels feeding typically three speakers behind the screen. A given sound would be routed to one of the three channels in production. Cinemas are large places with a significant number of seats way off-axis. The front left and right speakers will be designed for power output rather than stereo imaging capability. The center speaker pulls the image back to the screen. Cinema screens are acoustically transparent, so it's easy to put a speaker behind the center of the screen.

Unfortunately, most television displays aren't acoustically transparent, and only a small number of people watch. Thus, the center channel speaker just adds cost and complexity, as there is no need for it in HDTV. If the front left and right speakers are any good, any center channel signal can be reproduced just by feeding it equally to left and right speakers. Consumer manufacturers seem to be aware of this, because surround sound decoders can be set up to do just that.
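The equal-feed reproduction described above can be sketched as follows. The −3 dB (1/√2) center gain is a common power-preserving convention assumed here for illustration; it is not specified in the article:

```python
import math

def fold_center(left, right, center, gain=1 / math.sqrt(2)):
    """Fold the center channel equally into the left and right feeds.

    The default -3 dB gain (1/sqrt(2)) is a common power-preserving
    convention, assumed here rather than taken from the article.
    """
    out_l = [l + gain * c for l, c in zip(left, center)]
    out_r = [r + gain * c for r, c in zip(right, center)]
    return out_l, out_r
```

A centered sound fed equally to both speakers produces a phantom image between them, which is why the physical center speaker adds nothing in a small listening area.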

In surround sound, the equivalent to the audio vectorscope is the jellyfish display, such as the one shown here from DK Technologies.

You can explain this until you're blue in the face and no one believes it, so I have taken to walking off with the center speaker in the middle of surround demos to illustrate the point.

4.0 surround

Now consider the lowest frequencies, which involve a sixth channel. Because that channel has much lower bandwidth than the other five, the system is called 5.1. The sixth channel is called Low Frequency Effects (LFE). It does not come out of any known surround microphone technology, because the LFE signal in movies is entirely synthesized in post-production.

LFE is designed to operate massive woofers in cinemas to scare the audience during earthquakes and explosions. The sounds made do not resemble those that can actually be heard during these events, or indeed the waveform on the film, but that does not matter. Unfortunately, the speakers needed to deliver cinema SPL at these low frequencies are too big for the average home. Thus, for HDTV sound, the LFE channel is quite unnecessary. Provided the remaining channels have full bandwidth, the user can have whatever speakers he wishes, and the LF can have directional information that LFE, being a single channel, cannot.

Thus, selecting 5.1 for HDTV was a mistake because the center channel and the LFE channel are unnecessary. What HDTV really needs is a properly engineered 4.0 system. As mentioned above, four digital audio channels is easy. It's all you need, and in many cases, it's more realistic than 5.1, especially for drama and music. When it is considered that the delivery mechanism will use compression, probably MPEG AAC, it should be clear that the fewer the audio channels, the better the quality will be for a given total audio bit rate. It is also worth remembering that most of the masking models used in compressors are based on mono. When sounds are in different places in a stereo or surround image, the masking isn't as strong and a higher bit rate will be needed.

There's no reason not to mix 4.0 surround and simply leave the center and LFE channels muted. A center sound will be panned equally in the front left and right signals. Low frequencies will come from the appropriate speaker, unless the viewer has small speakers, in which case his surround decoder will be set to filter off the LF from the four channels, sum it and route it to the subwoofer. All surround sound decoders can do that.
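The bass management a decoder performs can be sketched with a simple one-pole crossover. This is an illustrative filter only, not how any particular product implements it; note that for each channel the high-passed main feed plus its low-frequency contribution reconstructs the original signal exactly:

```python
def lowpass(samples, alpha=0.1):
    """One-pole low-pass filter (illustrative crossover, not a product design)."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def bass_manage(channels, alpha=0.1):
    """High-pass each speaker feed and route the summed LF to a subwoofer."""
    sub = [0.0] * len(channels[0])
    mains = []
    for ch in channels:
        lf = lowpass(ch, alpha)
        mains.append([x - l for x, l in zip(ch, lf)])  # high-passed main feed
        sub = [s + l for s, l in zip(sub, lf)]         # LF pooled into the sub
    return mains, sub
```

Because the split is a simple subtraction, nothing is lost: viewers with full-range speakers skip it, and viewers with small speakers still hear all the low frequencies, just from one place.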

Monitoring surround

Whilst we are considering hard truths, it should be understood that there is no way to create a virtual sound source outside a pair of speakers. Thus, surround sound is a misnomer. It's stereo at the front, with sounds positioned anywhere between front left and front right, and stereo at the back, with sounds positioned anywhere between rear left and rear right. With four or five channels, you can't pan sounds to the side. You can try, but it doesn't give the illusion of a sound source there. The only way to get sound from the side is to use reverberation. That's why many rear speakers are designed to emit sound in more than one direction.

So it follows that you can't mix surround sound in a traditional dead control room because you won't hear anything from the side at all. Instead, the control room needs to have some reverberation, preferably representative of a domestic environment, or the viewers will have a completely different experience.

Whilst surround monitoring in a fixed installation does not present any serious problems, the real difficulty will come with outside broadcast vehicles. These seem to be getting smaller, and it is difficult enough to monitor stereo — let alone multiple channels.

CSI, a live sound specialist in Tokyo, installed a Solid State Logic C200 digital production console in one of its mobile recording vehicles. The company selected the console because of its 5.1 capabilities.

Surround sound needs a different approach to level metering. Whilst it is possible simply to use a large number of conventional meters, this isn't as intuitive. In stereo, the audio vectorscope display was useful as it gave a graphic representation of where the dominant sound was coming from. In surround sound, the equivalent to the audio vectorscope is the so-called jellyfish display. This is a blob whose diameter increases with level, and whose border distends in the direction of the dominant channel.

For the foreseeable future, possibly even forever, there will be viewers listening in stereo or, shock horror, mono. It is important to realize that a stunning surround mix may be reduced to an unintelligible cacophony when heard in stereo or mono. Thus, it is important during the mixing process to check what the stereo and mono versions sound like.

Another disparity between movies and TV is the amount of money available for post-production. Movies can rebuild the soundtrack entirely in post, using ADR and effects. Thus, sounds can be panned to any desired channel. In TV, the sound has to come from microphones. One approach that has a lot of merit is to record the four raw outputs of a sound field microphone. Using the sound field control box, the best directivity and direction of the virtual microphones can be determined on replay.
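The virtual-microphone idea can be sketched as follows, using one common first-order Ambisonic formulation for the horizontal plane (W, X and Y of the four raw outputs; the weighting here is assumed for illustration, not the exact math of any control box). A pattern parameter steers the virtual microphone between omnidirectional and figure-of-eight at any chosen direction, all on replay:

```python
import math

def virtual_mic(w, x, y, azimuth_deg, p=0.5):
    """Derive a virtual first-order microphone from B-format W, X, Y signals.

    p sets the polar pattern (0 = omni, 0.5 = cardioid, 1 = figure-of-eight);
    this simple weighting is one common formulation, assumed for illustration.
    """
    a = math.radians(azimuth_deg)
    return [(1 - p) * wi + p * (xi * math.cos(a) + yi * math.sin(a))
            for wi, xi, yi in zip(w, x, y)]
```

Because the four recorded signals are kept raw, the directivity and direction of every virtual microphone remain free choices in post, exactly the flexibility the article describes.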

In many situations, the dominant sound source is going to be at the front. In the theatre or concert hall, the performers are typically in front of the audience. In cases like this, stereo microphones can be used to capture two channels, and the rear channels can be created entirely artificially using suitable digital reverberators. Surprisingly enough, it's virtually impossible in such an application to tell the difference between artificial rear channels and real ones from rear microphones. It wouldn't be surprising if a lot of TV sound ended up done this way.

John Watkinson is a high-technology consultant and author of several books on video technology.