Last month, I promised I would discuss how we can get the most out of our studio loudspeaker monitors. This is where reality sets in and theory turns to practice.
| Fig. 1: The “classic” stereo configuration, in which the listener and the two loudspeakers form an equilateral triangle. The loudspeakers are 30 degrees off-axis from the median plane. All images from “Total Recording,” courtesy of the author.|
A very basic part of that reality has to do with the idea that we use two or more loudspeakers to represent our audio, which we like to think of as stereo or multichannel. A related reality is that most of our audio is, in fact, mono. You’d think it’d be easy to get mono from more than one loudspeaker. In fact it’s not all that hard. However, there are some peculiar rules to the game, and it is useful to your professional advancement to keep them in mind.
You may have heard the terms “median plane” and “sweet spot.” They relate to some very important issues, and are central to effective critical listening for audio production work. Listen up!
The median plane is a virtual plane between two loudspeakers. All points on the surface of that median plane are equidistant from both loudspeakers. Got that? Being on the median plane means being equidistant from both loudspeakers. See Fig. 1.
And here’s the straight skinny: for stereo production work, the only viable place to be for critical listening is on the median plane between the left and right loudspeakers. If you are off the median plane by more than, say, 4 inches, your perception of the stereophony and phantom images being played back by those loudspeakers will be badly skewed laterally, to a point where you will be unable to make a number of mix decisions successfully.
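To put a rough number on that 4-inch figure, here is a minimal sketch of the geometry in Fig. 1. The 2-meter triangle, the 343 m/s speed of sound and the function name are all assumptions chosen for illustration; none of them come from the figures.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature


def arrival_time_difference_ms(offset_m, side_m=2.0):
    """Left-minus-right arrival time, in milliseconds.

    offset_m is how far the listener has moved to the right of the
    median plane. side_m is the (assumed) side of the equilateral
    triangle. A positive result means the right loudspeaker is heard first.
    """
    half = side_m / 2.0
    # Distance from the line joining the loudspeakers to the listener.
    depth = side_m * math.sqrt(3.0) / 2.0
    d_left = math.hypot(offset_m + half, depth)
    d_right = math.hypot(offset_m - half, depth)
    return (d_left - d_right) / SPEED_OF_SOUND_M_S * 1000.0


if __name__ == "__main__":
    four_inches_m = 4 * 0.0254
    print(round(arrival_time_difference_ms(four_inches_m), 3), "ms")
```

Under these assumptions, 4 inches of sideways drift gives about 0.3 ms of relative delay between the two loudspeakers, which is the scale of time difference this column is concerned with.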
When you are working in surround, you are surrounded by five loudspeakers. You now need to be in the Sweet Spot, which is a point (not a plane), equidistant from all five loudspeakers (or, more to the point, equi-timed from all loudspeakers). Again, if you aren’t in the sweet spot, you can’t produce 5-channel audio very successfully—mix errors can and will accrue.
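The “equi-timed” idea can be sketched in a few lines. This is only an illustration, assuming you have measured the distance from the listening position to each loudspeaker and that your monitor controller can delay individual channels; the channel names, distances and function name are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature


def alignment_delays_ms(distances_m):
    """Delay to add to each channel so all arrivals coincide.

    distances_m maps a channel name to its distance (meters) from the
    listening position. The farthest loudspeaker gets no delay; nearer
    ones are delayed by their head start.
    """
    farthest = max(distances_m.values())
    return {
        name: (farthest - distance) / SPEED_OF_SOUND_M_S * 1000.0
        for name, distance in distances_m.items()
    }


if __name__ == "__main__":
    # Hypothetical room: surrounds and center are closer than left/right.
    room = {"L": 2.0, "C": 1.9, "R": 2.0, "Ls": 1.7, "Rs": 1.7}
    for name, delay in alignment_delays_ms(room).items():
        print(f"{name}: {delay:.2f} ms")
```

Note that this only aligns arrivals at one point. Move away from that point and the arrival times spread apart again.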
| Fig. 2: A quick and dirty description of why we perceive a phantom image.|
PAN POTS, TIME DELAY & PHANTOM IMAGES
The reason for all of this has to do with a psychoacoustical behavior called the Precedence Effect. When we hear a phase-locked pair of identical signals coming from different directions, we localize that pair as coming from the direction of the first arrival, hence the term “precedence” effect. Further, we don’t perceive the later arrival as a distinct arrival, but instead we integrate it with the earlier signal (assuming it arrives within 40-50 ms). Finally, and most importantly, when that phase-locked pair of identical signals arrives at our ears at exactly the same time from two different directions (something that can’t happen in pre-technological nature), we perceive that pair as a sound emitting from a point in space between the two physical sources. We call this, of course, the Phantom Image. Why this should be so is suggested in Fig. 2. In any case, the effect is fundamental to our hearing mechanism and to our ability to localize sound sources in reverberant spaces.
When the time difference is very slight, say half a millisecond, the phantom image will appear to be closer to the earlier source than the later source, but still in between the sources. Suffice it to say, we are very sensitive to time differences. See Fig. 3.
| Fig. 3: Where the phantom image appears for a listener on the median plane as the relative time delay at the two loudspeakers is varied, as with a delay line.|
| Fig. 4: Where the phantom image appears for a listener on the median plane as the relative amplitude at the two loudspeakers is varied, as with a pan pot.|
Meanwhile, in audio we use pan pots to distribute energy between two loudspeakers, in the expectation that if we reduce the level in, say, the right channel the image will move toward the left channel. No time delay is involved, only amplitude.
Sadly, pan pots don’t work very well except for left, center and right, on the median plane. We aren’t very sensitive to amplitude differences: there is essentially no image shift until there’s about a 4 dB difference. Then, fairly quickly, the image jumps to the louder loudspeaker by the time the difference is about 7 dB. See Fig. 4.
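To see how much knob travel those 4 dB and 7 dB figures represent, here is a small sketch. It assumes a constant-power (sine/cosine) pan law, which is one common choice but is not something specified in this column; the function names and the 0-to-1 position scale are likewise hypothetical.

```python
import math


def pan_gains(position):
    """Left and right gains for an assumed constant-power pan law.

    position runs from 0.0 (hard left) through 0.5 (center) to 1.0 (hard right).
    """
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def level_difference_db(position):
    """Left level minus right level, in dB, at a given pan position."""
    if not 0.0 < position < 1.0:
        raise ValueError("difference is unbounded at hard left or hard right")
    left, right = pan_gains(position)
    return 20.0 * math.log10(left / right)


def position_for_difference_db(difference_db):
    """Pan position at which the left channel leads by difference_db."""
    ratio = 10.0 ** (-difference_db / 20.0)  # right gain / left gain
    return math.atan(ratio) / (math.pi / 2.0)


if __name__ == "__main__":
    for db in (4.0, 7.0):
        print(db, "dB at position", round(position_for_difference_db(db), 2))
```

With this assumed pan law, a 4 dB difference sits at roughly 0.36 and a 7 dB difference at roughly 0.27, so the stretch of knob travel over which the image actually moves is quite narrow.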
So, time differences trump amplitude differences for our localization mechanism. Dang!
WHAT IT ALL MEANS
If you want to hear stereo or surround images in approximately their proper positions, you’ve got to stay in the sweet spot and/or on the median plane. See Fig. 5. Further, you need to have the loudspeakers aimed at the sweet spot to obtain reasonable high-frequency response (unless you’re using my loudspeakers).
When we are working with mono components (say, the on-screen voice) in a stereo mix, the only place to hear that voice emit from the center of the screen is on the median plane. In multichannel, the situation is a little more complex. We can assign the on-screen voice to the center channel only, and it works fine. However, many viewers do not have a center channel, and then you have nuked the voice track—a definite badness!
I know, I know! It’s all supposed to be downmixed, but the number of letters I’ve received complaining about the problem suggests that it often doesn’t happen. An alternate solution is to blend the center channel info into the left-right pair as well, so it’s present as both a phantom image and a discrete signal. This solution isn’t perfect, but to my ears it works the best. However, when you are out of the sweet spot, that image drifts to the nearest—earliest—loudspeaker.
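The blend described above can be sketched as follows. The gain value is purely an assumption for illustration (about 3 dB down); the column does not say how much center to fold in, and the function and parameter names are hypothetical.

```python
def blend_center(left, center, right, center_gain=0.7071):
    """Add a scaled copy of the center channel to left and right.

    The center channel itself is returned unchanged, so the voice exists
    both as a discrete signal and as a phantom image between left and right.
    center_gain is an assumed value, not a recommendation from the column.
    """
    new_left = [l + center_gain * c for l, c in zip(left, center)]
    new_right = [r + center_gain * c for r, c in zip(right, center)]
    return new_left, list(center), new_right


if __name__ == "__main__":
    # Three samples per channel, just to show the arithmetic.
    L = [0.0, 0.5, 0.0]
    C = [1.0, 0.0, -1.0]
    R = [0.0, -0.5, 0.0]
    print(blend_center(L, C, R, center_gain=0.5))
```

In practice the gain would be set by ear on the median plane, and the summed channels would need checking for clipping.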
| Fig. 5: Where the phantom image appears for a listener on and off the median plane as the relative time at the two loudspeakers is varied, as a function of listener position.|
Another wrinkle with all of this has to do with the difference in timbre between a phantom image and a direct sound source. For reasons that are beyond the scope of this article, the phantom image has inevitable comb-filtering effects that are quite audible, compared to that direct signal. One signal or the other is going to be a little off, depending on how the mix session was monitored. Oh dear!
Why do I dwell on this? If you want to be able to hear this stuff, and optimize the audio quality of your work, you’ve simply got to listen on the median plane, and/or in the sweet spot. There is no other place where you can be and do this work well. And you’ve only got a couple of inches of slack.
Next month, we’ll consider the control room, the environment in which this all happens. In the meantime, thanks for listening.