Listening To Our Audio

Last month, I hinted at some of the things we can do to make our use of loudspeakers for audio monitoring more effective. As a result of some other work I’m doing, I’ve been thinking hard about the loudspeaker problem, a conundrum in which the loudspeaker is both most important and, at the same time, just about inaudible. I thought it might be useful to share with you some of the basics of that work and then, in future columns, get down to some technical specifics of (a) how to set up your monitoring situation and (b) how to listen to it for fun and profit.

The term monitoring is audio-industry jargon for the idea of “observing, in a reasonably neutral way, the quality of the signal(s) that we are sending out to our customers.” It sounds pretty straightforward, put that way. Unfortunately, it isn’t, for a variety of reasons.

The problems begin to accrue as soon as we begin to ask why we would want to do such a thing. The obvious answer is that we would like to verify and ensure that the signal is reasonably accurate, as well as acceptable and attractive to our beloved viewers. Let’s consider these vaguely incompatible characteristics in a little more depth.

We’d like our signals to be an accurate reproduction of what was captured at the microphone. The idea behind this is fairly obvious: we’d like to think that the more accurate our signal is, the more realistic and engaging it will be for our viewers. Such a viewpoint is central to the rationale of things like high fidelity and high definition.

In the monitoring world, I refer to this as “listening back.” We are observing our capture of a previous acoustic event in order to assess how realistic and close to the original it is.

We’d also like our signals to excite, thrill, please and otherwise fully satisfy our beloved end users, the viewers. To monitor our signals for the purpose of assessing the extent to which those signals will excite, thrill, please and otherwise fully satisfy our beloved end users, we need to know, and to some limited extent simulate, the conditions under which they will view and listen to our signals. It’s another level of accuracy: perhaps, how accurately our viewers will think we’ve captured the previous acoustic event.

I refer to this as “listening ahead.” To my way of thinking, it is probably the most important part of the monitoring activity, considerably more important than listening back.


Listening ahead runs counter to the very intuitive notion of accuracy, which is a cherished if poorly considered principle of reproduction. I’ve found that the idea that we must have as much accuracy as possible is closely and dearly held by many of us in the business, and that we have real trouble letting go of it, except to admit that “we should also have crummy speakers so we can tell what it sounds like on all the crummy speakers out there!”

This latter view assumes, of course, that all crummy speakers sound alike and sound equally crummy, which is an unfounded assumption. What I’ve found, in my mastering work, is that I need to check my work on a range of high-quality generic systems that represent such playback genres as stereo TV, 5.1 home theatre, boombox and automobile, without adding specific nasty artifacts or spectral aberrations to color the reproduction. This way, I’m able to head off any serious spectral or level problems inherent to the various genres, and to assess and balance out mix decisions so that they work on all of the systems, as well as on my reference monitor system.

Note there’s nothing about accuracy here—it’s all about getting the audio program to work, which means (in TV land) to sound natural, convincing and authentic. Whether or not it is accurate is more or less irrelevant.

At the same time, we have some powerful mojo working for us. This mojo is called “the willing suspension of disbelief.” The more I study loudspeakers, the more I am intrigued by this quality. We don’t hear loudspeakers. Instead, we hear the imaginary sound sources that the loudspeakers are representing at any given moment.

This is not due to accuracy—it is due to our wish to be engaged in the illusion. The term “willing suspension” doesn’t tell the half of it! We probably should call it “the absolutely determined refusal to perceive physical reality when there is an illusion to be had.”

If we were to directly compare, for instance, a loudspeaker and a piano, we would notice that they sound wildly different and that a loudspeaker doesn’t even begin to approach the sonic interest and beauty of a piano. However, when we listen to a recording of, say, Keith Jarrett, Oscar Peterson or Elton John, we have no trouble at all embracing the piano-ness of the recording, and we don’t notice the loudspeakers at all! As a result, audio engineers get away with a lot, an awful lot.

With that said, there are a variety of techniques we can use to massage that willing suspension of disbelief and to make our work, and our loudspeakers, that much more effective. Next month, I’ll dig into that toolbox a little bit.