The 'Aesthetics' of Audio Compression

I recently received a letter from Tom Lento, a broadcast technology marketing consultant, commenting on my March 5 column about the variance of audio levels in TV programming. Tom writes:

"I'd be interested in your view on the 'aesthetics' of compression -- if there is such a thing. Let's take an extreme case, as such examples really amplify (if you'll excuse the expression) fallacies in audio practice. I refer to broadcasts of live, unamplified music with wide dynamic range.

"Take 'Live from Lincoln Center.' I'm not sure how much compression is used in the original feed from New York's WNET Channel 13, but it usually sounds OK on the station's broadcast. However, when it's rebroadcast over our local PBS station, it becomes unlistenable on a good audio system, and noticeably bad even through the TV's puny built-in speakers.

"Any time the orchestral level subsides below middle levels, the compression system quickly drags up ambient noise, coughs, program rustling and electrical hum from soft background into glaring prominence. One blast of a horn or stroke of a tympani kicks the overall level back down for a few seconds, and the whole cycle starts again. The technology is being used in a way not suitable to program content.

"I have two questions about this:

"What is an appropriate amount of compression, not just for an extreme case like this, but for other audio material in which listeners would expect dynamics to vary in the real world?

"Is there any automated way to make sure such special fare gets the more relaxed treatment, while standard programming receives the more stringent volume leveling discussed in your column?

"I know that few people in our modern world have much experience with truly acoustic music, but that doesn't excuse audio engineers from making sure it gets a fair hearing. It's ironic that the very cultural fare that might encourage upscale viewers to turn on their home theater sound systems would wind up sounding awful when they did so."

Now, Tom describes the negative effects of audio compression really well. He also correctly identifies the problem that listeners with high-resolution systems run into when they hear heavily compressed audio. He then would like to know exactly what is an, ah, appropriate amount of compression. Good question!

From my standpoint, the problem can be fairly simply described: We're trying to fit a quart of sound into a pint pot, and we'd like the pint-sized result to sound like a quart! The need (compressing the quart) is sweetly reasonable, but the wish (that it shouldn't "sound" compressed) is irrational, however desirable.

Audio compression originally evolved to keep signal levels below some mandated broadcast power and amplitude level. Although it purports to regulate the amplitude (and therefore, in theory, loudness) of a signal, compression can also dramatically affect the timbre, spectrum and acoustical quality inherent in a signal. Even more important, among all audio processing types, it is the family of processing that is hardest to "hear out," in the sense of objectively recognizing the compression effect, until it is truly gross. Nonetheless, compression's effects can be quite audible, if not easily identified or described, even in fairly subtle applications.
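To make that behavior concrete, here is a minimal sketch of a generic textbook compressor, not of any particular broadcast processor. Every name and number in it (the threshold, ratio, make-up gain and the attack and release coefficients) is an assumption chosen for illustration. It shows two things: how the static gain curve squeezes the distance between loud peaks and the noise floor, and how a fast attack with a slow release produces the rise-and-fall cycle Tom describes.

```python
def static_gain_db(level_db, threshold_db=-30.0, ratio=4.0, makeup_db=15.0):
    """Gain (in dB) applied at a given input level by a simple compressor.

    All default values are illustrative assumptions, not settings taken
    from any real broadcast chain.
    """
    over_db = max(0.0, level_db - threshold_db)
    # Above threshold, each dB of input yields only 1/ratio dB of output.
    return makeup_db - over_db * (1.0 - 1.0 / ratio)


def gain_over_time(levels_db, attack=0.6, release=0.1):
    """Smooth the gain from block to block: fast attack, slow release.

    `attack` and `release` are hypothetical per-block smoothing
    coefficients between 0 and 1 (larger means faster).
    """
    gains = []
    gain = static_gain_db(levels_db[0])
    for level in levels_db:
        target = static_gain_db(level)
        # Gain falls quickly when the signal gets loud, recovers slowly after.
        coeff = attack if target < gain else release
        gain += coeff * (target - gain)
        gains.append(gain)
    return gains


if __name__ == "__main__":
    # Assumed input levels: a loud tutti, a soft passage, hall noise.
    for name, level in [("tutti", -6.0), ("soft passage", -40.0), ("hall noise", -60.0)]:
        print(f"{name:>12}: {level:6.1f} dB in, {level + static_gain_db(level):6.1f} dB out")

    # Soft passage, one loud hit, then soft again: the gain dips, then creeps up.
    program = [-40.0] * 3 + [-6.0] * 2 + [-40.0] * 10
    for level, gain in zip(program, gain_over_time(program)):
        print(f"in {level:6.1f} dB  gain {gain:6.2f} dB  out {level + gain:6.2f} dB")
```

With these assumed settings the loud passage comes down a little while everything below the threshold, hall noise included, comes up by the full make-up gain. After the loud hit, the gain recovers gradually over the following blocks, which is when the coughs and hum swell into view. Real broadcast processors are considerably more elaborate than this.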

NO SINGLE ANSWER

This makes the question of "how much compression is appropriate" a really tough one. There is no single answer, of course. This is why devices such as the TC Electronic Level Pilot that I mentioned in that column are so complex in their behavior. But from the "aesthetics" standpoint that Tom mentions, I think the answer can be stated pretty simply as: "Appropriate broadcast compression should manage to maintain the audio level within a necessarily constrained amplitude window while making no audibly significant changes to the timbre, spectrum, reverb time or room tone of the signal." The keywords are, of course, "audibly significant changes."

In any case, such processing requires a deft touch and a complex, multifaceted awareness of timbre, spectrum, reverberance, ambience and room tone. Such processing also needs to be dynamic, changing over time as a function of program material and audio needs. One size definitely does not fit all, in this case!

Finally, such processing needs to be able to recognize audio behaviors, including transient peaks and noise floors, plus the characteristics of signal types such as dialogue, FX and music, so that its behavior can be altered to cope with these things and their differing dynamic ranges. This speaks to the "automated" part of Tom's question. A broadcast facility cannot afford to keep a skilled audio mix engineer on line to make the deft tweaks needed to accomplish what we might want to call "intelligent, dynamic, sensitive broadcast compression." To get the results that Tom wants and quite reasonably expects for a high-res system, we need an automated box to do the online dynamic work.

No simple compressor can do this job, and we do ourselves and our viewers a disservice by assuming it can. Metadata holds out the promise that multiple simple solutions can be simultaneously implemented at the various viewers' receivers, in accordance with their individual needs. A box like the Level Pilot, reduced to a chip and with a drastically simplified control system, might also do wonders installed in receivers and cars. However, such solutions assume that producers will no longer feel the need to compress or process the program at the point of origin or during transmission, and that all processing happens at the viewer's site. This won't happen for a long time, if ever.

THE POINT ON THE SCALE

So, this is where the idea of the "aesthetics" of compression comes in. In subjective testing, I often use a five-point scale to evaluate the effect of some processing versus the original signal. If the processing is inaudible, we give it a "5." If it is "audible, but not annoying," we give it a "4." If it is "audible but slightly annoying," we give it a "3," and so on.

The effect of audio compression is almost always audible, so we can't ever expect our compression to yield a score of "5" on such a scale. However, it would be nice if we could keep it close to "4" (not annoying), while squeezing the signal into its limited dynamic range. And never, ever, should the quality of compression slip below a "3" (slightly annoying) for timbre, spectrum, reverberance, room tone or noise.

To accomplish this takes some good engineering, some careful attention to detail and some real knowledge of audio, along with some really good equipment. It can be done, and it probably can be automated. I doubt, however, that it will ever become "plug 'n' play"; there are too many variables. So, the challenge still will lie with us audio engineers, in our quest to reduce the amount of "bad audio" going out and to increase the amount of "good audio."

Thanks for listening.
