Codecs in Review

You all know what codecs are, right? Chunks of software that encode digital data for transmission and/or storage and then decode it for use. It's an encode/decode cycle. We call it a codec (for code/decode), in an oddly poetic compression of language.

But why do we encode and decode? Why don't we just hit the transmit or record button? You know the answer to this one too, right?

That answer is, of course: we got more data than we can transmit! So we gotta throw some away. And this is what the codec does. The code/decode cycle contains an algorithm that deletes data that the codec's programmers believe won't be important to our enjoyment of the transmitted signal -- sort of like Readers' Digest Condensed Books.

The reality is that, for audio signals, the conversion from analog to digital requires a lot of digital data. The standard stereo CD signal uses 1.41 megabits (about 176 kilobytes) of data every second, which is, ah, huge. That takes a lot of bandwidth to transmit; bandwidth that we don't have. Similar constraints apply to video signals as well.
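That 1.41-megabit figure falls straight out of the CD format's numbers; here's a minimal back-of-the-envelope check in Python (the constants are simply the stereo CD parameters):

```python
# Back-of-the-envelope check of the stereo CD data rate:
# 44,100 samples/sec * 16 bits/sample * 2 channels.
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2

bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
print(f"{bits_per_second:,} bits/s")              # 1,411,200 bits/s
print(f"{bits_per_second / 1e6:.2f} megabits/s")  # 1.41 megabits/s
print(f"{bits_per_second / 8 / 1000:.1f} kB/s")   # 176.4 kB/s
```

Multiply that out over a three-minute song and you're looking at roughly 32 megabytes; it's easy to see why nobody wants to push that through a broadcast channel uncompressed.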

So, the codec encodes the signal, reducing its bandwidth by throwing away a great deal of the signal -- hopefully all of it consisting of irrelevant details -- prior to transmission. After transmission has been accomplished, the reduced signal is reconstituted into a reduced-data approximation of the original during the decode portion of the cycle. Naturally, the devil is in the details.

In fact, the devil permeates a fair amount of this. Back in the late 70s, we went to considerable trouble to define and establish a viable digital signal standard (the Red Book CD specification for digital audio is 44,100 16-bit voltage samples per second per channel), and now, ironically, we're busy working on trying to whittle away at that estimable standard in order to be able to transmit this signal via radio (and television) in its digital format, even as we also develop higher resolution formats. Sounds a little like a deal with the devil to me!

WHY IS THIS IMPORTANT?

The general adoption of such data compression is really worth thinking about, because it is shaping both the present and the future of audio (and video). Currently, all Internet transmission of audio and video and all satellite transmission of digital audio and video are compressed by codecs. Further, as digital FM comes online over the next few years (it is already available via satellite, and terrestrial IBOC transmission is beginning now), it, too, will be compressed by codecs. We are profoundly committed to this technology across broadcast media. By 2010 it is going to be everywhere!

Taken by itself as a concept, this sort of lossy data compression is not a bad idea at all. However, concern arises because we've managed to paint ourselves into a bit of a corner with it, due to some really enthusiastic general optimism (Oh, the channels we'll have! The pictures we'll see!!). Because we can compress, we do. More to the point, we compress as much as we can, for better or for worse. We have now committed ourselves to transmitting a volume of data (er, number of channels) that requires quite massive compression. We didn't stop at the quite conservative point where we felt that quality might begin to fall off, but rather we've kept going to the point where we know that it has begun to fall off, but still hope nobody will mind too much. We've been trying to fit more and more channels into a fixed bandwidth, simply to maintain, in each of our various competing corporate business plans, a competitive offering to the public.

Why did we do this? For survival in a free market, that's why! We could not and cannot afford not to!

Meanwhile, there are some verities that need to be kept in mind. First, data compression is not like audio compression. We aren't compressing amplitude here, knowing that we can expand it later with no significant loss of resolution. Data compression might, more accurately, be called "data-stripping." We are deleting data, and it cannot reasonably be recovered or restored later. It's gone for good, for better and for worse.

Second, we are stripping a lot of data away, usually more than 90 percent of it! This is not a trivial act of compression akin to a 1.3:1 compression above +10 dBm in audio. This is massive.

Finally, once we've "decoded" the stripped data, we may encounter some significant problems if we try to encode it again for further storage or transmission: the second encoder is working on a signal that has already been altered by the first, so the artifacts compound with each generation. Interesting, eh?

Now the argument is, of course, that the stuff we're stripping out is inaudible, or at least insignificant. And, in fact, there is a lot of truth to that argument. This sort of lossy compression can really work well, and it yields quite massive benefits for both the broadcasters and the consumers.

But it is essential that we keep in mind that we've created a fairly fragile low-resolution signal. We've done this as a function of our effort to fit more channels into a given bandwidth. We need to be realistic about what that low-resolution signal is, how it behaves and how it will be perceived by our beloved listeners and viewers.

KVETCHING AND HIGH PRAISE

And here's where we maybe got a little too chummy with the devil. We've made some, ah, claims about our codecs. We've said things like, "Well, you know, it's nearly CD quality. You can't really hear the difference." And we've used such claims to justify our adoption of codecs. Unfortunately, we've institutionalized such claims.

Meanwhile, we've tended to downplay just exactly how much data we're throwing out, as well as how much progress we're really making with improving codecs. To admit either would imply that maybe the earlier codecs weren't so hot to begin with, while we were busy claiming that they were, well, ah, nearly like CDs.

I've got some history here. Off and on, I've been hired by codec developers to conduct formal and informal listening tests and demonstrations of codec audio performance. Right now, I'm doing work for iBiquity Digital Corp., which in turn provides the codecs used by both Sirius Satellite Radio and the upcoming IBOC terrestrial digital FM broadcasting.

Some years back, I measured some stereo codecs running at 128 and 96 kbps that performed quite well, in addition to some other codecs whose performance ranged from fair to good. By "well," I mean that in blind trials expert listeners scored them somewhere between "inaudible" and "audible but not annoying" and naïve listeners generally scored them higher than that.

At that time, I also took an informal listen to codecs running at lower bit-rates, such as 64 and 36 kbps, and opined that such rates were "too slow to be musically viable, though speech intelligibility is certainly adequate." That was a polite way of saying that codecs at that speed really didn't sound very good at all, certainly not for music. Recently, iBiquity asked me to listen to its current Perceptual Audio Coder Version 4, running at 96, 64, 48 and 36 kbps. At the faster end of that range the performance is remarkably good, generally better than the best codecs I studied five years ago. At the slower end, where things used to be impossible from a musical standpoint, the performance is now musically viable, even if the artifacts are audible.

It needs to be noted that most listeners will detect some artifacts and some listeners will almost certainly find those artifacts to be annoying, and there is no sense in trying to gloss over those truths. In fact, it is the previous attempts to do just such glossing that have gotten us into trouble. But, at the same time, there is a quite positive truth to be noted as well. At 96 kbps, we are actually deleting 93 percent of the data, and the result is a signal that even expert listeners find generally acceptable, whose defects are often inaudible. Remarkable! There's high praise due here for a remarkable improvement and technical accomplishment.

But what's even better, to my mind, is what has been accomplished at the lower bit-rates. At 36 kbps, we have deleted 97.5 percent (!) of the data (and of course we can hear the effects of that compression). But, just as significantly, the signal has remained musically viable, which is to say that the elements of timbre, dynamics, stereo image and intelligibility remain viable and generally enjoyable. This means that, according to my lofty standards, such compressed signals are pleasant and satisfying for the kind of general-purpose background listening that is the basis for most listening to broadcast music, the kind of listening that occurs in a car, for instance. To me, that's a tremendous achievement, and a huge improvement over where we were back in 1997. Back then, I doubted that we could ever get down to these bit-rates successfully.
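Those deletion percentages come directly from comparing each codec bit-rate against the 1,411.2 kbps stereo CD rate; a quick Python sketch makes the arithmetic explicit (the results land within rounding of the figures quoted above):

```python
# Share of the stereo CD stream deleted at each codec bit-rate.
CD_RATE_KBPS = 1411.2  # 44,100 samples/s * 16 bits * 2 channels

for codec_rate_kbps in (128, 96, 64, 48, 36):
    deleted_pct = (1 - codec_rate_kbps / CD_RATE_KBPS) * 100
    print(f"{codec_rate_kbps:>4} kbps: {deleted_pct:.1f}% of the data deleted")
```

Note that even the "generous" 128 kbps rate already discards over 90 percent of the original stream, which puts the achievement at 36 kbps in perspective.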

THE FUTURE AIN'T GONNA BE WHAT IT USED TO

The upshot of all this is that we are entering yet another iteration of the brave new world, a world in which low-resolution signals are everywhere and their signature artifacts are audible. But we can and will learn to live with them. And in return, we will get a richness of diversity in our broadcast programming that hasn't been available for quite a while now, if ever.

Meanwhile, this is another step along the path toward on-demand broadcast programming, where we can call up anything we want anytime we want anywhere we want, all for a reasonable cost. That's where this is all really heading.

Thanks for listening!

Dave Moulton