The Difference Between What You Hear and What You 'Hear'

Alert readers may recall that over the past several months I've been plumbing the mysteries of audibility, with a particular eye to the troubling confusions about how much resolution we can really perceive in an audio signal. Very alert readers may have noted that the issue is far from simple.

Among other problems, any reasonable experimental test method introduces errors of one sort or another; it's tough to define exactly what we mean by audible; and when we really get into listening to tiny differences in the audio ozone, both tiny and imaginary differences begin to loom awesomely large.

Unfortunately, that's not all, or even the worst of it.

When we consider the human auditory mechanism in any detail, it becomes clear in a hurry that the mechanism is highly complex, not well-understood and hard to observe in action. At the same time, it works stunningly well much of the time and its operation is so seamless and well-integrated with our conscious perception of the world via other senses that we have virtually no awareness of its operation, and certainly no awareness of the complexities of that operation.

OUR EARS AS SIGNAL PROCESSORS

But operate the system does, and that operation is almost frighteningly complex. We'll talk about some simple stuff first.

For starters, there are not one but two compressors working in the outer and middle ear, with different time constants. Total gain reduction can be up to 40 dB. Yet we are hardly ever aware of those compressors working. Also, there's a dynamic multiband equalizer/compressor in the inner ear that dramatically varies the frequency response of our hearing as a function of loudness level. This behavior is described by the so-called Equal Loudness Contours.

This represents some heavy-duty signal processing. We aren't talking a quarter of a dB here, half a dB there; we're talking some major changes (like 60 Hz being rolled off 20 dB over a gain change of 40 dB). And we barely notice it!

This should give us some pause. When we moan and groan about how putting some grungy analog signal processing in the signal chain makes it unreasonably difficult to perceive the pristine wonders of a 24-bit 96 kHz audio signal, we may be right; but we're also failing to note the obvious: tons of such grungy signal processing is going on in our ears all the time!

How in the world can we expect to pull out the 17th bit in a reverb trail (much less the 24th bit) when our two compressors and dynamic non-linear multiband analog EQ are busy pumping away only a few dB down? It just isn't reasonable. Our production experience with signal processing tells us that this just can't work.
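To put rough numbers on that, here is a minimal sketch of how far down the 17th and 24th bits sit. It assumes the usual rule of thumb of about 6.02 dB per bit (20·log10 of 2) and ignores dither and noise shaping; the function name `bit_level_db` is made up for illustration.

```python
import math

# Rule of thumb: each additional bit doubles the amplitude resolution,
# which is 20 * log10(2), or about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def bit_level_db(n):
    """Approximate level of the n-th bit relative to digital full scale, in dB."""
    return -n * DB_PER_BIT


for n in (16, 17, 24):
    print(f"bit {n}: about {bit_level_db(n):.1f} dB re full scale")
```

On this reckoning the 17th bit lives roughly 102 dB below full scale and the 24th roughly 144 dB below, while the ear's own gain riding can swing by as much as 40 dB.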

Yet our ears do, somehow, seem to be able to perceive a remarkable range of stuff. What we need to keep in mind is this (I'm going to call this Moulton's Second Law, in a fit of post-millennial pomp): What we hear ain't what came in our ears!

Uh-oh!

THE JOYS OF INTEGRATION

Set aside the grungy analog neuro-mechanical compressors and EQ in the outer, middle and inner ears for a moment. We need to face up to the real problem, which is time: the integration and organization of auditory information over time, to be specific.

For starters, there's approximately a 7-millisecond delay between the time a sound impulse reaches the eardrum and the time the volley of neural impulses that represent that sound reaches the auditory cortex. Meanwhile, we can think of the auditory nerve that runs from the inner ear to the auditory cortex (the part of the brain that manages auditory information before it is sent to the central nervous system for action, and to the frontal lobes for perception) as a kind of interactive and iterative buffer memory that processes neural impulses over about a 50-millisecond period.

It also seems to organize them into coherent neural patterns that represent the sound over periods of about 30 milliseconds on an ongoing basis (talk about time smear) and to sort them into phase-locked families of impulses as well. This integration is one of the basic building blocks for the Precedence, or Haas, Effect (you know, the effect that integrates early reflections with the direct sound and helps us localize sound sources).
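As a toy illustration of what a 30-millisecond integration window does to timing detail, the sketch below runs two clicks through a simple running sum and counts how many separate events survive. This is emphatically not a model of the auditory nerve; the boxcar window, the 1 ms sample grid, the threshold and all function names are assumptions chosen only to make the smearing visible.

```python
def integrate(signal, window):
    """Running sum over the trailing `window` samples (a boxcar integrator)."""
    out = []
    acc = 0.0
    for i, x in enumerate(signal):
        acc += x
        if i >= window:
            acc -= signal[i - window]  # drop the sample that left the window
        out.append(acc)
    return out


def count_events(signal, threshold=0.5):
    """Count contiguous runs where the signal stays above the threshold."""
    events = 0
    active = False
    for x in signal:
        if x > threshold and not active:
            events += 1
        active = x > threshold
    return events


def clicks(gap_ms, length_ms=200):
    """Two unit clicks, gap_ms apart, on a 1 ms grid (1 sample = 1 ms)."""
    sig = [0.0] * length_ms
    sig[50] = 1.0
    sig[50 + gap_ms] = 1.0
    return sig


WINDOW_MS = 30  # the roughly 30 ms integration period mentioned above

merged = count_events(integrate(clicks(10), WINDOW_MS))
separate = count_events(integrate(clicks(60), WINDOW_MS))
print("clicks 10 ms apart ->", merged, "event(s)")
print("clicks 60 ms apart ->", separate, "event(s)")
```

In this toy, clicks closer together than the window fuse into a single event, which is the flavor of what happens when an early reflection gets folded into the direct sound.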

And, as if that integration weren't enough, at the auditory cortex we integrate the two independent neural feeds from the two ears into a holistic spatial construct! Then, and only then, does our auditory system begin to present that information to the conscious mind. This takes time, and so, by the time we actually perceive the sound, it's already been over for a while.

Further, we never got to perceive the raw information: all we ever hear is a glossed-over, edited, EQed, compressed, and time-smoothed representation of that raw sound. We never do get to hear the original!

WHAT DOES IT ALL MEAN?

So, when we ponder the issue of audibility, it is useful to keep in mind that there is this huge, really hairy blob of signal processing inescapably built into the system. We generate a sound and ask listeners to report what they perceive. The only reasonable view of this is that such reported perceptions are highly filtered by said signal processing and integration.

From the viewpoint of researchers such as me, test listeners are complex and variable "perceptual filters." Their reports have comparatively little to do with the physical stimulus in the air that we supplied them.

It should make you humble. We have a remarkable auditory mechanism that is capable of discriminating stuff over huge frequency, amplitude and time ranges (1,000:1, 1,000,000:1 and 100,000:1, respectively).
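For a sense of scale, those three ratios can be restated in more familiar units. The sketch below assumes the amplitude ratio is a pressure (not power) ratio, so it converts with 20·log10; the helper names are hypothetical.

```python
import math


def octaves(ratio):
    """Frequency ratio expressed in octaves (doublings)."""
    return math.log2(ratio)


def amplitude_db(ratio):
    """Amplitude (pressure) ratio expressed in dB. Assumes 20*log10, not 10*log10."""
    return 20 * math.log10(ratio)


def decades(ratio):
    """Any ratio expressed in powers of ten."""
    return math.log10(ratio)


print(f"frequency 1,000:1     -> about {octaves(1_000):.1f} octaves")
print(f"amplitude 1,000,000:1 -> {amplitude_db(1_000_000):.0f} dB")
print(f"time      100,000:1   -> {decades(100_000):.0f} decades")
```

So we are talking about roughly ten octaves of frequency, 120 dB of level and five decades of time scale, all handled by the same pair of ears.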

PIECE OF WORK

Further, we extract all kinds of information from the detected sounds (identity, distance, size, material composition, point-in-space of each of multiple simultaneously sounding objects, as well as the size, physical composition and location of the surrounding environment in addition to our position in that environment), all the while working within a maelstrom of neuro-mechanical signal processing, extraction, detection and feedback. It's a helluva piece of work!

It also almost negates the question of audibility. A lot of stuff is detected at the eardrum that we never perceive. Do we hear it or not? Hmmmmmm.

Meanwhile, lots of stuff that we perceive was never detected at the eardrum, but instead inferred. Do we hear it or not? Ommmmmmmm.

Because of the complexity of this highly dynamic and variable perceptual filtering, such questions become close to meaningless at the limits of our perceptual envelope. Ozone, reality and imagination merge. This is a large part of why we get such confusing, confounding and contradictory answers to our high-resolution audibility questions. Put simply, there is no simple answer. Whew!

Thanks for listening.