JJ? Who's JJ? JJ is James D. Johnston, one of the major players in codec development and psychoacoustic research, with a distinguished career at Bell Labs, AT&T Labs Research, Microsoft and, currently, at Neural Audio. Naturally, as such a researcher, he has investigated the phenomenon called loudness with a thoroughness, rigor and care that is way, way beyond what we general practitioners settle for in our limited quests for truth and understanding.
As I've repeatedly discussed in these columns, in broadcast television we have a real problem managing the level of sound, and in my simplistic journalistic way, I've referred to that problem as a problem with audio levels. My whining finally came to the attention of the good people at Neural Audio, and they put me in touch with JJ for some regrooving. As JJ promptly pointed out to me, loudness is not necessarily the same as audio levels, and this is particularly so in the extremely common case where we use signal processing, particularly, ah, compression.
|James D. Johnston, aka JJ |
So, campers, it is time to reboot, and think about this a little more carefully, JJ's way.
THE NATURE OF LOUDNESS
Loudness is a subjective sensation related to, but not the same as, the magnitude of sound pressure level at our ears. We perceive loudness in a way somewhat similar to our perception of brightness, or intensity of touch, or strength of odor. Clearly, it is a primary tool for survival, and deeply hardwired into our preconscious sensory mechanisms.
Loudness gives us an aural sense of the approximate size, the proximity, and the power of a sound source. Such sensations lead to an emotionally charged sense of relative safety or danger, as well as an unequivocal sense of the probability of our immediate survival. When we are exposed to extremely loud low-frequency sounds, we self-medicate with adrenalin, in anticipation of a struggle and/or an extremely rapid vacation from our current location.
Because of these functions, changes in loudness are an essential emotional ingredient in sound and music, and one of the most powerful energizers for our kinesthetic responses (such as dancing) to sound and music. The effective management of loudness is powerful, powerful medicine in music, movies and audio in general. We ignore it at our peril.
LOUDNESS—OUR HEARING AND ACOUSTICAL PARAMETERS
With that said, there are several physical elements that, in combination, affect our sense of how loud a sound is, but first we've got to reconsider the relationship between our beloved audio level fader, decibels and loudness.
|Fig. 1a: The audible spectrum seen in 1/3-octave bands (the vertical green bars). Courtesy of iZotope |
"Up is louder," we like to say, as an audio truism. And so long as no other changes happen to a given signal, when we tweak an audio potentiometer we will notice a reasonably consistent and reliable change in our sense of the loudness of the sound that is controlled by that potentiometer, coming from a loudspeaker: as the amplitude of a given signal increases, so will its apparent loudness.
There are some informal rules of thumb for this, such as: +1 dB (an amplitude change from 100 percent to about 112 percent) is about the smallest change we can usually perceive, while +10 dB (an amplitude change from 100 percent to about 316 percent) is perceived as about "twice as loud," in a general sort of way.
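For readers who want to check those percentages, here is a minimal sketch of the arithmetic. It assumes the figures refer to amplitude (voltage or sound pressure) rather than power, and the function names are made up for illustration.

```python
def db_to_amplitude_ratio(db):
    """Convert a level change in dB to an amplitude (voltage/pressure) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level change in dB to a power (energy) ratio."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    for db in (1, 3, 10):
        amp = db_to_amplitude_ratio(db) * 100
        pwr = db_to_power_ratio(db) * 100
        print(f"+{db} dB: amplitude {amp:.0f} percent, power {pwr:.0f} percent")
```

Note that the same +10 dB is a tenfold change in power, which is why it matters whether a percentage describes amplitude or energy.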
But here's where it gets a little trickier. The above statements are only approximately true, and only for a signal with a given spectrum and crest factor (the ratio of a signal's peak level to its average level, which can roughly be thought of as how the energy in a sound is distributed over time).
ENERGY, SPECTRUM AND TIME
There are, in fact, three different acoustical parameters that significantly affect our perception of loudness. We've already mentioned the acoustic energy of sound waves. (As it changes, our sense of loudness changes.) In addition, loudness is related to the bandwidth of the audio spectrum that is present in the sound. (Add some new parts to the spectrum and loudness will increase, even if amplitude remains the same.) Finally, there's the amount of time that energy is present for. Very short impulses don't sound nearly as loud as sustained sounds with the same amplitude. (This is, of course, closely related to crest factor.)
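To make the crest factor idea concrete, here is a rough sketch that compares a sustained tone with a single click of the same peak amplitude. It uses the common peak-to-RMS definition of crest factor; the test signals and the function name are illustrative assumptions, not anything taken from JJ's work.

```python
import math


def crest_factor_db(samples):
    """Peak-to-RMS ratio of a signal, expressed in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)


n = 1000
# Sustained tone: 10 full cycles of a sine wave, peak amplitude 1.0
sine = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
# Very short impulse: one full-scale sample followed by silence
click = [1.0] + [0.0] * (n - 1)

if __name__ == "__main__":
    print(f"sine crest factor:  {crest_factor_db(sine):.1f} dB")
    print(f"click crest factor: {crest_factor_db(click):.1f} dB")
```

Both signals hit the same peak, but the click carries far less average energy, which is consistent with it sounding much quieter than the sustained tone.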
The spectrum part of this is quite interesting. Our hearing spectrum (as detected in our inner ear, the cochlea) is divided up by frequency with filter bandwidths called critical bands (approximately 30 of them). They tend to bunch up in the middle of the spectrum. See Fig. 1b. JJ points out that this is somewhat of an oversimplification, but it will do for our purposes.
|Fig. 1b: Another audible spectrum seen in critical bands (the vertical green bars). Courtesy of iZotope |
What is important about them is that when energy increases in a single band, it follows the rule that "10 dB increase sounds twice as loud." However, if the same amount of energy shows up in two of those critical bands (an increase of only 3 dB), that combined sound will also be perceived as "twice as loud." This is why compression can be so deadly—we increase the levels of all of the softer critical bands while limiting the louder ones. Even though the overall level doesn't go up very much, the perceived loudness sure does, a lot!
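The two-band arithmetic can be sketched with a toy model. This is an illustration under loud assumptions, not JJ's model or any standardized loudness meter: it assumes loudness within one band grows as energy raised to a power of about 0.3 (picked so that +10 dB doubles it), that the bands do not interact, and that total loudness is simply the sum across bands. All names here are hypothetical.

```python
import math

# Assumed exponent, chosen so that a tenfold (+10 dB) energy increase
# in a single band doubles that band's loudness: 10 ** EXPONENT == 2.
EXPONENT = math.log10(2)


def band_loudness(energy):
    """Toy loudness of one critical band (arbitrary units)."""
    return energy ** EXPONENT


def total_loudness(band_energies):
    """Toy total loudness: sum of independent per-band loudnesses."""
    return sum(band_loudness(e) for e in band_energies)


def level_change_db(before, after):
    """Change in overall level (total energy) between two band layouts."""
    return 10 * math.log10(sum(after) / sum(before))


one_band = [1.0]          # reference: one band, unit energy
one_band_up = [10.0]      # same band, energy raised by 10 dB
two_bands = [1.0, 1.0]    # same energy again, added in a second band

if __name__ == "__main__":
    for name, bands in (("+10 dB in one band", one_band_up),
                        ("second band added", two_bands)):
        db = level_change_db(one_band, bands)
        ratio = total_loudness(bands) / total_loudness(one_band)
        print(f"{name}: level +{db:.1f} dB, loudness x{ratio:.2f}")
```

In this toy model both changes double the loudness, but the second one moves the overall level by only about 3 dB, which is the trap a level meter falls into when a compressor brings up the quieter bands.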
Interestingly, this is all quite nonlinear, so no simple algorithm is going to effectively predict the perceived loudness of a signal from its level, or control that loudness by adjusting level alone. We may be able to squash levels with a compressor, but actual perceived loudness will be a good bit harder to manage, except at the extremes of way too loud and way too soft.
THE NATURE OF OUR POPULATION
Another point needs to be kept in mind when considering these issues. There is considerable variance in the way that individuals hear, and what various individuals actually perceive. This is particularly true for us older dudes and dudettes. Hearing deteriorates with age, and it does so in a variety of ways, with a variety of differing impacts on perception. There is also the possibility that the extreme exposure to music at high levels over headphones (compared with 30 or more years ago) is damaging the hearing of younger people.
What this all means is that our various engineering simplifications introduce significant and variable loudness errors and irritations for different viewers. Yikes! Background noise makes this even worse, by masking some soft sounds, and giving them zero loudness.
So, there is no bestest. No way we can manage levels sufficiently satisfactorily for all of our listeners all the time, so that none of them are irritated by the loudness or lack thereof in our beloved broadcast signals. What we can do, however, is get it close, and there are a variety of strategies for doing that. We need to more fully understand loudness and its relation to spectrum and time, and more carefully manage it.
Unfortunately, rigor and engineering excellence are called for at a variety of points in the production and transmission paths, and we are not achieving that. We'll talk more about that next month. In the meantime, feel free to check out some tutorials JJ has done about this at www.aes.org/sections/pnw/ppt.htm.
Thanks for listening.