Last month, I began a discussion of the sensation of loudness inspired by James Johnston (JJ) of Neural Audio. This came about because of the ongoing problem that we, as an industry, have with maintaining consistent loudness in our broadcasting efforts. We noted that loudness, as a sensation, is affected by the level, the spectrum and the crest factor of the audio signal. This means that simple amplitude level adjustments (i.e., potentiometers and compressors) are only going to have a limited usefulness in controlling and maintaining loudness. This month I'd like to probe a little more deeply into the nature of that problem.
Intuitively, it seems reasonable to wish for a stable and predictable loudness, so that any time we select a program to enjoy, the loudness is (a) reasonable and (b) approximately the same, regardless of the program, the time of day, the channel or any other external variable. However, when we select, say, a battle scene, we do expect greater loudness than during an, er, intimate moment, and we do not want these two scenes to be at the same loudness.
Dolby Labs proposed an elegant approach to this (Dialnorm) by trying to calibrate amplitude to the subjective loudness of speech. Dialnorm, if done correctly, yields a consistent audio level (if not loudness) for all dialogue, and assumes that other levels in any given program will relate meaningfully to that spoken level. Dialnorm, however, does not help with adjusting the audio to the listener's specific conditions.
As I also noted last month, there is no reasonable way to get loudness to be entirely consistent, if only because of variability between the way listeners perceive sound and the backgrounds in which they listen. So our goal has to be to just get close. And this leads us back to JJ and his work.
JJ'S LOUDNESS WINDOW
If JJ were king (he says), he would establish two loudness windows, one set in the post-production suite and the other in the television or set-top box under the control of the end user. Each of these would have two controls, to be set subjectively by the producer and end user, respectively. We'll consider the end user first.
JJ would ask the viewer to set two controls. One would establish the "loudest" level that the viewer would like during any given viewing session. The other would establish the "softest" level that the viewer would like, which I assume would be intuitively established somewhere slightly above the noise floor of the viewing environment. These two loudnesses would adjust a loudness-based compression algorithm in the playback device according to the listener's requirements. The viewer would define, for any time and environment, the thresholds for "too loud" and "too soft." The playback audio, however it was produced, would then "fit" into that window, based on a loudness model (see the Secret Sauce section, below).
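To picture what "fitting" into the viewer's window might mean, here is a minimal sketch. It is not JJ's method, which he has not disclosed. It assumes a simple linear mapping and treats loudness as a single number in arbitrary units; the function name fit_to_window and all of its parameters are hypothetical.

```python
def fit_to_window(loudness, src_soft, src_loud, user_soft, user_loud):
    """Map a program loudness value into the viewer's window.

    Assumption: a plain linear map between the program's own loudness
    range and the viewer's "softest"/"loudest" settings. A real loudness
    model would be far more involved.
    """
    if src_loud <= src_soft:
        raise ValueError("source range must have src_loud > src_soft")
    # Where this moment sits within the program's own range, clamped to 0..1.
    position = (loudness - src_soft) / (src_loud - src_soft)
    position = min(1.0, max(0.0, position))
    # Place it at the same relative position inside the viewer's window.
    return user_soft + position * (user_loud - user_soft)
```

With a program range of 0 to 100 and a viewer window of 40 to 60, a mid-range moment (50) stays at 50, while the loudest moment (100) is pulled down to 60.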
IN POST PRODUCTION
Earlier, in post production, the producer, also using two controls, would establish a nominal "dialogue loudness" and "peak loudness," so that there would be a defined range between the desired loudness of dialogue and the desired maximum loudness. That range would set the size of the "production audio loudness window." I assume that amplitude levels below the actual dialogue amplitude would not be treated as a function of production, so that we could normally expect low loudnesses to be reasonably natural and predictable. With the ability of the listeners to adjust their playback in response to the listening conditions, the producer will be free to use lower, natural levels at will, knowing that the problem will be rectified by the listeners according to what they require. I am not quite sure how we would treat ENG clips with serious limiting and pumping of the background sound.
One very nice idea that JJ shared was the concept that peak loudness would have 6 dB of headroom (i.e., peak loudness would result in signals at or below –6 dBFS in intensity), to head off the entirely unnecessary crushed hyperlevels that permeate so much audio material today. Given that we are dealing with loudness here, not levels, such a standard becomes much more viable.
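For a sense of scale, the arithmetic behind 6 dB of headroom can be sketched as follows. This assumes the usual amplitude convention for dBFS (20 times the base-10 logarithm of the ratio to full scale); the helper names are hypothetical.

```python
import math

def dbfs_to_linear(dbfs):
    """Convert a dBFS value to a linear amplitude, where full scale is 1.0."""
    return 10.0 ** (dbfs / 20.0)

def linear_to_dbfs(amplitude):
    """Convert a linear amplitude (full scale is 1.0) to dBFS."""
    return 20.0 * math.log10(amplitude)

# A peak at -6 dBFS sits at roughly half of full-scale amplitude.
peak = dbfs_to_linear(-6.0)
```

Under that convention, a signal held at or below –6 dBFS never exceeds about half of the full-scale amplitude.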
THE SECRET SAUCE
What goes on within these windows is not clear. For a variety of perfectly valid reasons, JJ declined to discuss the processes he might use to modulate amplitudes to obtain a viable change in loudness between "soft" and "loud" for a variety of contexts. Keeping in mind that loudness is dependent in a general way upon a variety of parameters (the amount of spectrum present in the signal, the crest factor of the signal and the amplitudes of the various critical bands present in the signal), we can expect a complex multiband treatment of the signal, hopefully leading to a reliable range of loudnesses that is very similar to the range of loudnesses we perceive naturally. I also assume that there might be some accounting for the changing spectral nature of hearing as a function of level (the Equal Loudness Contours).
I'm really interested in the post-production establishment of a dialogue level. Is that sent as metadata to the end user's set-top box? If so, it could be a threshold point for defining "softer" sounds and "louder" sounds, with attendant changes in the behavior of the loudness model above and below that threshold at the set-top box.
I'm also curious about what would happen if the end user were to set the "soft limit" at a setting equal to or greater than the "loud limit." In fact, there would be times and environments where we would like to have these levels very close to each other (watching TV on an iPhone in a noisy environment, for instance), and I like the idea of leaving this up to the end user.
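One possible policy for that edge case, offered only as an assumption since nothing here says how JJ would handle it, is to let the "loud limit" win and collapse the window to zero width. The function names below are hypothetical.

```python
def resolve_window(soft_limit, loud_limit):
    """Return a usable (soft, loud) pair.

    Assumption: if the viewer sets the soft limit above the loud limit,
    the loud limit takes priority and the window collapses to one level.
    """
    if soft_limit > loud_limit:
        soft_limit = loud_limit
    return soft_limit, loud_limit

def window_width(soft_limit, loud_limit):
    """Width of the resolved window; zero means constant loudness."""
    soft, loud = resolve_window(soft_limit, loud_limit)
    return loud - soft
```

Under this policy a very narrow or zero-width window is still a valid setting, which suits the noisy-environment case: everything plays at nearly the same loudness.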
THE END RESULT
What's really nice about JJ's view is that it avoids the whole issue of amplitude levels and their attendant confusions (the Loudness Wars). It also makes fewer assumptions about program material and listeners' preferences and needs than other techniques I've run across. On the down side, it does appear to depend, at least a little, on the good will of the producers, and it assumes that they would not abuse the system by using hypercompression and elevated levels. Finally, it can give the producers some assurance that what they intended actually gets conveyed to the listener in a reasonable fashion.
The end result would be a big improvement over what we have now. The establishment of an adjustable "end-user window" would solve a multitude of problems that to date have proved to be intractable. And once the listener has set that window and gotten comfortable with it, no further changes would be necessary (except possibly for dimming or muting for the inevitable phone interruption). Levels could even be locked, striking a blow for parents everywhere!
And because we are more directly adjusting the signal for the sensation of loudness, we are going to get a lot more bang for the buck with our limited dynamic range. That would be a big deal!
Thanks for listening.
Dave Moulton is often way too loud, even though he’s really a softie at heart. You can complain to him about anything at his Web site, www.moultonlabs.com.