The Importance of Timbre in Assessing Audio Quality

JOHNSTON, IOWA - Have you ever wondered what makes audio so difficult for television? Think about it. It literally took an act of Congress to get the television industry to focus on the problem of loud commercials.

This certainly wasn’t a new problem. I remember fielding calls from irate viewers while working for an NBC affiliate in 1982. Our audio certainly wasn’t digital; it wasn’t even stereo at that point, and yet there were frequent complaints about the commercials being too loud. In my years prior to that, I worked in radio and remember tweaking audio processors, at the request of the program director, to make my AM station the loudest one on the dial. The goal was to jump out at listeners as they rotated the manual tuning control of their AM radio while driving down the road looking for something to listen to.

‘RHYTHMIC JACKHAMMERS’
In the very early 1980s I worked for an AM/FM combo, KFXM/KDUO, in the San Bernardino/Riverside, Calif., market. There I had what I believe was a prototype AM processor made by Gregg Labs (Greg Ogonowski), which was, in my opinion, one of the most flexible and amazing audio processors I ever worked with. If the program director wanted to park the modulation meter at 98 percent all the time, the Gregg Labs processor at KFXM could do that with the least amount of offense to the listener.

In audio, our focus tends to be on the loudness and intensity of sound. In my days at KFXM I got pretty good at making sure that when we modulated the transmitter, the negative peaks never exceeded 98 percent (loudness) and that when we were playing a song the audio stayed at 98 percent for the duration of the record (intensity). While this attracted the attention of listeners in that it got them to stop on our station, it also had the negative effect of fatiguing them, since there was no longer any dynamic range in the music. We essentially created “rhythmic jackhammers” that attracted attention and then drove away the very people we attracted.

While this was a problem in radio, I think we benefited from the fact that while the audience was listening, their visual senses were focused on something unrelated… hopefully their driving. In television, the typical viewer hopefully has their senses of hearing and sight focused on a common element: the program they are watching. So the audience has an expectation of correlation between the two, and if that is violated, they notice. Sometimes the violation is part of the program: the sudden slamming of a door or a scream of terror from out of the dark that is designed by the program creator to elicit a response. Sometimes the violation is a used-car dealer screaming “Come on down to Widetrack Town!” which also elicits a response, although not typically a favorable one. But now all of this has been corrected thanks to the CALM Act and the proper application of ATSC Recommended Practice A/85, which is available for free download from the ATSC website, www.atsc.org.

TIMBRE VS. TONE
So now that there is a spotlight focused on television audio, what are some of the other qualities of audio we should be thinking about as we move forward? I would invite you to consider “timbre” (pronounced TAM-ber). Timbre is an interesting audio characteristic. In music, it is the quality of the sound that allows us to differentiate between instruments. If I play a 440 Hz note (A) on a guitar and play the same 440 Hz note (A) on a piano, they sound different. The timbre of each instrument is very different and it distinguishes them from each other. With a little practice almost anyone can pick out specific instruments playing the same notes from a mix, assuming it is a small group of distinctly different instruments. In the above example, if I were to hook up a 440 Hz notch filter to an audio console and switch it in, the 440 Hz fundamental of both the piano and guitar notes would vanish; the filter cannot tell one instrument from the other. And yet if I were able to access that same filtering process within my brain, I could filter out the piano and pass the guitar through, so clearly there is more to the processing of sound than frequency, loudness and intensity.
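
One rough way to picture the guitar-versus-piano example is to build two notes on the same 440 Hz fundamental and compare how much energy sits in each harmonic. The sketch below is only an illustration: the harmonic weights, the "guitar-like" and "piano-like" labels, the sample rate and the function names are all assumptions made for the example, not measurements of real instruments, and real timbre also depends on attack, decay and other factors that are not modeled here.

```python
import math

SAMPLE_RATE = 44100   # samples per second (assumed)
FUNDAMENTAL = 440.0   # the A used in the example above
N = 4410              # 0.1 s, so each harmonic of 440 Hz completes whole cycles

# Hypothetical harmonic weights (fundamental first), NOT measured instruments.
PROFILES = {
    "guitar-like": [1.0, 0.6, 0.4, 0.25, 0.15],
    "piano-like": [1.0, 0.3, 0.1, 0.05, 0.02],
}

def synthesize(weights, freq=FUNDAMENTAL, n=N, rate=SAMPLE_RATE):
    """Sum sine-wave harmonics of `freq` using the given amplitudes."""
    return [
        sum(w * math.sin(2 * math.pi * freq * (k + 1) * i / rate)
            for k, w in enumerate(weights))
        for i in range(n)
    ]

def amplitude_at(samples, freq, rate=SAMPLE_RATE):
    """Amplitude of a single frequency component (one-bin Fourier sum)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def harmonic_profile(samples, count=5):
    """Amplitudes at the fundamental and its first few harmonics."""
    return [round(amplitude_at(samples, FUNDAMENTAL * (k + 1)), 3)
            for k in range(count)]

for name, weights in PROFILES.items():
    # Same pitch (first entry matches), different mix of harmonics.
    print(name, harmonic_profile(synthesize(weights)))
```

Both notes show the same amplitude at 440 Hz, which is why a filter tuned to that one frequency treats them identically; what separates them is the pattern of energy above it.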

When considering timbre, it is important not to confuse it with tone; tone is a function of pitch or frequency. In the old days we had tone controls that adjusted the bass and treble. These controls adjust the response of the audio system to a group of frequencies within a limited range. If we want to hear “bright” sound we increase the level of the higher frequencies, but it is all sound within that frequency range regardless of the sources. If I am listening to an orchestra and I ask for more high frequencies, I will get more high frequency from every instrument in the orchestra, whereas if I ask for more high end from the strings, I am asking for more sound of a specific timbre.
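
The orchestra example can be sketched as two different operations on a mix. This is a toy model under stated assumptions: the sources, their partial frequencies and amplitudes, the 2 kHz boundary and the function names `treble_boost` and `source_boost` are all hypothetical, and the second function assumes the sources are already separated, which is exactly the hard part in a real mix.

```python
HIGH_BAND_HZ = 2000  # assumed boundary for "high frequencies"

# Hypothetical mix: each source maps a partial's frequency (Hz) to its amplitude.
mix = {
    "strings": {440: 1.0, 4400: 0.5},
    "brass": {220: 1.0, 4400: 0.4},
}

def treble_boost(mix, gain=2.0):
    """Tone control: scale every high partial, whatever instrument made it."""
    return {
        source: {f: a * gain if f >= HIGH_BAND_HZ else a
                 for f, a in partials.items()}
        for source, partials in mix.items()
    }

def source_boost(mix, target, gain=2.0):
    """Timbre-aware request: scale the high partials of one source only."""
    return {
        source: {f: a * gain if (source == target and f >= HIGH_BAND_HZ) else a
                 for f, a in partials.items()}
        for source, partials in mix.items()
    }

print(treble_boost(mix))             # high partials of strings AND brass change
print(source_boost(mix, "strings"))  # only the strings' high partial changes
```

The tone control only needs to know frequency. The second request needs to know which sound belongs to which instrument, which is the distinction the paragraph above draws between tone and timbre.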

What this accomplishes may seem subtle, but it is important because of the way the brain processes sound. Addressing timbre might also allow digital audio processing to pick specific voices out over background sound and provide better clarity of dialogue, improving the audience’s overall quality of experience based on local conditions such as room acoustics and overall volume at the receiving location.

Unlike with our vision, sound is initially received and processed in the very primitive areas of our brains and reaches our emotional centers. This is why people are frequently moved by certain pieces of music. It is also why people react violently when startled by a loud sound, a reaction known as the “fight or flight” response. I theorize it is also why they get so mad when a loud commercial startles them out of whatever state they are in when it happens.

As we move further down the road in the creation and distribution of digital content, I think it is critical that we divest ourselves of the preconceived notion that audio is a less important component than video. I would point out that most of us spend eight hours a day with our video receiving senses shut down while our audio receiving sense functions 24x7. I think we also need to recognize that, based on the way our brains process sound, we need to consider the more subtle and harder-to-define qualities of audio such as timbre when determining how much processing power we need to assign to the audio components and how much bandwidth and space we need to use for their associated transmission and storage.

Bill Hayes is the director of engineering for Iowa Public Television. He can be reached via TV Technology.