Adaptive, automated loudness control

Today, we live in a world of automation. You see it in our industry, in our daily working lives and even at home. Everywhere you look, there is technology that is designed to make things happen faster by replacing human effort with machines.

The broadcast industry is no exception to this rule. Automation is everywhere, and the trend to speed things up with technology is apparent, especially now that the Internet has given us new ways to disseminate audio and video. The traditional technologies we used to rely on for production and distribution are changing to encompass this new world, and this, in turn, is further reducing the influence of human control. This raises the question: Will these changes alter our experience of moving pictures and sound? And should we preserve human control of audio signals in such a highly automated broadcasting world?

Impressions of a broadcast facility

Whenever I imagine a totally automated television broadcast facility, two things immediately spring to mind. The first is what I call the “fleece jacket syndrome.” No matter how warm it is outside, everyone inside is wearing a fleece jacket or a sweater because the air has been cooled down to such an extent that it feels like a permanent winter. No wonder broadcasters are so keen to grab one of those fleece jackets whenever a manufacturer is giving them away!

The second image I have is of the control room: a huge space with dimmed lights and pictures flickering away on massive video walls. No matter where you are in the world, these control rooms all look the same, showing incoming channels, transmission channels, leased lines and so on. In front of the video wall there is inevitably a large desk with a number of control panels and intercoms, as well as computer screens showing program lists, scheduling information, the condition of the transport streams and perhaps the satellite uplink configuration. Everything in there needs to be controlled, but where are all the people? Usually there are very few of them.

Each channel's image on the multiscreen display shows the channel name and the video content. There may also be a bargraph-style display that tells you something about the running audio. The few engineers present are monitoring the operation of all the technical equipment involved in the process. There might be a station for quality monitoring of audio and video, but you won't find one in every control room. This is what a fully automated broadcast transmission center looks like, and it is the reality for broadcast centers, turnaround facilities, uplink teleports, and cable and IPTV headends all over the world.

If you look around one of these facilities for an audio fader or a video switcher, you will be disappointed, because there aren't any, nor is there anyone to operate them. In modern facilities, staff manage data stored on massive servers holding many terabytes. Programmed schedulers run the playout for hundreds of TV channels. The monitoring that is carried out serves primarily to check the basic condition of the running video and audio; in many facilities, there is no quality monitoring in place at all.

In such an environment, how does one ensure that certain quality standards for audio and video are being maintained for each dedicated transmission or distribution network? This is a delicate issue because, in an automated facility, there is a need to achieve optimum results while keeping human effort to a minimum.

For picture content, this isn't too difficult, because throughout the production chain the video signal is subject to many dedicated controls that keep it within legal transmission limits. Video format conversion is commonplace, and various levels of performance are available. In the digital domain, video is delivered as compressed data, and the encoders apply a range of filters and algorithms to achieve the best possible picture quality within the bandwidth available in the transport stream.

However, when it comes to audio, things are different. Of course, there is technology on the market designed to convert the audio into the format required by the transport stream and to meet the technical specifications needed for the best sonic performance. But that is where the similarity ends: for audio, there is no common overall technical specification for checking or legalizing the content.

Audio engineers do have the benefit of some technical recommendations, but these don't always solve the problem. For example, when audio is coded in the digital domain, the highest possible level is 0dBFS. But that basic rule doesn't reflect the wide variety of practical ways in which digital audio is handled. Take CD audio, for instance. At 16-bit resolution, CD audio uses almost all of the available coding space (a theoretical dynamic range of 96dB). This means that if a broadcaster takes music from a CD, it will have been mastered to reach 0dBFS as its maximum value. That's very different from a typical broadcast signal, which now uses -18dBFS or -20dBFS as its alignment level in line with the recommendations of international standards bodies such as the ITU, the EBU and the ATSC. For broadcasters, this means that audio content from different sources can arrive at very different levels. Content from CD audio can be more than twice as loud as audio from a standard TV broadcast.
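The level figures above can be checked with a little arithmetic. The sketch below assumes only the standard dB definitions for amplitude (20·log10 of a ratio) and computes the theoretical dynamic range of 16-bit PCM and the gap between a 0dBFS CD peak and the two broadcast alignment levels. Note that this compares a peak level with an alignment level; it is headroom arithmetic, not a loudness measurement.

```python
import math

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)

def pcm_dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

# Theoretical dynamic range of 16-bit CD audio.
cd_range = pcm_dynamic_range_db(16)
print(f"16-bit dynamic range: {cd_range:.1f} dB")

# CD masters peak near 0 dBFS; broadcast alignment sits at -18 or -20 dBFS.
for alignment_dbfs in (-18.0, -20.0):
    gap_db = 0.0 - alignment_dbfs
    ratio = db_to_amplitude_ratio(gap_db)
    print(f"alignment {alignment_dbfs:.0f} dBFS: gap {gap_db:.0f} dB, "
          f"amplitude ratio {ratio:.1f}x")
```

Using the common rule of thumb that a level increase of about 10dB is perceived as roughly twice as loud, an 18dB to 20dB gap is consistent with the claim that CD content can sound more than twice as loud.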

Differences in program loudness are painful

As an industry, we are all realizing that the levels of audio sources used in broadcast transmission can vary wildly. Of course, nothing is transmitted that hasn't already been processed and quality-controlled, but we still have a number of issues to contend with relating to audio levels and audio control.

Over the last 10 years, broadcasters have begun to understand that technically oriented level control doesn't necessarily solve their problems when it comes to delivering better quality audio. Audio loudness is now a hot topic, and many articles have been published that discuss the issue and its background. In simple terms, all audio sources that have been processed to control loudness should deliver the same overall loudness impression. Properly applied loudness control is clearly improving the quality of audio in digital broadcasting systems.
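The basic idea of "same overall loudness impression" can be sketched as measuring each program and applying the gain that moves it onto a common target. This is a minimal illustration under loud assumptions: the meter below is an unweighted, ungated mean-square level, not the ITU-R BS.1770 measurement (which adds K-weighting and gating and reports LUFS), so its readings are not comparable to real loudness meters. The -23.0 target merely mirrors the EBU R 128 target value, and the helper names are hypothetical.

```python
import math

def mean_square_level_db(samples):
    """Unweighted, ungated mean-square level in dB relative to full scale.

    ASSUMPTION: simplified stand-in for a loudness meter; ITU-R BS.1770
    adds K-weighting and gating, both omitted here.
    """
    ms = sum(x * x for x in samples) / len(samples)
    if ms == 0.0:
        raise ValueError("cannot normalize digital silence")
    return 10 * math.log10(ms)

def normalization_gain_db(samples, target_db=-23.0):
    """Static gain (dB) that moves the measured level onto the target."""
    return target_db - mean_square_level_db(samples)

def apply_gain_db(samples, gain_db):
    """Scale every sample by a single static gain given in dB."""
    g = 10 ** (gain_db / 20)
    return [x * g for x in samples]

# Two hypothetical programs: a hot CD-style master and a quieter feed.
sr = 48000
loud = [0.9 * math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]
quiet = [0.1 * math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr)]
for name, program in (("loud", loud), ("quiet", quiet)):
    gain = normalization_gain_db(program)
    out = apply_gain_db(program, gain)
    print(f"{name}: gain {gain:+.1f} dB -> level {mean_square_level_db(out):.1f} dB")
```

A single static gain like this suits file-based normalization; a real-time processor must instead vary its gain continuously, which is where the attack and release behavior discussed below comes in.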

What remains an issue is how loudness control is applied in today's world of automated broadcasting. Broadcasters have no choice but to use integrated or external audio processing to perform this control. I can already hear the complaints that some people, most notably skilled audio engineers, will make in response to that statement. They will ask how automated online loudness control can sound pleasant without degrading the audio.

I can understand their concerns, but in a world where automation is king, there is no other choice because broadcasters are not likely to install an audio booth with proper monitoring and fader control where someone can sit and perform the task manually. We have to recognize that some kind of automated audio control is required if broadcasters are to comply with the new loudness standards and recommendations and maintain the highest level of quality for their audiences.

Given that there is no other choice, all that remains to be discussed is what characteristics this online automated loudness control system should have.

Adaptive audio control

Unlike an ordinary line amp, the gain of a loudness control processor is not constant; it varies over time depending on the processor's control algorithm and on the changing loudness and amplitude of the input signal. These gain variations, which represent the actual control process, should take place without bothersome side effects on the audio signal itself, such as pumping, distortion, sound coloration or noise modulation. In other words, they should be inaudible.

The attack time setting of a loudness control element determines how the unit reacts to rapid amplitude changes in the audio signal. A long attack time can lead to overshoots (and consequent distortion) because the system is not fast enough to reduce the gain. A short attack time minimizes the chance of overshoots, but the faster gain changes it causes have audible side effects such as clicks and other modulation artifacts.
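The attack-time trade-off can be illustrated with a common textbook gain smoother: a one-pole filter on the gain (in dB) that uses the attack coefficient when the gain must fall and the release coefficient when it may recover. This is a generic sketch, not the processor described in this article; the threshold, the times and all function names are hypothetical. A quiet passage followed by a sudden loud burst shows that a short attack lets less overshoot through, while neither setting removes it entirely without look-ahead.

```python
import math

def coeff(time_s, sample_rate):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_s * sample_rate))

def target_gain_db(samples, threshold_db):
    """Gain (dB, <= 0) each sample would need to sit at the threshold."""
    gains = []
    for x in samples:
        level_db = 20 * math.log10(max(abs(x), 1e-9))
        gains.append(min(0.0, threshold_db - level_db))
    return gains

def smooth_gain(target, sample_rate, attack_s, release_s):
    """Attack coefficient while gain falls, release coefficient while it rises."""
    a_att = coeff(attack_s, sample_rate)
    a_rel = coeff(release_s, sample_rate)
    state, out = 0.0, []
    for t in target:
        c = a_att if t < state else a_rel
        state = c * state + (1.0 - c) * t
        out.append(state)
    return out

def output_peak(samples, gain_db):
    """Largest absolute output sample after applying the gain curve."""
    return max(abs(x) * 10 ** (g / 20) for x, g in zip(samples, gain_db))

sr = 48000
signal = [0.1] * 4800 + [1.0] * 4800   # quiet passage, then a sudden loud burst
target = target_gain_db(signal, threshold_db=-6.0)
fast = smooth_gain(target, sr, attack_s=0.0001, release_s=0.5)
slow = smooth_gain(target, sr, attack_s=0.02, release_s=0.5)
print(f"peak with 0.1 ms attack: {output_peak(signal, fast):.3f}")
print(f"peak with 20 ms attack:  {output_peak(signal, slow):.3f}")
```

The faster setting reduces the overshoot at the cost of a steeper gain change, which on real program material is what produces the clicks and modulation artifacts mentioned above.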

Both single-band and multiband designs are used in hardware and software control units, and, depending on the architecture, either can do a perfectly good job. One advantage of a single-band design is that it leaves the tonal balance untouched, because no band-splitting filters are involved.

The question is: How can one guarantee the best possible performance when no one is available to realign the circuits while they are in use? In the absence of a human operator, what is needed is a system that adapts and behaves as a human would. Such an algorithm needs to change its behavior based on the resources available, and this calls for a multiloop principle.

We suggest a multiloop approach because we believe it is the best solution. Each loop works over the entire frequency spectrum (wideband). The loops work in parallel, each with its own set of attack and release parameters. Each loop develops a control signal, which is then summed with the control signals from the other loops to produce a single gain control signal applied to one final gain control element.
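The signal flow just described (parallel wideband loops, each with its own attack and release times, combined into one control signal driving one gain element) can be sketched as follows. The article does not disclose how the control signals are combined, so this sketch ASSUMES a plain weighted sum with weights adding up to 1; the loop times, weights and function names are hypothetical, and the one-pole smoother is a generic choice rather than the manufacturer's loop design.

```python
import math

def smooth_gain(target_gain_db, sample_rate, attack_s, release_s):
    """One loop: one-pole smoothing of a target gain (dB), attack when falling."""
    a_att = math.exp(-1.0 / (attack_s * sample_rate))
    a_rel = math.exp(-1.0 / (release_s * sample_rate))
    state, out = 0.0, []
    for target in target_gain_db:
        coeff = a_att if target < state else a_rel
        state = coeff * state + (1.0 - coeff) * target
        out.append(state)
    return out

def multiloop_gain(target_gain_db, sample_rate, loops):
    """Run all loops in parallel on the same wideband target and combine them.

    loops: list of (attack_s, release_s, weight).
    ASSUMPTION: a weighted sum with weights adding up to 1; the real
    combining rule is proprietary and not described in the source.
    """
    per_loop = [smooth_gain(target_gain_db, sample_rate, a, r) for a, r, _ in loops]
    weights = [w for _, _, w in loops]
    return [sum(w * g for w, g in zip(weights, column)) for column in zip(*per_loop)]

def apply_gain_db(samples, gain_db):
    """The single final gain element shared by all loops."""
    return [x * 10 ** (g / 20) for x, g in zip(samples, gain_db)]

sr = 48000
target = [0.0] * 4800 + [-6.0] * 48000          # 6 dB of reduction requested
loops = [(0.001, 0.1, 0.5), (0.05, 2.0, 0.5)]   # hypothetical fast and slow loops
gain = multiloop_gain(target, sr, loops)
print(f"combined gain 100 samples into the change: {gain[4900]:.2f} dB")
```

With this weighting, the combined gain always lies between the fastest and slowest loop, so the fast loop catches sudden changes while the slow loop keeps the overall gain from chasing every fluctuation.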

The method of summing the individual control signals requires unique technology. Because the multiloop design is implemented digitally, a short delay can be introduced into the audio signal path. This allows the gain-changing elements to “look ahead” and determine the correction needed before applying it to the delayed signal, just in time to control even the fastest transients. This is particularly important for the limiter, which delivers a precisely leveled output signal entirely free of overshoots (clipping). (See Figure 1.)
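The look-ahead principle can be shown with a generic peak limiter sketch. Because the output is delayed, the limiter already knows which samples are coming and can ramp the gain down before a peak reaches the output; the gain is never higher than the current sample requires, so the output cannot exceed the threshold. This is an illustrative, offline sketch (output is delay-compensated, so `out[m]` lines up with `samples[m]`), not the limiter described in the article; `lookahead_limit`, `lookahead` and `release_step` are hypothetical names and parameters.

```python
def lookahead_limit(samples, threshold, lookahead, release_step):
    """Look-ahead peak limiter sketch with delay-compensated output.

    In real time the output would lag the input by `lookahead` samples.
    threshold:    maximum allowed absolute output value (linear).
    lookahead:    look-ahead length in samples (hypothetical parameter).
    release_step: maximum per-sample increase of the linear gain (hypothetical).
    """
    n = len(samples)
    # Gain each sample would need on its own to stay at or below the threshold.
    required = [min(1.0, threshold / abs(x)) if x != 0 else 1.0 for x in samples]
    out, prev_gain = [], 1.0
    for m in range(n):
        # A peak d samples ahead already pulls the gain partway down, so the
        # reduction ramps in linearly before the peak arrives (d = 0: full).
        ramp_min = 1.0
        for d in range(min(lookahead, n - 1 - m) + 1):
            r = required[m + d]
            ramp_min = min(ramp_min, r + (1.0 - r) * d / (lookahead + 1))
        # Recover gradually after a peak, but never above what the ramp allows.
        gain = min(ramp_min, prev_gain + release_step)
        out.append(samples[m] * gain)
        prev_gain = gain
    return out

signal = [0.2] * 100 + [1.0] + [0.2] * 100   # an isolated transient
limited = lookahead_limit(signal, threshold=0.5, lookahead=10, release_step=0.01)
print(max(abs(y) for y in limited))
```

A real-time version would keep a delay line of `lookahead` samples and update the gain incrementally instead of rescanning the window for every sample.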

The proprietary algorithms used in the multiloop design also adjust the attack and release times automatically as the input signal evolves over time. This is called adaptive loudness control. By analyzing the incoming audio, the system can use relatively long attack times during steady-state signal conditions and short attack times when impulsive transients occur.
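The algorithms themselves are proprietary, but the idea of telling steady-state material from impulsive transients can be illustrated with one common heuristic: the crest factor (peak-to-RMS ratio) of a short block. A steady sine has a crest factor of about 3dB, while an isolated click has a very high one. The sketch below ASSUMES this heuristic, a 10dB decision threshold and two fixed attack times; all names and values are hypothetical and only show the principle of switching attack behavior based on signal analysis.

```python
import math

def crest_factor_db(window):
    """Peak-to-RMS ratio of a block of samples, in dB."""
    peak = max(abs(x) for x in window)
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    if rms == 0.0:
        return 0.0  # digital silence: treat as steady state
    return 20 * math.log10(peak / rms)

def choose_attack_s(window, fast_s=0.0005, slow_s=0.02, crest_threshold_db=10.0):
    """Short attack for impulsive blocks, long attack for steady-state blocks."""
    return fast_s if crest_factor_db(window) > crest_threshold_db else slow_s

sr = 48000
steady = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(480)]  # sine tone
click = [0.0] * 479 + [1.0]                                           # isolated impulse
for name, block in (("steady", steady), ("click", click)):
    attack_ms = choose_attack_s(block) * 1000
    print(f"{name}: crest {crest_factor_db(block):.1f} dB -> attack {attack_ms:.1f} ms")
```

In a complete processor, the chosen attack time would feed the gain smoothing for the next block; a practical design would also blend between settings rather than switch abruptly, to avoid audible steps.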

Conclusion

Despite the views of some critics, adaptive multiloop technology does provide a way to integrate audio control into fully automated broadcasting systems. Clearly, a file-based control philosophy will never cover 100 percent of broadcast transmission content, not least because television still includes live content; even when tapeless technologies are employed, this content is still live. In a typical broadcast facility, we will continue to have a mix of file-based and stream-based content control to guarantee a proper output signal at all times, independent of the source.

As for the engineers in an automated multichannel control room, any signal processing must be a set-and-forget solution, because no one has the time or resources to dedicate a person to this task, especially not for audio. In this scenario, the only way to create real-time online processing that is both automated and of high quality is to use adaptive technologies. Provided they meet the requirements of all current standards and recommendations, these adaptive technologies will always guarantee the best audio control performance in a fully automated system.

Peter Poers is managing director of Jünger Audio.