Processing audio

The broadcast industry may take audio almost for granted, but the routing, mixing and monitoring of audio are complex issues with no easy answers

However broadcasters decide to mix, monitor and route audio, the tools they need are available. Pictured: A Wheatstone TV-80 at WGCL-TV in Atlanta. Photo courtesy DST.

The dictionary (you have to love a society that never opens books) defines television as “the transmission of visual images of moving and stationary objects, generally with accompanying sound, as electromagnetic waves, and the reconversion of received waves into visual images.” The lack of emphasis on audio is pretty striking.

To be perfectly honest, our industry seems to take audio almost for granted, unless the sound is missing or unintelligible. From my days in station operations, I remember few complaints about the quality of the video but many about the quality of the audio. It is interesting that we define the industry by the visual image (HDTV, SDTV, etc.) without anything in the shorthand to indicate whether audio is even present.

Today, we generally think of audio as a stereo pair, because broadcasters transmit most programming in a stereo format. There are many exceptions, however; stereo is difficult to do well in news programming, for instance, and adding spatial imaging to the sound field would bring little improvement for the viewer. Cable industry programming often comes with mono audio, and much of the rest of the world's programming is mono as well. In many parts of the world, multiple-language programming is the most common use for multiple channels of sound.

Routing audio

This level of complexity is not unusual. Our company designed facilities for a major religious organization that handled more than 50 simultaneous translations. With the advent of DTV and surround sound mixing, it has become important to handle multiple channels of program audio plus, of course, Descriptive Video Service (DVS). So, how should one handle the routing, mixing and monitoring? The answer is as complex as the problem.

Though digital video has essentially replaced analog video in system designs these days, the same is not true of audio. The reason might be as fundamental as the fact that microphones are analog, or as simple as the view of audio as the stepchild of video, although a colleague in radio once noted that TV is “radio with a light to read by … .” Fortunately, hybrid analog/digital mixing consoles are reasonably priced these days.

Routing also has grown to offer conversion inside the routing switcher, with digital and analog audio signals freely routed to analog and digital outputs in some designs. These routers can break AES digital pairs into a flat matrix, enabling broadcasters to map any mono signal from any AES stereo pair into any output. They also allow broadcasters to perform phase inversions, left/right swaps and mono sums.

The tools of the trade now allow great flexibility. They are compelling and cost-effective, saving interface and conversion hardware costs as well as wiring and system design complexity. Of course, the other edge of the sword is added complexity for operators.

Audio isn’t as easy as it sounds. Pictured: A control room designed by Syska Hennessy Group for Mike Post Productions in Burbank, CA.

Often I am asked whether embedded audio makes sense, and the answer is a resounding … it depends. It depends on the type of facility — whether it is a transmission or production facility — and whether the burden of cost and complexity in the embedding and dis-embedding hardware is more than balanced by the lowered cost for single-level routing.

An SMPTE 259M video signal can carry four groups of embedded audio, each with two pairs. This total of 16 mono signals (eight stereo pairs) should suffice for many purposes, if all hardware supported the full capability of the standard. As in all things, it is not that simple. Not all hardware can handle all four groups (of two pairs each). Some hardware, such as video mixers and compression hardware, may pass no embedded audio through to the output at all. Thus, it is prudent for the engineer to look at real-world products when considering a design to be sure the full range of needed features is really available.

There are other issues with embedding that are slightly more complex than the mere presence or absence of signal. The latency of an embedder for HDTV audio is generally a bit less than 0.5 ms. This is not a particularly difficult problem with a single device. However, when the potential exists for several passes through embedders and demultiplexers in the same system, it is clear you should carefully consider timing. While the initial short delay certainly will not create a lip-sync problem, the accumulated delay can cause serious timing issues within a system.
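How quickly those small delays add up can be sketched with simple arithmetic. The roughly 0.5 ms per-pass figure comes from the text; the pass counts and the lip-sync tolerance used here are illustrative assumptions, not values from any standard:

```python
# Illustrative sketch: accumulated embed/dis-embed latency across a system.
# The 0.5 ms per-pass figure is from the text; pass counts and the
# lip-sync threshold below are hypothetical examples.
PASS_DELAY_MS = 0.5        # approximate latency per embed or dis-embed pass
LIP_SYNC_LIMIT_MS = 15.0   # assumed point where audio offset becomes noticeable

def total_latency_ms(passes: int, per_pass_ms: float = PASS_DELAY_MS) -> float:
    """Total audio delay after the given number of embed/dis-embed passes."""
    return passes * per_pass_ms

for passes in (1, 10, 40):
    delay = total_latency_ms(passes)
    status = "ok" if delay < LIP_SYNC_LIMIT_MS else "needs timing correction"
    print(f"{passes:2d} passes -> {delay:4.1f} ms ({status})")
```

A single pass is negligible, but a signal that crosses many embedding boundaries in a large plant accumulates delay that must be budgeted for in the system design.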

Then there is cost. Embedding is not a particularly expensive function, but if a system will need many devices for utility purposes, or perhaps for the inputs to an audio console, the cost multiplies quickly. VTRs and some other devices allow simultaneous AES and embedded outputs. Clearly, when a VTR includes an internal dis-embedder (as most do these days), it reduces wiring complexity and could eliminate the need for separate layers of audio routing.

From the start, digital audio and video were not designed to work together, with SMPTE standardizing the video interface and the AES the audio interface. This brings up the unfortunate problem of synchronization, which is exacerbated greatly because most digital video systems are locked to clocks related to the NTSC field rate (60/1.001 Hz, or 59.94005994 Hz). Audio could be locked to the same reference at exactly 48 kHz, or it could run at 48 kHz/1.001, about 47.952 kHz.

Then there is the issue of sample alignment between audio and video. There are 8008 audio samples in every five video frames, a cadence that is not particularly friendly. As a result, it is not possible to cut audio and video cleanly on every video frame without significant processing of the audio signal to assure that truncated samples do not produce clicks, pops or worse.
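The cadence arithmetic above can be checked exactly with rational numbers. This is only a verification sketch; the rates are the standard 48 kHz sample rate and the 30000/1001 fps NTSC-related frame rate:

```python
from fractions import Fraction

AUDIO_RATE = Fraction(48000)         # AES sample rate, Hz
FRAME_RATE = Fraction(30000, 1001)   # NTSC-related frame rate, ~29.97 fps

# Samples per frame is not a whole number...
samples_per_frame = AUDIO_RATE / FRAME_RATE
print(samples_per_frame)             # 8008/5

# ...but over five frames the count is exactly 8008 samples.
assert 5 * samples_per_frame == 8008

# One common per-frame distribution across the five-frame sequence:
cadence = [1602, 1601, 1602, 1601, 1602]
assert sum(cadence) == 8008
```

Because no single frame carries a whole number of samples, any frame-accurate audio edit must land mid-sample-block somewhere in the sequence, which is why clean cuts require processing.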

A word about routing digital audio signals. A synchronous AES router is highly desirable because its switches are made without disruptions in the AES sample cadence. Most manufacturers offer both synchronous and asynchronous switchers, but you should strongly consider the design issues an asynchronous router could present. An asynchronous router does not particularly care where the switch happens and will switch without regard to AES frame boundaries. You might think of it as a patch bay — a brute-force switch. A synchronous AES router, on the other hand, will always output a complete and uninterrupted digital audio signal because all inputs are reframed internally, and switches happen on AES frame boundaries.

Broadcasters also can route multichannel digital audio using Dolby E or other multichannel compressed audio systems. This allows eight robust audio signals to be carried in one AES stream, or up to 64 audio channels to be embedded in groups in an SDI signal.
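The channel counts quoted above follow from the group and pair structure described earlier. A small sketch (the eight-channel payload per AES stream is as stated for Dolby E; the constant names are mine):

```python
# Capacity arithmetic for compressed multichannel audio embedded in SDI.
GROUPS_PER_SDI = 4            # embedded audio groups in an SMPTE 259M signal
PAIRS_PER_GROUP = 2           # AES pairs per embedded group
CHANNELS_PER_DOLBY_E = 8      # audio channels one Dolby E stream can carry

aes_pairs = GROUPS_PER_SDI * PAIRS_PER_GROUP        # 8 AES pairs per SDI signal
max_channels = aes_pairs * CHANNELS_PER_DOLBY_E     # 64 channels total
print(aes_pairs, max_channels)  # 8 64
```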

Keeping that much audio straight would certainly be a challenge! Broadcasters need flexible audio monitoring tools to handle all this complexity, which must include tools to verify the format integrity and levels in AES, as well as the content and metadata in compressed audio signals. And you thought audio would be simple!

John Luff is senior vice president of business development at AZCAR.
