Following Audio Through The Delivery Chain

Last time we started our trip following a program from origination all the way to the consumer. We got as far as getting the program on tape with audio metadata. This time we will deliver it to the front door of the network operation center and follow it to the start of distribution to the affiliate.


The final program created in Hollywood, New York or elsewhere can be delivered to the network in a number of different ways. The first, and easiest, is to send the finished tape via a courier service or simply walk it over (i.e., sneakernet it) to the network. This approach can take some time; however, it allows for the least amount of compression.

Another method is to send the program to the network via satellite. Currently, many NTSC programs are delivered in this manner and the quality is excellent. High-definition programs with 5.1 or more channels of audio plus metadata require a substantial amount of bandwidth. It is certainly possible to compress the video and audio and have results that look and sound quite good.

Typical data rates for a high-definition video program with 5.1 channels of audio are in the 45-60 Mbps range. A good example of satellite delivery of programs was the season of high-definition football games provided by one network. Another thought is that once the program has been compressed to 45 Mbps, it can be sent economically via a telephone (i.e., telco) connection such as a T-3.

Except for tape delivery, all the delivery methods discussed so far are real time. Looking not too far into the future, it is very likely that we will see non-realtime delivery of high-definition programming begin. With a T-3 connection, an HD program with up to eight channels of audio and the requisite metadata could be compressed to only 270 Mbps and downloaded to a server; the transfer would take about six times as long as real time, but would apply roughly one-sixth the compression. Someday I am sure I will re-read this article and it will seem completely archaic: passing HD programs, with multichannel audio of course, through public networks with little or no compression at all will be common practice.
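
The trade-off above is simple arithmetic, and a quick sketch makes it concrete. The rates here (a T-3 at roughly 45 Mbps, a 45 Mbps real-time feed, a lightly compressed 270 Mbps version) come from the discussion above; everything else is my own back-of-the-envelope framing.

```python
# Back-of-the-envelope sketch of the real-time vs. non-real-time trade-off.
T3_LINK_MBPS = 45.0          # approximate T-3 payload rate
REALTIME_RATE_MBPS = 45.0    # heavily compressed: fits the pipe in real time
LIGHT_RATE_MBPS = 270.0      # lightly compressed version of the same program

def transfer_ratio(program_rate_mbps: float, link_mbps: float) -> float:
    """How many times longer than real time the download takes."""
    return program_rate_mbps / link_mbps

print(transfer_ratio(REALTIME_RATE_MBPS, T3_LINK_MBPS))  # 1.0 -> real time
print(transfer_ratio(LIGHT_RATE_MBPS, T3_LINK_MBPS))     # 6.0 -> six times as long
```

In other words, spending six times the transfer time buys you six times the bit rate, which is exactly the appeal of non-realtime delivery.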

No matter the delivery method, once the program has made it past the front door of the network, it can and should be sent through a quality-control (QC) operation to ensure that the audio, video and metadata are all correct and appropriate for the program. If necessary, corrections can be made prior to or during storage for eventual play-out to air. If the program is live, metadata (just like audio and video) can still be checked and, if necessary, adjusted.


Once the program is ready for play-out, it will most likely pass through a master control switcher of some sort for final commercial insertion and the addition of any desired effects. Simple operations, such as switching in a commercial or a station ID, pose little problem for audio and audio metadata. Fig. 1 shows a conceptual master control switcher and some of the sources it must deal with.

(click thumbnail)Fig. 1 Conceptual drawing depicting what a master control switcher must be able to deal with on both audio and audio metadata levels. Note the transition types, and the crossfade, which is marked with a "?" as it is the most challenging transition of all. Incidentally, TAMO stands for "Then A Miracle Occurs."
Sophisticated operations such as a "squeeze and tease" pose very large problems. After hearing this term for the first time, I was really curious to find out what technical meaning this provocative name could possibly have. It is simply squeezing the program horizontally down to two-thirds of the screen and filling the now-empty one-third with a teaser for the late-night shows or the upcoming news program. This usually happens just as the credits are rolling. It is a cool effect, but it has the downside of making it virtually impossible for me to see my name in the very few movies in which I actually got a credit.

Another (and far more important) downside is that this operation requires a voice-over. Once a relatively simple operation, a voice-over into a 5.1-channel program is at least three times as complex. One problem: if the voice-over is mono or stereo and the main program is 5.1 channels, into which channels do you mix it?

A further problem arises if the main program's dialnorm value changes during the voice-over, because once dialnorm is applied by the consumer decoder, the voice-over changes in level along with it. The dialnorm change may be appropriate for a shift in the program, but it has nothing to do with the voice-over, which will sound as if it changed level for no apparent reason.
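
The mechanism behind that level jump is easy to sketch. An AC-3 decoder attenuates the program so that dialogue lands at the -31 dBFS reference, i.e., the attenuation in dB is the difference between the dialnorm value and -31. The specific dialnorm values below are illustrative.

```python
# Why a mid-program dialnorm change shifts the voice-over level:
# the consumer decoder attenuates toward a -31 dBFS dialogue reference.
REFERENCE_DBFS = -31

def decoder_attenuation_db(dialnorm: int) -> int:
    """Attenuation a consumer decoder applies for a given dialnorm value."""
    return dialnorm - REFERENCE_DBFS   # e.g. dialnorm -27 -> 4 dB attenuation

before = decoder_attenuation_db(-27)   # 4 dB
after = decoder_attenuation_db(-21)    # 10 dB
print(f"voice-over steps down by {after - before} dB for no audible reason")
```

A program-side shift from -27 to -21 is perfectly legitimate metadata, but the announcer riding over the top drops 6 dB with it.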

What about other operations, such as simply switching between two programs with radically different dialnorm values? Although it is unlikely to happen, it is certainly possible. Also, what happens when transitioning between two-channel and 5.1-channel programs? All the transitions shown in Fig. 1 are relatively straightforward except for the crossfade. When should metadata switch between the two programs? Remember that this will cause the consumer decoders to change modes, and it could sound objectionable.

Are these problems surmountable? I think so, but it requires a radically different approach to the problem and we will save the details for next time. For now, let's imagine that everything is working smoothly so that we can get the signals to the next stage.


Somehow, the video, audio, and metadata signals have made it through the NOC and, after passing through the final DP-570 and monitors in master control, are assumed to be correct. At this point, the trip to the affiliate has one insurmountable demand: it must be done in real time. Primetime is primetime, and live national reports must be just that: live.

This means that the high-definition video, audio and related metadata must be compressed as much as necessary to fit through the satellite and/or telco pipes that are currently available. This may change in the future, but for now it means 45-60 Mbps in most cases.

There are several distribution encoders and decoders available from manufacturers such as Harris, Harmonic (formerly Divicom), Tiernan and Tandberg. The decoders are usually packaged as an Integrated Receiver/Decoder, or IRD, allowing direct connection to a satellite dish. Most of these manufacturers have recognized the need for networks, both terrestrial and satellite, to deliver the multichannel audio that consumers have grown to expect, thanks in large part to the explosive growth of the DVD format.

To this end, several of the distribution systems are capable of supporting compressed audio distribution formats such as Dolby E. Beware, though: not all systems are created equal as far as audio is concerned.

Simply passing Dolby E or other compressed audio formats is not enough. It must be done in a standard manner and with proper A/V synchronization. SMPTE standards 337M through 340M provide accurate technical details showing exactly how to accomplish this task; if interoperability is ever desired, these specifications should be followed carefully.
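
For a flavor of what "a standard manner" means here, SMPTE 337M wraps each non-PCM data burst in a preamble of four words: sync words Pa and Pb, then Pc carrying the data type (as assigned by SMPTE 338M) and Pd carrying the burst length. The sketch below spots such a preamble in a word stream; the 16-bit-mode sync values and the Pc field layout are my reading of the standard, so verify them against the published documents before relying on them.

```python
# Minimal sketch of spotting a SMPTE 337M burst preamble in an AES3 word
# stream. Sync values and field layout are assumptions to be verified
# against the standard itself.
PA_16BIT = 0xF872
PB_16BIT = 0x4E1F

def find_burst(words):
    """Return (index, data_type, length_bits) of the first burst, or None."""
    for i in range(len(words) - 3):
        if words[i] == PA_16BIT and words[i + 1] == PB_16BIT:
            pc, pd = words[i + 2], words[i + 3]
            return i, pc & 0x1F, pd   # low bits of Pc carry the data type
    return None

# Hypothetical stream; 0x1C (28) is, I believe, the Dolby E data type.
stream = [0x0000, 0xF872, 0x4E1F, 0x001C, 0x1800]
print(find_burst(stream))
```

Equipment that frames bursts this way, with the A/V timing rules of 339M/340M, is what makes one vendor's encoder interoperate with another's IRD.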

At least one manufacturer provides "multichannel audio capability" by supplying three two-channel MPEG audio streams. There are two major problems with this. The first is that there is no provision for audio metadata. Can anyone guess what the second problem might be? I'm not going to tell; you will have to wait until next time, but I will take guesses via e-mail.

Next time we focus on master control operation in fine detail and try to solve some of the problems we have raised this time. As master control switchers are present at nearly every stage, it is crucial to get them right.

Add to this a few other roadblocks we will discuss, and you should have some great ammunition to take with you to the manufacturers at the NAB show.

I have seen a growing trend in the e-mails I receive-the questions are getting more and more sophisticated, which is a rewarding feeling. We are definitely getting somewhere, so keep them coming, and as always, thanks for your time!