Keeping MultiChannel Audio Under Control

If you were able to attend, I hope your NAB experience was a good one and that you didn't get too many blank stares when you asked manufacturers about multichannel audio and/or metadata. Last time, when I answered the question "What is the problem with sending six channels of audio via three two-channel MPEG streams?" I said that the pairs are asynchronous, but upon reviewing what I had written, I realized that I should clarify this.

What I meant is that although the pairs themselves are synchronized (i.e., their sample rates are locked), the audio signals themselves vary slightly and randomly due to the bit-rate reduction being performed separately on each pair. Now we will look at master control and then move on to the first part of getting the signal out the door for distribution.


Last time we discussed a possible solution for the master control dilemmas we uncovered. Fig. 1 graphically illustrates the approach.

Fig. 1
In addition to downmix-compatible matrix decoders, this drawing assumes a few other pieces are in place. First, metadata must be re-authored after it has passed through the switcher. As all audio sources can fill 5.1 channels, the audio coding mode parameter (acmod) must be overwritten to be a fixed value of 3/2L (three front/two back, plus an LFE). This is true even if the nightly news broadcast is mono voice with a stereo intro, because after matrix decoding, the voice will come out of the center channel and the music will be spread throughout all the channels. The drawing also assumes that dialnorm will be applied to the audio sources to allow crossfades, voice-overs, and other mix-type operations to take place. Therefore, the dialnorm parameter can be selectively overwritten as it only needs to be set to a fixed value while a transition is taking place.
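To make the re-authoring step concrete, here is a minimal Python sketch of the logic described above. It is only an illustration, not how any particular product implements it: the `AudioMetadata` class, the `reauthor` function, and the placeholder dialnorm value of 27 are all assumptions made for this example. The one detail taken from the AC-3 bitstream syntax is that the 3/2 audio coding mode is code 7 (binary 111), with the LFE channel flagged separately.

```python
from dataclasses import dataclass, replace

# AC-3 audio coding mode code for 3/2 (L, C, R, Ls, Rs); the LFE is flagged separately.
ACMOD_3_2 = 0b111


@dataclass(frozen=True)
class AudioMetadata:
    """Hypothetical container for the few metadata parameters discussed here."""
    acmod: int     # audio coding mode, 0..7
    lfeon: bool    # True when an LFE channel is present
    dialnorm: int  # dialogue level as a positive number of dB below full scale, 1..31


def reauthor(meta, in_transition, fixed_dialnorm=27):
    """Force 3/2L after the switcher; pin dialnorm only while a transition runs.

    fixed_dialnorm=27 is an arbitrary placeholder, not a recommended value.
    """
    if not 1 <= fixed_dialnorm <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    out = replace(meta, acmod=ACMOD_3_2, lfeon=True)
    if in_transition:
        out = replace(out, dialnorm=fixed_dialnorm)
    return out


# A mono news source (acmod 1/0) still leaves master control described as 3/2L.
news = AudioMetadata(acmod=0b001, lfeon=False, dialnorm=24)
steady = reauthor(news, in_transition=False)
during_crossfade = reauthor(news, in_transition=True)
```

Outside a transition the source's own dialnorm passes through untouched; only acmod and the LFE flag are always overwritten.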

One last point is that for perfect downmix compatibility, the 90-degree surround channel phase shift should always be enabled in the Dolby Digital (AC-3) encoder. Leif Claesson, a friend from Octiv Inc., read my July 2002 article in which I showed the downmix equations and pointed out that I need to remind Dolby Digital (AC-3) encoder users to turn on this phase shift or the formulas will not work as well as they could. The filters process only the surround channels and are inaudible under almost all listening conditions; however, without them, consumers with two-channel receivers and a Pro Logic decoder will not get the best audio possible.
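For readers who do not have the earlier article to hand, the sketch below shows one commonly cited form of the matrix-compatible (Lt/Rt) downmix for a 3/2 program, using the 0.707 (-3 dB) coefficients given in the A/52 standard. It is an assumption that this matches the form printed in the July 2002 column, and the function name `ltrt_downmix` is invented for this example. The code does not implement the 90-degree phase shift itself; it assumes the surround samples passed in have already been through the encoder's phase-shift filters.

```python
import math

K = 1 / math.sqrt(2)  # about 0.707, i.e. -3 dB


def ltrt_downmix(l, c, r, ls, rs):
    """Return one (Lt, Rt) sample pair from a 3/2 sample set; the LFE is omitted.

    The surround inputs are assumed to be phase-shifted already.
    """
    lt = l + K * c - K * (ls + rs)
    rt = r + K * c + K * (ls + rs)
    return lt, rt


# Centre-only content lands equally, and in phase, in Lt and Rt.
centre_only = ltrt_downmix(0.0, 1.0, 0.0, 0.0, 0.0)

# Surround-only content lands out of phase between Lt and Rt,
# which is what a matrix decoder steers to the surround outputs.
surround_only = ltrt_downmix(0.0, 0.0, 0.0, 1.0, 0.0)
```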

So, how do you do all this metadata work? Luckily, the Dolby DP570 Multichannel Audio Tool can handle it all. In fact, with the scenario outlined in the drawing, it can even act as the metadata switch when given GPI commands from the master control switcher. After overwriting the necessary metadata parameters, it can also provide the advanced monitoring functionality required for the master control environment.

Now that we truly have audio signals coming from master control that have been switched, crossfaded, voiced over, and have correct metadata to describe the channels, we must figure out how to get it to the affiliates.


With multiple two-channel MPEG streams out of the picture, we are currently left with only a couple of ways to distribute multichannel audio from the network to the affiliate. The first is to use Dolby Digital (AC-3). Yes, I know, many articles back I said that concatenation of Dolby Digital is not a good thing. This may be true at lower data rates, but if the distribution is set to a higher rate, such as 640 kbps, the results are quite acceptable. The upside of using Dolby Digital (AC-3) is that it can carry 5.1 channels of audio in a small space. The downsides are that it takes about 180 milliseconds (about six NTSC video frames) to encode, and the system must be set up very carefully so that metadata is essentially ignored.
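The frame figure is easy to check. The short Python sketch below converts the 180-millisecond encode latency quoted above into NTSC frames at 30000/1001 frames per second. The function names are invented for this example, and rounding up to whole frames is an assumption about how a compensating video delay would be set, not something stated in the encoder documentation.

```python
import math
from fractions import Fraction

NTSC_FRAME_RATE = Fraction(30000, 1001)  # about 29.97 frames per second


def latency_in_frames(latency_ms):
    """Convert a latency in milliseconds to (fractional) NTSC video frames."""
    return float(Fraction(latency_ms) / 1000 * NTSC_FRAME_RATE)


def whole_frames_to_delay(latency_ms):
    """Whole frames of video delay needed to cover the audio latency."""
    return math.ceil(Fraction(latency_ms) / 1000 * NTSC_FRAME_RATE)


exact = latency_in_frames(180)        # a little under 5.4 frames
padded = whole_frames_to_delay(180)   # 6 whole frames
```

The exact figure is just under five and a half frames, so a video delay set in whole frames would need six to cover it.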

At least one terrestrial network has been successfully using this system since 1998. They feed the three AES pairs to a DP569 Dolby Digital (AC-3) encoder at the network, and each affiliate has a DP562 or DP564 Dolby Digital (AC-3) decoder that gives them the three AES pairs back. The system is set up so that none of the metadata is used, at least in the distribution path, but rather is manually set at each station's final emission Dolby Digital (AC-3) encoder. Because of this, the station can pass the decoded audio through a normal master control switcher and perform all the normal operations they might have performed in the two-channel world. The downside of this approach is that they must make manual metadata changes with their Dolby Digital (AC-3) encoder. This can be accomplished if they have GPI/O available from the master control switcher and use either a DP570 or wire directly to the GPI/O inputs of the DP569. They could even use a simple toggle switch, like at least one affiliate I know of. It's a cheap solution, and it works as long as the master control operator remembers or has time to throw the switch. I recall watching a few programs that were supposed to be in 5.1, but the switch had not been thrown in time. All I was hearing was the left and right channels with dialog missing and I, like the program, was speechless when I tried to explain to my family what was wrong with the sound.

Unlike the "high-rate" case described above, yet another network has chosen to distribute Dolby Digital (AC-3) at its normal emission rate of 384 or 448 kbps. It is also sending the video compressed to the network's emission rate, and the two signals are multiplexed into a standard ATSC transport stream ready for transmission. This allows affiliates that may not have the budget for an ATSC video and audio encoder to simply transmit the transport stream. The upside of this system is that they don't have to decode/re-encode the Dolby Digital signal as it is already at the emission rate (unless, that is, they would like to add any local audio or voice-overs). As the stations that can afford only to pass the transport stream would not have the ability to insert local programming anyway, it works well for them. The affiliates that can process this transport stream and do local encoding must decode/re-encode the video and audio, at least for now. I think it is a fair trade to decode/re-encode the signal during these local operations, then switch back to the original Dolby Digital (AC-3) encoded signal for the rest of the program. The issue then becomes switching between the compressed Dolby Digital (AC-3) stream being delivered from the network and the one being created locally. After doing some experiments back when I was at Dolby, I know that this can be made to work. The benefit of getting more stations on air far outweighs the added complexity of this system.

Dolby E has become very popular since its launch in 1999. Many programmers use it to deliver multichannel audio to the final Dolby Digital (AC-3) encoder used in satellite and cable, and it is spreading to broadcasters as well. Dolby E was designed to carry up to eight channels of audio plus audio metadata, to handle multiple concatenations, have low latency, be video frame-bound for editability, and fit into a 20-bit 48 kHz AES pair. That AES pair has some restrictions, as it is carrying data, not audio data. Operations such as sample rate conversion, level-shifting and channel-swapping may be fine for audio data but are disastrous for pure data. The path must be free of all processing that affects the audio payload section of the AES signal. This is becoming easier to do as more manufacturers become aware of the additional demands being placed on AES channels. Moreover, a new breed of test tools is becoming available to help verify that these channels will pass compressed audio data.
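A small Python sketch shows why "harmless" audio processing is fatal to a data payload. It treats the AES pair as a stream of signed 20-bit words, applies a 6 dB level drop followed by a 6 dB boost, and compares the result with the original. The payload words and function names are hypothetical and have nothing to do with the real Dolby E frame format; the bit-rate function gives only the raw capacity of a 20-bit 48 kHz pair, not the rate Dolby E actually uses.

```python
def raw_payload_bit_rate(bits_per_sample=20, sample_rate_hz=48000, channels=2):
    """Raw capacity of the audio payload area of an AES pair, in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


def apply_gain(words, gain):
    """Scale signed 20-bit words by a gain factor, round, and clip, as a level shift would."""
    lo, hi = -(1 << 19), (1 << 19) - 1
    return [max(lo, min(hi, round(w * gain))) for w in words]


# Hypothetical data words; in a real stream these would be coded audio, not samples.
payload = [0x12345, -0x0F0F1, 0x00001, 0x7FFFF]

# Drop the level by half, then restore it, as two well-meaning devices in a chain might do.
restored = apply_gain(apply_gain(payload, 0.5), 2.0)

capacity = raw_payload_bit_rate()   # 1,920,000 bits per second
intact = restored == payload        # False: rounding has changed the low-order bits
```

For audio, losing the least significant bit of a sample is inaudible. For data, every changed bit is an error, so the decoder at the far end no longer receives what the encoder sent.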

At least two major terrestrial networks are or will be using Dolby E, but both are doing it in very different ways. One is using the system to carry audio and audio metadata, and the other is using it just to carry audio as they have a separate path for many different types of metadata, including audio metadata.

Next time we will get deeper into Dolby E and its practical application by networks, as well as mapping audio and data into MPEG transport streams and why it is so important to do it in a standard manner (i.e., SMPTE 337M-339M).