Using Metadata to Mix Multichannel Audio

Last time I left you with a question: What is the problem with sending six channels of audio via three two-channel MPEG streams? Besides having no inherent provision for audio metadata, the problem is that the three audio pairs are asynchronous. This is due to a combination of how the streams are multiplexed and demultiplexed and the fact that each pair is data rate-reduced separately. This is no problem for six mono channels or three stereo pairs, but when you attempt to use this type of system to carry a 5.1 channel program, a downmixing issue arises. When the program is downmixed by a two-channel decoder (such as one of the millions of set-top boxes), the results can sound unpleasant. The bottom line is that while it might suffice for the short term, it is not a great idea to rely on such a system for the future. This time we will propose a solution for the nagging master control issues we uncovered as well as give a "heads up" for things to look for at NAB.
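To see why timing skew between pairs can hurt a downmix, here is a rough Python sketch. It is an illustration under assumptions, not a model of any particular decoder: it assumes a tone that appears in both the Left channel (carried in one pair) and the Center channel (carried in another), a -3 dB Center downmix coefficient, and a hypothetical 1 ms skew between the pairs. The function name is made up for this example.

```python
import math

def downmix_level_db(freq_hz, skew_s, center_gain=0.7071):
    """Level (dB) of a tone shared equally by L and C after Lo = L + g*C,
    when C arrives skew_s seconds later than L.

    Assumptions: center_gain of 0.7071 (-3 dB) and the skew value are
    illustrative choices, not taken from any specific product.
    """
    phase = 2 * math.pi * freq_hz * skew_s
    # Magnitude of the sum 1 + g * exp(-j * phase)
    magnitude = math.sqrt(1 + center_gain ** 2 + 2 * center_gain * math.cos(phase))
    return 20 * math.log10(magnitude)

aligned = downmix_level_db(500, 0.0)    # pairs in sync: the tone adds up
skewed = downmix_level_db(500, 0.001)   # 1 ms skew: the tone partly cancels
print(f"aligned: {aligned:+.1f} dB, skewed: {skewed:+.1f} dB")
```

With these assumed numbers the same tone drops by roughly 15 dB in the downmix once the pairs drift apart, and the affected frequencies move as the skew changes. Discrete 5.1 playback hides the problem because the channels are never summed.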


So, let's get directly to the heart of the matter for this month: How do you perform a standard crossfade between a two-channel and a multichannel program? Good question. Unfortunately, the simple answer is that there is no simple answer.

The wrench in the gears is none other than audio metadata. To operate as designed, the metadata must change at the boundary of the two programs. When the video and audio switch, so must the metadata. Unfortunately, if the video and audio dissolve from one source to the next, the metadata would have to do the same, and it cannot. Remember that while the 5.1 channel program will be reproduced as such by a 5.1 channel decoder, a two-channel program will first be Pro Logic-decoded, then output as a 4.0 or 4.1 (via bass management) channel program. This switch occurs when the audio coding mode (acmod) element of metadata changes from 3/2L to 2/0. Furthermore, what happens if the two programs have radically different dialnorm values?

Although it is simple enough to crossfade between 5.1 channel and two-channel sources, when do you switch the metadata? Let's suppose we are crossfading from a 5.1 channel program to a two-channel program. If we switch metadata at the beginning, the acmod will change from 3/2L to 2/0 while there is still 5.1 channel audio present. This means that the Center, Left Surround, and Right Surround channels disappear for the duration of the crossfade, which is not good, as the dialogue will be missing.

If we instead switch metadata at the end of the crossfade, everything will be fine until the acmod switches from 3/2L to 2/0. The multichannel consumer decoder will suddenly invoke Pro Logic decoding at the end of the transition and dialogue will snap from the Left and Right speakers to the Center speaker. Switching at the center of the transition is the worst of both worlds and makes a real mess. As if this weren't difficult enough, we cannot forget the dialnorm issue. If the values of the two programs are different, there will be a level shift if the value changes before the audio does. This means that in the above scenarios the dialnorm value of one of the programs will be applied to both for the duration of the transition. If the values are very different, there can be disturbing consequences.
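To put a number on the dialnorm problem, here is a small Python sketch. It assumes the usual decoder behavior of attenuating by the difference between the dialnorm value and -31, so a program flagged at -31 passes unchanged and one flagged at -21 is turned down 10 dB. The two program values and the function names are made up for illustration.

```python
def decoder_attenuation_db(dialnorm_db):
    """Attenuation (dB) a decoder applies for a given dialnorm value.

    Assumes dialnorm is expressed as -1 to -31 and that -31 means
    "no attenuation".
    """
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    return 31 + dialnorm_db

def level_error_db(own_dialnorm_db, applied_dialnorm_db):
    """Playback level error (dB) when a program is decoded with another
    program's dialnorm value. Positive means too loud."""
    return (decoder_attenuation_db(own_dialnorm_db)
            - decoder_attenuation_db(applied_dialnorm_db))

# Hypothetical programs: a 5.1 program at -31 and a two-channel program at -21.
print(level_error_db(-21, -31))  # two-channel audio under the 5.1 program's value
print(level_error_db(-31, -21))  # 5.1 audio under the two-channel program's value
```

In this example, whichever program has to live with the other's dialnorm value during the transition plays 10 dB too loud or 10 dB too quiet until the metadata catches up.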

So what are the options? The easiest is to not allow crossfades. If this is not an option, it is possible to make the system work by tricking the metadata. It is rather complex, but simply put, you work the system backwards: Fix the metadata values and make the audio fit. As audio metadata is data that describes the audio, it can work in either direction. You can describe the audio that exists by changing metadata values, or you can change the audio to match preset metadata values. By fixing dialnorm at -31 and making sure that all sources match this level, you can now crossfade between sources with no level shift. By fixing acmod at 3/2L and making sure all sources fit this channel configuration, you can now crossfade between sources with no loss of channels. Sounds simple, but it requires some tools and products that do not yet exist commercially.

As master control is one of the final remaining hurdles for passing multichannel audio, I will be approaching the likely suspects at NAB and asking them how they plan to solve the problems outlined above. Who knows, maybe my solution will fit the bill. I'll give an update after the show.
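The "fix the metadata, make the audio fit" idea described above can be sketched in a few lines of Python. This is only a conceptual sketch, not a description of any existing product: the function names are hypothetical, the dialogue level of each source is assumed to have been measured already, and a two-channel source is simply placed in Left and Right with the other four channels silent (an upmixer could be used instead).

```python
TARGET_DIALNORM_DB = -31  # fixed dialnorm value every source is matched to
CHANNELS = ("L", "R", "C", "LFE", "Ls", "Rs")  # fixed 3/2L layout

def gain_to_target(measured_dialogue_db, target_db=TARGET_DIALNORM_DB):
    """Linear gain that moves a source's measured dialogue level to the target."""
    return 10 ** ((target_db - measured_dialogue_db) / 20)

def to_six_channels(frame):
    """Fit one frame (channel name -> sample) into the fixed layout.
    Channels the source does not have are filled with silence."""
    return {ch: frame.get(ch, 0.0) for ch in CHANNELS}

def crossfade(frame_a, frame_b, position):
    """Linear crossfade of two frames; position 0.0 is all A, 1.0 is all B."""
    a, b = to_six_channels(frame_a), to_six_channels(frame_b)
    return {ch: (1 - position) * a[ch] + position * b[ch] for ch in CHANNELS}

# A hypothetical two-channel source measured at -24 needs about 7 dB of cut.
print(round(gain_to_target(-24), 4))

# Halfway through a crossfade from a 5.1 frame to a two-channel frame.
surround = {ch: 1.0 for ch in CHANNELS}
stereo = {"L": 0.5, "R": 0.5}
print(crossfade(surround, stereo, 0.5))
```

Because every source is already at the fixed level and in the fixed six-channel layout, the metadata never has to change during the transition and the Center and Surround channels simply fade out with the rest of the 5.1 program.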


As you are walking around NAB this year, I have some suggested things to look for and some specific questions that you should ask potential vendors while searching for new equipment.

First, remember how many channels of audio you may be dealing with soon, if not already: six for 5.1 programs plus an additional two for SAP and/or Descriptive Video programs, bringing the total to eight channels. Make sure that all the equipment you are considering for your DTV operation has provisions for eight channels of audio, where appropriate, of course. I doubt that eight channels will be required for general field work any time soon, but the gear back at the station or network should be able to deal with it. Be wary of audio mixers with only four busses, even in edit bays. Consoles with six or eight busses are common and inexpensive, and I recommend investing now.

Metadata is another area to be concerned about. Audio metadata is just one type; remember that there are many others, including closed captioning and video metadata. How will you route it, store it, display it and test it?

I will also be looking to see how far certain products have advanced. I am particularly interested in master control for two reasons. The first is how master control switchers deal with A/V sync. I have seen and heard of too many cases where the sync shifts during effects insertion to believe that it is a non-issue. The second is how they deal with multichannel audio and metadata, as I described above. If the manufacturers have not caught on yet, I will suggest that they investigate the stunning amount of multichannel material currently being broadcast to consumers via cable and satellite. Now is the time to either jump on the train or get prepared for a long stay at the station.

I will be visiting the manufacturers of distribution equipment such as Tandberg, Harris/Lucent, Tiernan and others. I will be digging to find out how well their equipment supports multichannel audio and if the equipment obeys SMPTE standards such as 337M, 338M and 339M to allow for interoperability.

I am also interested to see new machines for the now-standard Sony HDCAM and Panasonic HD-D5 formats, both of which can handle multichannel audio and metadata. I will, of course, keep an eye out for any new formats and make sure they have the same audio and metadata capabilities. The days of four-channel VTRs with no capacity for expansion (via a system like Dolby E) have come to an end.

Of course I will also be stopping by the Dolby Laboratories booth to see old friends and new product features. I am especially interested in the LM100 Broadcast Loudness Meter, as I understand it now has some advanced detection and logging capabilities.

With audio on the brain, I will also see the Wohler people. Will Wohler, Carl Dempsey, and the rest of their crew are consistently developing innovative and useful monitoring products that have become the mainstay of multichannel broadcasters.

Of course I will also stop by Videotek, Leader, and DK Audio as they always have interesting products to see.

I will also be on the lookout for new technologies. To that end, I will be stopping by the Microsoft booth. I have recently had the pleasure of working with its Windows Media 9 Series codec and I was impressed with the HD video and multichannel audio capabilities and performance. Most interesting is the fact that the HD video and multichannel audio decoder and encoder can be downloaded for free from the Microsoft Web site (yes, free).

Have a productive, fun and safe NAB!