Audio: Making 2.0 from 5.1
Metadata and quality control have become more important than ever with the switch-over to the ATSC standard. With many viewers still watching in SD on legacy analog equipment with stereo, or even mono, sound, producers and post producers must closely monitor the audio downmix that will be generated in the set-top box, television set or home receiver, and control it by applying appropriate metadata.
As Jim Starzynski, principal engineer and audio architect, NBC Universal (NBCU) Advanced Engineering, noted, "Any time a television station transmits in 5.1, the two-channel version that is heard by the consumer is done within the set-top box, the integrated television/receiver, or in the home theater receiver. That's automatically the way that it works. That downmix that creates the 2.0 version from the 5.1 is under control of the metadata that is transmitted by the television station. This process is an inherent part of the ATSC system."
This process applies specifically to HD transmission services, of course. NBCU's cable channels, which include MSNBC, CNBC, USA, Syfy, Bravo and others, still maintain stereo standard-definition channels, according to Starzynski.
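The metadata-controlled downmix Starzynski describes can be sketched in a few lines of Python. This is an illustration only, not a decoder implementation: it assumes the center and surround mix levels that AC-3 metadata can carry (-3, -4.5 or -6 dB for the center; -3 dB, -6 dB or off for the surrounds), it leaves the LFE channel out of the two-channel result, and it skips the output scaling a real decoder applies to avoid overload. The function names and the dictionary layout of a sample frame are hypothetical.

```python
# Mix-level choices that AC-3 metadata can signal, expressed in dB.
CENTER_MIX_DB = (-3.0, -4.5, -6.0)
SURROUND_MIX_DB = (-3.0, -6.0, None)  # None means the surrounds are dropped


def db_to_gain(db):
    """Convert a level in dB to a linear gain; None maps to silence."""
    return 0.0 if db is None else 10.0 ** (db / 20.0)


def downmix_stereo(frame, center_db=-3.0, surround_db=-3.0):
    """Fold one 5.1 sample frame down to a (left, right) pair.

    frame is a dict with keys L, R, C, LFE, Ls, Rs (a hypothetical layout).
    The LFE value is ignored, so it never reaches the stereo output.
    """
    c = db_to_gain(center_db)
    s = db_to_gain(surround_db)
    left = frame["L"] + c * frame["C"] + s * frame["Ls"]
    right = frame["R"] + c * frame["C"] + s * frame["Rs"]
    return left, right


if __name__ == "__main__":
    frame = {"L": 0.2, "R": 0.1, "C": 0.5, "LFE": 0.3, "Ls": 0.1, "Rs": 0.0}
    # The same 5.1 frame yields a different stereo balance for each setting.
    for center_db in CENTER_MIX_DB:
        print(center_db, downmix_stereo(frame, center_db=center_db))
```

Changing only the two metadata values changes how loud dialogue in the center channel and effects in the surrounds end up in the stereo the viewer hears, which is why the content producer's choice of those values matters.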
"What is built into the ATSC spec is the idea that the stereo will be derived as a Pro Logic downmix of the 5.1. Knowing this means that you need to be producing in discrete 5.1," said Roger Charlesworth, executive director of the DTV Audio Group and president of Charlesworth Media.
Charlesworth advocates for a "single-threaded delivery," the idea that the broadcast plant should generate and transmit only a high-definition picture and 5.1 audio. With metadata applied correctly, the picture will automatically reformat and the sound will automatically downmix in the home, according to the parameters set by the content producer.
"Everyone understands that we're sending an HD picture and we're making the SD picture where needed," he observed. "The same thing is true of the audio."
Adopting streamlined production and post production processes that create a single audio delivery format offers significant advantages. "It's the only way you can really QC what the stereo is going to be," stressed Charlesworth, who added, "it has implications for people who are mixing TV shows; they really need to be listening to that downmix."
But it does require a change of mindset and a break with traditional workflows, in which 5.1 and stereo audio streams are created separately. "If we can get out of this schizophrenic thing of having two standards that are side by side (but they really aren't), it will make life easier for people," said Charlesworth.
Choosing a single-threaded workflow ensures predictability and also allows accurate confidence monitoring, he continued. "If we're going to start with the stereo mix and then we're going to upmix and then we're downmixing it downstream we really don't know what we're getting. That's why, in terms of stability, people who are thinking strategically are moving toward discrete 5.1. It makes a lot of sense in terms of having a predictable product. Then, anywhere in the production process, you can make a downmix of it and say, 'That sounds right.'"
ONE SOUNDTRACK AT A TIME
In the decade between the introduction of the ATSC standard and the mandated transition to exclusively digital operation, content creators typically generated both a 5.1-channel mix and a stereo mix. These would usually be delivered on the eight tracks of a Dolby E stream or on discrete videotape channels in order to service both HD and SD transmissions.
But, with the exclusive adoption of the DTV standard and 5.1 delivery, content creators must get out of the habit of thinking that the stereo deliverable will go to air, said Starzynski. NBCU's deliverables spec calls for a Sony HDCAM SR tape with both a 5.1 and a stereo mix, but the 2.0 mix is strictly for internal monitoring purposes.
"If they supply us with a 5.1 channel soundtrack, the two-channel version that they've mixed is only there for archive and protection purposes, just in case we need to screen that tape in an environment that doesn't have 5.1. The two-channel version that the audience hears is the on-the-fly downmix that occurs in the set top box," Starzynski explained. "This is the way the ATSC system works—we supply one soundtrack at a time."
Post production houses should therefore get out of the practice of needlessly generating two mixes, agreed Charlesworth, recalling a recent conversation with the owner of a New York facility. "They're doing commercials and currently do a 5.1 and a stereo. Because they're used to doing it they slave over the stereo and the client approves it and they send it to the network. There the stereo gets thrown away. They've maybe dashed off the 5.1 or made it a little bit wilder than they would. They need to transition their workflow and mix through the 5.1 to the stereo."
In a discrete 5.1 workflow, Starzynski continued, "What becomes extremely important is that the content provider has to audition both the 5.1 that they're doing as the primary deliverable, and a metadata-controlled 2.0 downmix of the 5.1, so that they understand what both are going to sound like.
"We post those metadata figures in our program spec. We make this process very clear in the main audio section as well—that content suppliers really need to audition the 5.1 and the 2.0 version that is created from the 5.1 that's under metadata control to make certain both meet their expectations."
Charlesworth noted that Pro Logic chips are so ubiquitous, even in low-cost equipment, that there's really no reason not to let home systems handle the metadata-controlled downmix. "There are millions of AC3 decoders that are doing this in set-top boxes or DVD players," he said. "This is part of the plumbing of the whole planet; you can buy a DVD player at the grocery store and it'll have an AC3 decoder in it capable of making a Pro Logic downmix. And the Pro Logic downmix happens in a predictable way."
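The predictability Charlesworth mentions comes from the fixed matrix behind a Pro Logic-compatible (Lt/Rt) downmix. The Python sketch below is a simplified illustration, not the decoder's actual code: it assumes -3 dB coefficients for the center and surrounds, sums the two surrounds into one signal that is added to the right total and subtracted from the left total, and omits both the LFE channel and the phase shifting a full matrix encoder applies to the surrounds. The function name and frame layout are hypothetical.

```python
import math

G = math.sqrt(0.5)  # about 0.707, i.e. -3 dB


def downmix_lt_rt(frame):
    """Fold one 5.1 sample frame into a matrix-compatible (Lt, Rt) pair.

    frame is a dict with keys L, R, C, LFE, Ls, Rs (a hypothetical layout).
    Center lands equally in both outputs; the summed surrounds land with
    opposite polarity, which is what lets a matrix decoder pull them apart.
    """
    center = G * frame["C"]
    surround = G * (frame["Ls"] + frame["Rs"])
    lt = frame["L"] + center - surround
    rt = frame["R"] + center + surround
    return lt, rt


if __name__ == "__main__":
    silent = {"L": 0.0, "R": 0.0, "C": 0.0, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
    # Center-only material is identical in Lt and Rt.
    print(downmix_lt_rt({**silent, "C": 1.0}))
    # Surround-only material is equal in level but opposite in polarity.
    print(downmix_lt_rt({**silent, "Ls": 1.0}))
```

Because the coefficients are fixed, the same 5.1 mix produces the same two-channel result on any compliant decoder, so a mixer can run an equivalent fold-down in the studio and hear what the home listener will get.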