ATSC Sounds Out 3.0 Audio Proposals

LOS ANGELES—Broadcasting technology is due for a major upgrade once the Advanced Television Systems Committee finalizes the specifications for the proposed ATSC 3.0 standard, designed to replace the 20-year-old ATSC 1.0 specification (augmented in 2013 with ATSC 2.0). At the association’s meeting last month, a record number of attendees indicated how crucial the new standard will be for the future of TV over the air and on cable, and how it could influence IT delivery.

MPEG-H Audio lets viewers turn particular audio elements in a program, such as dialogue or sound effects, up or down as they prefer.

“Our schedule is to get to a ‘candidate standard’ level for all the components of ATSC 3.0 by the end of the year,” said ATSC President Mark Richer. “This will then be voted on before the end of next year by our technology group and then by the whole ATSC membership for the final decision.”


The industry is abuzz with speculation about the upcoming video standard in ATSC 3.0, possibly including High Dynamic Range, multiple frame rates, and wider color gamuts. But the audio component of this new standard has received far less publicity.

Yet the field of competitors for the next broadcast (and therefore production) audio standard, which once included Dolby Labs, DTS, Fraunhofer IIS, NBCUniversal, Samsung, Technicolor and Qualcomm, has been whittled down to two contenders: Dolby, with its AC-4 technology, and the MPEG-H Alliance, represented by Fraunhofer on behalf of its partners Qualcomm and Technicolor.

MPEG-H is an existing international standard that Fraunhofer is proposing as the audio part of ATSC 3.0, according to Robert Bleidt, general manager of Fraunhofer USA Digital Media Technologies in San Jose, Calif., a subsidiary of Fraunhofer-Gesellschaft, the renowned German R&D organization. “We are trying to use the MPEG-H approach to bring three primary features to TV viewers: interactivity, greater immersion, and universal delivery.”

As Bleidt explains it, the “interactivity” capability will let viewers adjust the presence of various audio objects in the broadcast signal. This could include bringing an announcer’s voice out of the background for greater clarity, choosing between the home team’s and the visiting team’s native broadcast mix depending on fan preference, and even hearing programs in different languages.
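The viewer-side adjustment Bleidt describes can be pictured as a receiver summing separate audio objects, each with its own gain. The following is a minimal Python sketch under that assumption. The function names, object labels, and sample values are hypothetical illustrations and are not taken from the MPEG-H specification.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def mix_objects(objects, gains_db):
    """Sum audio objects sample by sample, applying one gain per object.

    objects:  dict of object name -> list of samples (all the same length)
    gains_db: dict of object name -> viewer-chosen gain in dB (default 0 dB)
    """
    length = len(next(iter(objects.values())))
    mix = [0.0] * length
    for name, samples in objects.items():
        factor = db_to_linear(gains_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            mix[i] += factor * sample
    return mix


# Hypothetical program with two objects (illustrative sample values only)
program = {
    "announcer": [0.10, 0.20, 0.10],
    "crowd": [0.30, 0.30, 0.30],
}

# The viewer raises the announcer by 6 dB and leaves the crowd untouched
boosted = mix_objects(program, {"announcer": 6.0})
```

Because each element stays separate until the final sum, the same broadcast could in principle serve a viewer who wants more dialogue and one who wants more crowd noise.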

The “immersive” factor is a quantum leap over Surround Sound because it adds 3D audio components, such as many more channels for ceiling or floor speakers. Fraunhofer’s psychoacoustic tests have shown that this improvement is almost as significant as the move from stereo to surround sound.

To get these sounds to the panoply of big- and small-screen devices that will receive them, MPEG-H can adapt its audio reproduction to best match the capabilities of the system the viewer is listening to.

“One of the primary advantages of the MPEG-H system is that it is designed to work with the existing equipment of today’s broadcasters,” Bleidt said. “We’ve established a four-stage process to gradually put MPEG-H into today’s broadcast workflow. In fact, we showed all four stages working live on-the-air at the NAB Show.”

Fraunhofer accomplished this with the help of MPEG-H audio monitoring units from Jünger Audio, after installing new encoding and decoding equipment to handle the MPEG-H codecs. The demonstration showed that MPEG-H requires no overhaul of existing control rooms or outside broadcast trucks, and that the audio can be easily edited on NLE timelines such as the one in Adobe’s Premiere Pro CC.

The four stages to attain full MPEG-H capabilities start with transmitting Surround Sound using current metadata, but at a 50 percent lower bit rate than today’s AC-3 model. In the second stage, the broadcaster could add interactive objects to let viewers adjust the dialogue or sound effects levels in the audio mix. Stage three adds 3D sound with ceiling channels, and the final stage offers the option to introduce “higher order ambisonics” (HOA), which represents the sound environment as a pressure field, with the ambisonic order determining how accurately that field is represented. A spatial encoding is performed on the HOA coefficient signals to fit them into a studio’s existing SDI infrastructure.

To enable flexible implementation, MPEG-H can use any combination of three techniques to get the audio through a broadcast system: traditional channels (5.1, 7.1 or more), audio objects, or HOA.

“Even though this will require viewers to purchase all new equipment to get these advantages,” Bleidt said, “the MPEG-H Alliance feels if MPEG-H is adopted for the audio component of the ATSC 3.0 standard, it will be a great opportunity to remove the shortcomings of the prior system and give viewers the advantages of these new audio technology features.”


Dolby Labs’ candidate for ATSC 3.0 audio revolves around the company’s AC-4 technology.

“For us it is all about improving today while enabling tomorrow,” said Jeff Riedmiller, senior director of the sound group, office of the CTO, at Dolby Laboratories in San Francisco. “Three years ago when we started developing AC-4, we really went back to the drawing board.”

As Riedmiller describes it, the vision of AC-4 is based on four pillars. The first is improved reach, meaning efficient delivery to all types of devices.

While the Surround Sound technique found in its predecessor, AC-3, provided a multichannel living room experience, AC-4 offers 50 percent more efficiency, enabling delivery of 2D and 3D sound environments to a larger number of homes and mobile receivers. AC-3 delivered stereo sound at 192 kbps and 5.1 surround at 384 kbps, but for today’s IT delivery the market is demanding stereo at 48-64 kbps, 5.1 at 100-200 kbps and immersive multichannel audio such as Dolby Atmos at 200-300 kbps, which is what AC-4 enables.
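To put those figures side by side, the short Python sketch below computes how far the quoted market targets sit below the quoted AC-3 rates. It uses only the numbers cited above. The function and variable names are illustrative, and the calculation is plain arithmetic, not a description of how either codec works.

```python
def reduction_percent(old_kbps, new_kbps):
    """Percentage by which new_kbps is lower than old_kbps."""
    return 100 * (old_kbps - new_kbps) / old_kbps


# Bit rates quoted in the article, in kbps
ac3_rates = {"stereo": 192, "5.1": 384}
market_targets = {"stereo": (48, 64), "5.1": (100, 200)}

for layout, (low, high) in market_targets.items():
    largest = reduction_percent(ac3_rates[layout], low)
    smallest = reduction_percent(ac3_rates[layout], high)
    print(f"{layout}: {smallest:.0f} to {largest:.0f} percent below the AC-3 rate")
```

On these figures, the stereo targets are 67 to 75 percent below the AC-3 rate, and the 5.1 targets are roughly 48 to 74 percent below it.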

The second pillar is simplified daily operation, with features such as automatic volume leveling and seamless audio for ad insertions. To comply with the CALM (Commercial Advertisement Loudness Mitigation) Act, AC-4 has built-in intelligent loudness management and multiband dynamic range control that work without manual intervention.

“These compression curves in combination with the loudness information generated by the encoder produce in real time the dynamic range coefficients desired by the original artist,” Riedmiller said. “The decoder then has the ability to scale across the needs of various playback options ranging from mobile phones to home theater systems.” 
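The simplest form of volume leveling is a single gain offset that moves a measured program loudness to a target. The Python sketch below illustrates only that idea and is not a description of AC-4's actual processing. The default target of -24 LKFS is an assumption drawn from common U.S. practice under ATSC A/85, and the function names and loudness values are hypothetical.

```python
def normalization_gain_db(measured_lkfs, target_lkfs=-24.0):
    """Gain in dB that moves a measured loudness to the target loudness.

    The -24 LKFS default is an assumption, not a value from the article.
    """
    return target_lkfs - measured_lkfs


def apply_gain(samples, gain_db):
    """Scale audio samples by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [factor * sample for sample in samples]


# Hypothetical measurements: a loud commercial and a quiet program segment
commercial_gain = normalization_gain_db(-18.0)  # negative: attenuate the ad
program_gain = normalization_gain_db(-27.0)     # positive: boost the program

leveled = apply_gain([0.5, -0.5, 0.25], commercial_gain)
```

A real system would also shape dynamic range for each playback device, as Riedmiller describes. This sketch covers only the static offset.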

Pillar number three of AC-4 lets end users choose which components of the audio tracks they want in order to enhance their listening experience, including descriptive video, multiple languages or home-field broadcasts, and lets them enhance and control the presence of the center announcer or dialogue track.

The fourth pillar delivers immersive 3D object-oriented sound, including the height dimension and potentially speakers providing sound at floor level.

“Just adding the height element brings us a huge leap toward engaging the senses in a more natural way,” Riedmiller said. “The next system for ATSC sound needs to shoot for ‘audio nirvana,’ sort of a ‘Stairway to Heaven’ beginning with better codec efficiency, improved loudness management, enhanced audio perspective and language substitution, and then the immersive benefits of audio objects throughout the sound field.”

Riedmiller acknowledged that any system based on ATSC 3.0 will not be backward compatible with current ATSC equipment, but added that “we needed to start with a clean slate when designing our audio proposal.

“TV and set-top box manufacturers should be able to develop additional support capabilities to provide the audience with a satisfactory viewing experience on legacy equipment,” he said.

The proposed technologies from both contenders, MPEG-H from Fraunhofer and the MPEG-H Alliance and AC-4 from Dolby Labs, are based on years of audio technology experience. Whichever wins the competition for the audio component of the ATSC 3.0 standard will provide a quantum leap in the audience’s enjoyment of future digital audio/video communications.