AES considers immersive audio to complement 4KTV

While video engineers and videophiles everywhere debate the virtues of 4KTV, with four times the image resolution of current high-definition television, audio engineers at the recent annual gathering of the Audio Engineering Society (AES) in New York were espousing as many as four times the number of audio channels of state-of-the-art 5.1 surround sound. At that scale, scientists, sound designers and broadcast mixers prefer to describe the various audio elements of a scene as “objects,” not channels, that can be individually manipulated in new ways for the benefit of the entertainment experience, from the cinema to television and even to individual media players and multi-dimensional headphones.
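The “four times” comparisons above can be checked with a little arithmetic. The sketch below assumes the usual reading of layout names (the number before the dot counts full-range channels, the number after it counts low-frequency channels) and the common consumer pixel dimensions for HD and 4K, which the article itself does not state. The function name is illustrative only.

```python
def channel_count(layout: str) -> int:
    """Total discrete channels in a layout name such as '5.1' or '22.2'."""
    main, lfe = layout.split(".")
    return int(main) + int(lfe)

# Audio: 22.2 versus today's 5.1 surround.
surround = channel_count("5.1")      # 5 main + 1 low-frequency = 6
immersive = channel_count("22.2")    # 22 main + 2 low-frequency = 24
channel_ratio = immersive / surround

# Video: assumed consumer dimensions, not taken from the article.
hd_pixels = 1920 * 1080
uhd_pixels = 3840 * 2160
pixel_ratio = uhd_pixels / hd_pixels

print(channel_ratio, pixel_ratio)
```

Under these assumptions both ratios come out to exactly four, which is where the symmetry between 4KTV pictures and 22.2 sound comes from.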

At a panel discussion entitled “Audio for 4KTV,” a group of experts representing audio equipment manufacturers and one broadcast network (NBC) detailed how the advent of more audio elements than current 5.1 and 7.1 surround sound systems provide would be good for theatergoers as well as home viewers, who probably won't have a 22.2 speaker system installed in their living rooms.

Panelists Robert Bleidt of Fraunhofer; Tim Carroll of Linear Acoustic; Robert Orban of Orban; and Jim Starzynski, director and principal audio engineer at NBCUniversal Advanced Engineering, echoed the sentiments of audio scientists Robert Reams (Psyx Research) and Poppy Crum (Dolby Labs): the sophisticated codecs and “smart TV decoders” needed to decode the coming ATSC 3.0 broadcast signals according to the equipment on hand are still in the drawing stages, but they would make the listening experience much more enjoyable, even for those with “immersive headphones.” In fact, such audio techniques will be part of the ATSC 3.0 standard that could be completed by 2016.

“At this point, the notion of more than 7.1 speakers in someone’s home theater is a bit optimistic, but the goal is to bring this immersive experience that 4KTV provides to the listening environment in a way that is attainable by lots of people,” Reams said. “We know the high-end audiophile will embrace this quality sound, but in order for it to be successful commercially, it has to trickle down and be affordable and practical to the average consumer. It will certainly take time, but this industry is capable of developing technology to make immersive sound attainable for all.”

Three systems

Three companies are vying to establish a standard that complements the improved pictures of 4KTV sets: Dolby, with its Atmos theater system (up to 30 channels and part of the new AC-4 standard); DTS, with its Ultra High-Definition (UHD) audio system (aka “Neo:X,” consisting of 11.1 channels); and Fraunhofer, with its MPEG-H (up to 22.2 channels). All three offer “multi-dimensional audio” codecs with speakers enveloping the viewer (placed on the sides, in front, in back and on top). At the AES convention, DTS also showed its DTS Headphone:X, designed to bring multi-dimensional audio to the individual consumer.

Dolby’s Atmos allows a film's sound designers to map up to 128 independent elements to up to 64 locations in a theater — including above the audience. The goal is to someday bring these technologies, initially designed for the cinema, to the home for TV and listening via headphones.
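To illustrate what treating sound as “objects” rather than channels can mean in practice, here is a minimal sketch in which each object carries a position and a renderer works out per-speaker gains for whatever layout is installed. It uses simple distance-based panning with constant-power normalization, chosen only for illustration; it is not Dolby's renderer, and the class name, function name, speaker coordinates and rolloff value are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    position: tuple  # (x, y, z) in metres, listener at the origin

def render_gains(obj, speakers, rolloff=2.0):
    """Return one gain per speaker: closer speakers get more of the object."""
    weights = {}
    for label, pos in speakers.items():
        distance = math.dist(obj.position, pos)
        # Clamp the distance so an object sitting on a speaker does not divide by zero.
        weights[label] = 1.0 / (max(distance, 1e-3) ** rolloff)
    # Normalize so the squared gains sum to 1 (constant total power).
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {label: w / norm for label, w in weights.items()}

# Hypothetical layout: four speakers at ear height plus one overhead.
SPEAKERS = {
    "left": (-1.0, 1.0, 0.0),
    "right": (1.0, 1.0, 0.0),
    "rear_left": (-1.0, -1.0, 0.0),
    "rear_right": (1.0, -1.0, 0.0),
    "top": (0.0, 0.0, 1.0),
}

helicopter = AudioObject("helicopter", (0.0, 0.0, 0.9))
gains = render_gains(helicopter, SPEAKERS)
loudest = max(gains, key=gains.get)
print(loudest, gains)
```

Because the mix stores positions instead of fixed channel assignments, the same object could be rendered to a different speaker dictionary without remixing, which is the flexibility the panelists describe.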

Fraunhofer’s system consists of an Extended HE-AAC-based audio codec and a 3D rendering engine, which together support the efficient transmission of “3D audio signals” and flexible rendering for playback in a wide variety of listening scenarios. These include 3D home theater setups, 22.2 loudspeaker systems, automotive entertainment systems and playback over headphones connected to a tablet or smartphone.

Interestingly, Fraunhofer’s Bleidt said his company has proposed an “intermediate step,” whereby a consumer could add just four overhead speakers to an existing 5.1 surround sound system and, using its new MPEG-H codec, get a vastly improved listening experience.

“We’re talking about allowing the viewer not just to watch a program, but to actually take part in it and feel like they are in that environment,” he said. “This is exciting stuff.”

Said Dolby's Crum, “We’ve reached a point where designers and mixers are looking at content in terms of objects, not channels. It makes the perception of the screen wider and brings the viewer into a scene like never before. Of course, different people will have different perspectives on what systems of the future should be, but we all agree it will include more tools for the sound designer and a better experience for the viewer and listener.”

Added Dave Case, of DTS, another panel participant, “I think it’s early to start designing rooms until there is more content available, but the evolutionary path has been laid out, and we’re all pursuing a future that includes an audio system like we’ve never experienced, outside of the lab, before.”

Although the DTS Neo:X system is installed in a few dozen venues around the world, and Dolby’s Atmos technology is installed in 200 theaters (35 facilities use it, and 25 premiere movies, including “The Hobbit” and the more recent “Gravity,” have already been mixed with it), the very idea of getting this immersive with sound is still in its infancy. For home theater use it remains pie in the sky, and it is unclear whether broadcasters will ever embrace such a bandwidth-hungry and complex system. Consider that many local TV stations in the U.S. still distribute their content in analog stereo.

Yet there is significant interest, both within the ATSC and the AES and among equipment manufacturers, in working toward an expanded aural encounter to match UltraHD images.

DTV Audio Group

At a separate gathering of the DTV Audio Group, an organization working on next-generation audio and the ATSC 3.0 spec for broadcasting 4KTV, the three main audio proposals were described and discussed in detail. NBC’s Starzynski, who was recently named its new chairman, moderated an afternoon of panel discussions. (Roger Charlesworth, a veteran audio engineer, is Executive Director.)

Several DTV Audio Group members, including Linear Acoustic’s Carroll (whose company provides broadcast audio signal measurement and processing equipment to the industry), noted that new uses for metadata would have to be worked out and managed so that viewers could have unprecedented control over parts of an audio stream (such as individual announcers, foreign language broadcasts, and even which part of a live telecast a viewer might want to downplay) through special UI software embedded into the program.
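The kind of viewer control described above could work roughly as follows: each audio object travels with metadata, and the receiver combines that metadata with the viewer's choices before mixing. This is only a sketch under stated assumptions. The field names, object IDs and gain range are hypothetical and do not come from ATSC 3.0 or from any of the three proposals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectMeta:
    object_id: str
    label: str
    default_gain: float     # gain used when the viewer expresses no preference
    user_adjustable: bool   # whether the broadcaster lets viewers change it

def apply_preferences(metadata, preferences):
    """Merge broadcaster defaults with viewer choices, clamping gains to 0..1."""
    gains = {}
    for meta in metadata:
        wanted = preferences.get(meta.object_id)
        if wanted is not None and meta.user_adjustable:
            gains[meta.object_id] = max(0.0, min(1.0, wanted))
        else:
            gains[meta.object_id] = meta.default_gain
    return gains

def mix(samples, gains):
    """Mix one sample per object into a single output sample."""
    return sum(samples[obj_id] * gain for obj_id, gain in gains.items())

# Hypothetical live sports program with two commentary languages.
program = [
    ObjectMeta("crowd", "Stadium crowd", 1.0, True),
    ObjectMeta("announcer_en", "English commentary", 1.0, True),
    ObjectMeta("announcer_es", "Spanish commentary", 0.0, True),
    ObjectMeta("sponsor_tag", "Sponsor message", 1.0, False),
]

# The viewer switches to Spanish, turns the crowd down and tries to mute the sponsor.
viewer = {"announcer_en": 0.0, "announcer_es": 1.0, "crowd": 0.5, "sponsor_tag": 0.0}
gains = apply_preferences(program, viewer)

one_sample = {"crowd": 0.2, "announcer_en": 0.4, "announcer_es": 0.3, "sponsor_tag": 0.1}
output = mix(one_sample, gains)
print(gains, output)
```

Even in this toy form, every object needs its metadata to arrive intact and in sync with its audio, which hints at why panelists see metadata management as a challenge.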

The practicality of distributing such massive amounts of audio metadata and program elements for a single broadcast (whether live or pre-produced) is still being debated.

“Getting 22.2 or likely more channels, plus tightly synchronized metadata from production to transmission, is going to be a formidable challenge,” Carroll said, joking that mismanaging such a large amount of metadata would make problematic audio broadcasts “at least 22.2 times worse.”

Indeed, it might take a while for “immersive sound” to become reality in consumers’ homes, but apparently, for an audio engineer and sound designer, it doesn’t hurt to dream.

“We have an opportunity to extend an advanced audio experience to the palm of your hand,” said Tom Sahara, vice president of Operations and Technology at Turner Sports. “Why not do it?”

[Of note: Japanese broadcaster NHK is considering using 22.2 channels of audio for parts of its live broadcasts during the 2020 Summer Olympic Games from Tokyo, which could also include 8K resolution video images.]