NextGen Audio a Work in Progress

I began experimenting with, testing and writing about immersive audio in early 2012 and remember hearing the term “Next Generation Audio” (NGA) sometime thereafter. Since then, various groups have advocated for a range of features and formats for next-generation content producers and equipment manufacturers to build on.

Currently, NGA describes such features as immersion, interaction and an intelligent interface between playback devices and reproduction devices, but in the United States there was no government mandate to keep forward momentum on implementing those features. To add to the confusion, there were also competing and incompatible formats in Dolby Atmos and MPEG-H.

Neither Dolby nor Fraunhofer invented immersive sound or NGA, which is actually a direct result of technology advances and clever compression schemes that achieved greater capacity and quality for producing advanced audio. Immersive sound is possible because broadcasters can deliver a minimum of 10 discrete channels of sound (5.1.4) to the consumer within a narrowly defined data bandwidth. That audio bandwidth can be allocated to audio beds, channels or objects to enhance the immersive experience, or to interactive features.
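
The arithmetic behind a layout label such as 5.1.4 can be sketched in a few lines. This is a minimal illustration, assuming the common convention that the three numbers count ear-level speakers, low-frequency effects (LFE) channels and height speakers. The function name channel_count is hypothetical and is not part of any Dolby or MPEG-H tool.

```python
def channel_count(layout: str) -> int:
    """Return the number of discrete channels in a label like '5.1.4'.

    Assumes the convention ear-level.LFE.height; a label without a
    height field (as in '5.1') simply has no height speakers.
    """
    parts = [int(p) for p in layout.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected layout label: {layout!r}")
    return sum(parts)


if __name__ == "__main__":
    for label in ("5.1", "7.1", "5.1.4"):
        print(label, "->", channel_count(label), "channels")
```

Under that convention, 5.1.4 works out to 5 + 1 + 4 = 10 channels, which matches the minimum cited above.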

Is it Worth It?
Immersive sound may not be the ultimate entertainment experience for every consumer: research shows a steady rise in stereo and 5.1 surround soundbars but only a very modest increase in 7.1 and the “other” category, which would include some variation of immersive sound. This leads me to believe that perhaps immersive sound production, particularly for sports, is just not compelling enough for consumers to spend extra money on higher-quality speakers.

Immersive sound has been the most heavily promoted aspect of NGA, perhaps because it was the most developed feature with the ATSC 3.0 rollout. But the illusion of cinematic sound seems delusional given the proliferation of “faux immersive” DSP processing and soundbars that make all kinds of claims about immersive sound reproduction. I ask again: How can you get an immersive experience from an up- and side-firing soundbar? Maybe the consumer is savvier than first thought.

The broadcast world has understood the craving for user-controlled sound ever since consumers were first able to shape their home sound with bass and treble controls. Dolby was quick to introduce dialog controls, but nothing as interactive as the alternative dialog channels that have been suggested with NGA.

Dolby ushered in personalization with the introduction of Dialog Normalization, which was obviously a good idea for different listening environments and content, but Fraunhofer’s MPEG-H offers the possibility of true user interactivity. For example, with over-the-air broadcasting the coaches’ and players’ language can be problematic, but an isolated and controllable coaches’ channel eliminates the prescreening and sanitizing that take place in a live broadcast. Significantly for the CFOs, a coaches’ channel could be monetized. Note that Dolby claims the same level of interactivity through the use of object channels.

All this interactivity can ultimately be limited by the producers and rights holders, with options for improved dialog, alternate narration and even select objects such as radios and wireless microphones. The ability to control any one of multiple players’ or coaches’ microphones and listen in mono or stereo seems like a pretty immersive experience to me.
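
How a producer-imposed limit might constrain a listener's choice can be sketched as follows. This is an illustration only, not the MPEG-H or Dolby metadata model: the class InteractiveElement, its fields and the function apply_user_gain are hypothetical names, and the decibel ranges are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractiveElement:
    """One audio element with producer-set limits (hypothetical model)."""
    name: str
    default_gain_db: float
    min_gain_db: float
    max_gain_db: float
    user_selectable: bool = True


def apply_user_gain(element: InteractiveElement, requested_db: float) -> float:
    """Clamp a listener's requested gain to the range the producer allows."""
    if not element.user_selectable:
        # The rights holder has locked this element; ignore the request.
        return element.default_gain_db
    return max(element.min_gain_db, min(element.max_gain_db, requested_db))


if __name__ == "__main__":
    dialog = InteractiveElement("dialog", 0.0, -6.0, 9.0)
    coach_mic = InteractiveElement("coach mic", -60.0, -60.0, 0.0,
                                   user_selectable=False)
    print(apply_user_gain(dialog, 12.0))    # request exceeds the allowed range
    print(apply_user_gain(coach_mic, 0.0))  # element is locked
```

The point of the design is that the listener's control is always expressed relative to limits carried with the content, so the producer decides how much interactivity is exposed.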

Rendering the Content
The final and probably most undervalued benefit of NGA is the ability to render the audio content to virtually any consumer device or format. For example, the audio elements are embedded in the digital stream and can be combined, or rendered, according to the proper recipe for a mono, stereo, surround or immersive sound mix.

The early shortcomings of surround sound were the downmix and metadata. Also remember that using the set-top box to combine surround channels into a stereo mix has always been problematic for accurate and equivalent sound reproduction, because many of the sound elements are already baked into the mix. Rendering takes all the ingredients and makes a whole new cake.
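
The difference between a downmix and a render can be sketched in code. This is a simplified illustration under stated assumptions, not the renderer used by Dolby Atmos or MPEG-H: the function names are hypothetical, the 0.7071 (-3 dB) downmix weights follow a widely used convention rather than anything specified in this article, and the render uses plain constant-power panning on single sample values.

```python
import math


def downmix_5_1_to_stereo(l, r, c, lfe, ls, rs, k=math.sqrt(0.5)):
    """Fold one frame of a finished 5.1 mix to stereo with fixed weights.

    The elements are already baked into the six channels, so all this
    can do is add channels together. The LFE channel is dropped here.
    """
    return (l + k * c + k * ls, r + k * c + k * rs)


def render_to_stereo(elements):
    """Build a stereo frame from separate elements (constant-power pan).

    `elements` is a list of (sample, pan, gain) tuples, where pan runs
    from -1.0 (hard left) to +1.0 (hard right) and gain is linear.
    """
    left = right = 0.0
    for sample, pan, gain in elements:
        angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0 .. pi/2
        left += sample * gain * math.cos(angle)
        right += sample * gain * math.sin(angle)
    return left, right


def render_to_mono(elements):
    """Build a mono frame from the same elements, ignoring pan."""
    return sum(sample * gain for sample, _pan, gain in elements)
```

In the downmix, the balance between crowd, commentary and effects is fixed by whatever was mixed into each channel. In the render, each element keeps its own gain and position until the last step, so the same ingredients can be recombined for a different output format.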

Rendering is the final process before the audio reaches the consumer and, in theory, it can take different transducer characteristics, configurations and containers and optimize a soundscape for any listening device. However, I am still mystified as to how you can get an accurate representation of an immersive landscape from side- and up-firing soundbars, although it really does sound better.

A lot has been said about the potential of Next Generation Audio. While the technology is evolving, content with compelling immersive sound and interactivity is lagging, except in gaming. Just as surround sound started conservatively, immersive sound production has started conservatively as well.

One significant difference is that early soundbars were a big improvement for surround sound, but I am not convinced that the immersive sound experience has benefitted as much from soundbars as surround sound did.

The immersive experience is different for us all, and Next Generation Audio provides a framework, but no roadmap. I have recently published a book with Focal Press titled “Immersive Sound Production—A Practical Guide” that presents an advanced approach to live immersive sound design with a concentration on live sports and includes more than 60 different sports case studies.