
Audio Objects for 4K TV

The current buzz in the television industry is all about the promise of 4K, yet most of the talk is centered on video quality, with very little being said about the audio that supports the image.

So little is being said about audio for 4K television because it is an area that's still in its infancy. No standard yet defines exactly what audio for 4K is, but organizations such as ATSC and SMPTE are now tackling the technical requirements, while Dolby, DTS, Fraunhofer, and other companies are sorting out the creation and consumer ends of the chain. What is crystal clear is that the standards organizations and manufacturers are all aiming to make the 4K audio experience both immersive and personalized, and that both goals center on object-based audio.

Object-based (also called object-oriented) audio has been used in video games for years, but it made significant news in the media industry with the introduction of Dolby Atmos. Developed for cinema but destined for homes, the format was designed to envelop the filmgoer in sound by placing speakers in the ceiling in addition to the front, side, and rear walls. A 9.1-channel surround bed provides the channel-based foundation of the mix, while up to 118 individual audio objects are placed or panned anywhere in the room. An object can be a fully mixed audio channel, but it is more likely to be an individual sound or a deliberately grouped set of sounds, depending on what needs to be heard from any of the up to 64 addressable speakers at a given moment in the film. While this creates a memorable experience in the theater, work is now underway to bring this kind of immersive audio to the home listener by adopting and implementing the 22.2 surround platform developed by NHK.
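To make "placing or panning" an object concrete, here is a minimal sketch of how a renderer might turn an object's position into per-speaker gains. It uses pairwise constant-power panning on a horizontal ring of speakers, a deliberately simple stand-in: Atmos and other commercial renderers use their own (largely proprietary) algorithms and also handle height. The `AudioObject` class, the `pan_gains` function, and the azimuth convention are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A sound plus the positional metadata a renderer needs (hypothetical, simplified)."""
    name: str
    azimuth_deg: float  # horizontal angle around the listener; 0 = straight ahead
    gain: float = 1.0   # linear gain set by the mixer


def pan_gains(obj: AudioObject, speaker_azimuths: list[float]) -> list[float]:
    """Pairwise constant-power panning of one object onto a horizontal speaker ring.

    Returns one linear gain per speaker. Only the two speakers that bracket the
    object's azimuth receive signal, and their squared gains sum to obj.gain**2.
    Assumes at least two speakers at distinct angles.
    """
    n = len(speaker_azimuths)
    if n < 2:
        raise ValueError("need at least two speakers")
    gains = [0.0] * n
    az = obj.azimuth_deg % 360.0
    # Walk the ring in angular order so each pair (a, b) are neighbours.
    order = sorted(range(n), key=lambda i: speaker_azimuths[i] % 360.0)
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_azimuths[a] % 360.0
        span = (speaker_azimuths[b] % 360.0 - start) % 360.0
        offset = (az - start) % 360.0
        if offset <= span:
            p = offset / span  # 0 at speaker a, 1 at speaker b
            gains[a] = obj.gain * math.cos(p * math.pi / 2)
            gains[b] = obj.gain * math.sin(p * math.pi / 2)
            return gains
    return gains  # not reached when speaker angles are distinct


speakers = [-30.0, 0.0, 30.0, 110.0, -110.0]  # a five-speaker horizontal ring
heli = AudioObject("helicopter", azimuth_deg=15.0)
print([round(g, 3) for g in pan_gains(heli, speakers)])
# [0.0, 0.707, 0.707, 0.0, 0.0]
```

Because the gains are computed at playback from the object's position rather than baked into fixed channels, the same object can be rendered to a 64-speaker cinema or a five-speaker living room by changing only `speaker_azimuths`.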

[Image: Home speaker placement for a potential “immersive audio” experience]

Delivering 24 channels of audio to the home could prove to be a challenge, but a far smaller one than fitting 24 speakers into a living room. The recently released HDMI 2.0 specification may solve the consumer end of the delivery chain: it supports up to 32 audio channels and an aggregate audio sample rate of up to 1,536 kHz, enough bandwidth for each of those channels to carry audio at 48 kHz (32 × 48 kHz = 1,536 kHz). Speaker count, however, is a problem that every group pushing immersive audio recognizes, so they have put forth a variety of solutions that aim for the same result without turning living rooms into speaker demo rooms. The solutions run the gamut, from speakers hung at ceiling height in the room's corners, to sound bars under and sound frames around television screens, to surround headphones for personal listening. Oddly, after all the effort that goes into making the audio immersive, the end result is still expected to be downmixed for playback on 5.1, stereo, and even mono systems.
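Downmixing is simply a fixed weighted sum of channels. The sketch below folds a 5.1 mix down to stereo and then to mono, assuming the commonly cited ITU-R BS.775 coefficients (centre and surrounds at -3 dB, LFE discarded). Real decoders may instead use coefficients carried in metadata, and a 22.2-to-5.1 downmix would need a much larger table; the function names here are hypothetical.

```python
from __future__ import annotations

import math

# -3 dB fold-down coefficient (1/sqrt(2)) for centre and surrounds, following the
# commonly used ITU-R BS.775 downmix; the LFE channel is simply discarded.
K = 1 / math.sqrt(2)


def downmix_51_to_stereo(ch: dict[str, list[float]]) -> tuple[list[float], list[float]]:
    """Fold 5.1 channel buffers (keys L, R, C, LFE, Ls, Rs) into Lo/Ro stereo."""
    lo = [fl + K * c + K * sl for fl, c, sl in zip(ch["L"], ch["C"], ch["Ls"])]
    ro = [fr + K * c + K * sr for fr, c, sr in zip(ch["R"], ch["C"], ch["Rs"])]
    return lo, ro


def downmix_stereo_to_mono(lo: list[float], ro: list[float]) -> list[float]:
    """Average left and right (one common convention for a mono fold-down)."""
    return [0.5 * (a + b) for a, b in zip(lo, ro)]


# A centre-only signal (e.g. dialog) lands equally in both stereo channels.
five_one = {"L": [0.0], "R": [0.0], "C": [1.0], "LFE": [0.8], "Ls": [0.0], "Rs": [0.0]}
lo, ro = downmix_51_to_stereo(five_one)
print(round(lo[0], 3), round(ro[0], 3), round(downmix_stereo_to_mono(lo, ro)[0], 3))
# 0.707 0.707 0.707
```

Note that summing channels can push peaks past full scale, so production downmixers typically add attenuation or limiting that this sketch leaves out.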

Personalization may be the most interesting promise of audio objects for television, and the one most likely to be delivered. Because an object can be any piece of audio in the program, the home listener can have as much control over the program audio as the content creator allows. Turning announce channels into audio objects means that listeners can swap the main broadcast announcers for local-language or home-team announcers without changing anything else in the mix. Listeners can raise the level of the voice track over background sounds if they have trouble distinguishing the voices, or mute the voices if they want to experience the event without commentary. Descriptive video service or special commentary programs could be delivered easily the same way. Of course, all of this customizability requires new equipment for content creators and consumers, as well as some method of control, most likely extended metadata, which must be properly authored and delivered to the home for the system to function.
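The personalization described above comes down to combining creator-authored metadata with listener preferences to decide a gain for each object before mixing. The following is a minimal sketch under that assumption; the metadata fields (`kind`, `language`, `user_adjustable`, `max_boost_db`), the preference fields, and the single-channel `mix` are all hypothetical and do not reflect any particular standard's metadata syntax.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectMeta:
    """Hypothetical per-object metadata authored by the content creator."""
    name: str
    kind: str                        # e.g. "bed", "commentary", "description"
    language: Optional[str] = None   # language of a commentary object
    user_adjustable: bool = True     # False locks the object at its authored level
    max_boost_db: float = 0.0        # most a listener may raise this object


@dataclass
class ListenerPrefs:
    """Hypothetical choices made on the consumer's device."""
    commentary_language: Optional[str] = None  # None mutes all commentary
    dialog_boost_db: float = 0.0
    audio_description: bool = False


def db_to_linear(db: float) -> float:
    return 10 ** (db / 20)


def object_gains(objects: list[ObjectMeta], prefs: ListenerPrefs) -> dict[str, float]:
    """Linear gain per object; 0.0 means the object is left out of the mix."""
    gains: dict[str, float] = {}
    for obj in objects:
        if not obj.user_adjustable:
            gains[obj.name] = 1.0  # the creator did not allow changes
        elif obj.kind == "commentary":
            chosen = (prefs.commentary_language is not None
                      and obj.language == prefs.commentary_language)
            # Apply the listener's boost, capped by the creator's limit.
            boost = min(prefs.dialog_boost_db, obj.max_boost_db)
            gains[obj.name] = db_to_linear(boost) if chosen else 0.0
        elif obj.kind == "description":
            gains[obj.name] = 1.0 if prefs.audio_description else 0.0
        else:
            gains[obj.name] = 1.0
    return gains


def mix(signals: dict[str, list[float]], gains: dict[str, float]) -> list[float]:
    """Sum equal-length object signals into one channel, each scaled by its gain."""
    length = len(next(iter(signals.values())))
    return [sum(gains[name] * sig[i] for name, sig in signals.items())
            for i in range(length)]


objects = [
    ObjectMeta("crowd_bed", "bed", user_adjustable=False),
    ObjectMeta("announcers_en", "commentary", language="en", max_boost_db=6.0),
    ObjectMeta("announcers_es", "commentary", language="es", max_boost_db=6.0),
    ObjectMeta("described_video", "description", language="en"),
]
prefs = ListenerPrefs(commentary_language="es", dialog_boost_db=9.0)
print({name: round(g, 2) for name, g in object_gains(objects, prefs).items()})
# {'crowd_bed': 1.0, 'announcers_en': 0.0, 'announcers_es': 2.0, 'described_video': 0.0}
```

The listener asked for 9 dB of dialog boost but received about 6 dB (a gain of roughly 2.0), because the creator capped it in the metadata. That cap illustrates why the text stresses proper authoring: the receiver can only offer the control the metadata permits.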

[Image: The Traumpalast in Backnang, Germany contracted with Kinoton to install a Dolby Atmos audio system earlier this year. The system includes a total of 57 speakers.]

The introduction of audio objects into the production process certainly means workflow changes, with additional audio sources and new metadata to manage, but that's not the only area where their impact will be felt. Audio rooms will need enough physical space both to hold the additional speakers the format requires and to allow proper imaging from so many sound sources. This means that broadcasters and production houses will need to allot more facility space for audio rooms at a time when the footprints of many audio production rooms are shrinking. Post-production systems may require more powerful workstations, with additional DSP and output busses, to process and deliver the large number of objects a production may generate. We may even see the return of audio consoles to television audio post rooms just to handle the complex routing and panning of audio objects.

Creating immersion during live event production will require a completely reimagined workflow. Because mix spaces in mobile units are physically limited, the onsite mixer may send a surround mix plus other audio sources to a production facility with a better listening environment, where the final immersive mix is actually assembled. Getting all of these channels from the site to the production facility will likely double the number of transmission paths needed. For in-studio productions, it may be difficult to create an immersive environment without placing additional microphones or introducing pre-produced atmospheres into the program.

With television industry heavyweights pushing immersion and personalization to the forefront, audio objects seem like a technology we'll all be getting to know better over the next few years. How widely and how fully it is implemented, though, will depend on how far creators and broadcasters are willing to go to produce content with it. It also remains to be seen whether consumers will embrace immersive or personalized audio to accompany their video. That decision may ultimately hinge on whether this new technology solves existing problems and complexities or makes them worse.

Jay Yeary is an audio engineer currently working with the engineering department of a large media company. He finds the concept of audio objects fascinating and dreams of someday panning sounds around 64 speakers, just for fun. He can be reached via TV Technology.