Next-Gen TV Promises Immersive, Personalized Audio

PRINCETON JUNCTION, N.J.—Looking at where television audio standards stand today, it’s hard to reconcile the mono broadcasts of television’s infancy with the capabilities of the next generation of broadcast standards in ATSC 3.0. Television audio progressed from mono to stereo, then to a second audio program (SAP) that could carry another language, and then to ATSC 1.0 digital with Dolby AC-3 surround sound. Each step moved audio technology steadily toward sound that is more realistic and engaging.

Visitors to the Next Generation TV Hub at the 2017 NAB Show check out immersive audio for ATSC 3.0.

Now, however, the ATSC 3.0 next-generation audio standards open new doors for content developers to offer a more immersive sound environment and to give end users a more personalized sound experience. A closer look at what these standards can do shows how they will change the way we deploy and engage with sound systems that are evolving along with rapid advances in the digital world.

Building on surround sound laid out in a plane (5.1), as in early surround systems such as Dolby AC-3, ATSC 3.0 audio standards take sound to a full 7.1+4 implementation: seven channels of sound in a plane, one channel for a subwoofer (the low frequencies), and four channels overhead.
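To make the channel arithmetic concrete, here is a minimal Python sketch that counts the channels in each layout. The `SpeakerLayout` class and its field names are hypothetical, chosen only for illustration; they are not taken from the ATSC 3.0 documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerLayout:
    """Hypothetical container for the 'plane.LFE+overhead' shorthand."""
    plane: int         # speakers around the listener at ear level
    lfe: int           # low-frequency (subwoofer) channels
    overhead: int = 0  # speakers above the listener

    @property
    def total_channels(self) -> int:
        return self.plane + self.lfe + self.overhead

    def label(self) -> str:
        base = f"{self.plane}.{self.lfe}"
        return f"{base}+{self.overhead}" if self.overhead else base


SURROUND_5_1 = SpeakerLayout(plane=5, lfe=1)
IMMERSIVE_7_1_4 = SpeakerLayout(plane=7, lfe=1, overhead=4)

for layout in (SURROUND_5_1, IMMERSIVE_7_1_4):
    print(layout.label(), "->", layout.total_channels, "channels")
```

Under these assumptions, a 5.1 layout carries six channels and a 7.1+4 layout carries twelve.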

At first glance, this may seem like just throwing more sound into the mix, but how that sound is delivered is what sets it apart. Because audio experts have a detailed understanding of how the ear works and how humans perceive sound, the new standards can convey directionality more effectively. What’s more, this can be done not only on fully equipped home theater speaker systems that use all channels, but also on something as simple as a sound bar attached to a digital TV. It’s even possible to replicate this immersive 3D sound environment using ordinary headphones.

Imagine the sound of raindrops hitting leaves over your head in a scene filmed in a tropical forest, or the sound of a helicopter approaching from the side, crossing overhead and then moving away from you. The possibilities for sound technicians are expansive, and the new ATSC 3.0 audio standards are designed to scale and accommodate newer, more sophisticated audio scenarios as they emerge. The result is an immersive 3D experience that is much more attractive to users, both today and in the future.

It will likely take some time for broadcasters to implement the full capability of the new standards, and end users are not going to run out and purchase advanced sound systems. Other aspects of the new audio technologies, however, will probably be used right away and will ultimately have a large impact.

For example, the coding technology for next-generation audio systems has moved beyond purely channel-based systems. In today’s 5.1 implementations, there are five channels of surround sound and one channel for the subwoofer, or low frequencies, with fixed assignments: front left, front right, center, the two rears, and the subwoofer. All sounds fall into these channels. Next-generation audio standards additionally incorporate object audio, in which audio objects can be moved and maneuvered into different positions in the sound field.

As an example, imagine someone filming a skateboarder running a circuit in a skate park while the sound tech follows the skater, using a joystick to control the movement of a sound object in three dimensions. In this scenario, the sound follows the skateboarder around the course and gives a more realistic representation of the sound as it changes with the skater’s movements. Because objects can be positioned and moved freely, this approach accommodates a wide range of distinctive audio scenarios.
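The idea of a sound object that carries its own position can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the rendering method used by any ATSC 3.0 audio system: it assumes a single hypothetical `azimuth_deg` value per object and maps it to two stereo speakers with a standard constant-power pan law, whereas a real renderer works in three dimensions and targets whatever speakers are actually present.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio object: a named sound plus position metadata."""
    name: str
    azimuth_deg: float  # assumed convention: -90 = hard left, +90 = hard right


def stereo_gains(azimuth_deg: float) -> tuple[float, float]:
    """Constant-power pan: map azimuth in [-90, 90] to (left, right) gains."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (clamped + 90.0) / 180.0 * (math.pi / 2)  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


# Move the object from left to right, as the sound tech's joystick might.
skater = AudioObject("skateboarder", azimuth_deg=-90.0)
for azimuth in (-90.0, -45.0, 0.0, 45.0, 90.0):
    skater.azimuth_deg = azimuth
    left, right = stereo_gains(skater.azimuth_deg)
    print(f"{azimuth:+6.1f} deg  L={left:.3f}  R={right:.3f}")
```

The point of the sketch is that the position travels with the object as metadata and the speaker gains are worked out at playback time, rather than the sound being committed to a fixed channel when it is mixed.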

Another change enabled by next-generation audio standards, which include audio objects instead of just channels, is that viewers can control and choose the objects they want to hear (personalization). This enhances the user experience by vastly increasing a viewer’s control over audio content. Because the audio is made up of objects, it is possible to offer controls that let viewers turn one object up or another down, based on personal preference. For example, you might broadcast a football game in which one object is the home team announcer and another is the visiting team announcer. With next-generation audio systems, it’s fairly simple to let viewers choose which announcer to listen to.

Another scenario might be a visually impaired viewer who wants to “turn up” an object that carries an audio description of what is happening on the television screen (known as descriptive video). Personalized audio is likely to be simpler to implement than full immersive audio, especially when it comes to fitting into existing station workflows, and will be very attractive to many viewers.
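The announcer example can be sketched as a simple per-object gain mix. The function name `mix_objects`, the object names and the sample values below are all hypothetical and stand in for real audio; the sketch only illustrates the principle that the receiver, not the broadcaster, decides the final balance.

```python
def mix_objects(objects: dict[str, list[float]],
                gains: dict[str, float]) -> list[float]:
    """Sum equal-length object signals, scaling each by the viewer's gain.

    Objects without an explicit gain play at their original level (1.0).
    """
    if not objects:
        return []
    length = len(next(iter(objects.values())))
    mixed = [0.0] * length
    for name, samples in objects.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed


# Toy sample values standing in for real audio.
objects = {
    "crowd": [0.1, 0.1, 0.1],
    "home_announcer": [0.5, 0.0, 0.5],
    "visiting_announcer": [0.0, 0.4, 0.0],
}

# The viewer keeps the home announcer and mutes the visiting one.
viewer_gains = {"home_announcer": 1.0, "visiting_announcer": 0.0}
print(mix_objects(objects, viewer_gains))
```

Raising the gain on a description object instead of muting an announcer would cover the descriptive video case in the same way.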

Technology is always evolving, as are the capabilities of devices. Because it’s understood that evolution will happen, ATSC 3.0 has been developed to gracefully move from what we have today to what will be coming in the future. The need for this is something learned from past experience and incorporated into the entire standard, not just the audio portion. Throughout the entire system, each layer signals to the layer above what technologies will be used. ATSC 3.0 has set the stage for carrying both the old technology and new technology as it comes online—a win-win scenario for all involved in broadcast television and the viewing public at large.

In my opinion, one of the primary themes of this year’s NAB Show was ATSC 3.0. ATSC 3.0 is clearly a reality, with demos, products, conference sessions and significant mention in the keynote speeches from FCC Chairman Ajit Pai and NAB’s Senator Gordon Smith and Sam Matheny. A well-attended Next Generation TV Hub in the LVCC Grand Hallway demonstrated many new features of the ATSC 3.0 system: Better Pictures, Immersive Sound, Mobility, Gateway Devices, Targeted Ad Insertion, Audience Measurement, Emergency Alerting, Content Delivery to Automobiles and a broadcast from Black Mountain. The ATSC Pavilion in the Futures Park area of North Hall gave a deeper dive into many technologies and features for ATSC 3.0, including the systems currently being deployed in South Korea for the launch of UHDTV services for the 2018 Olympics.

Dr. Richard Chernock is the Distinguished Lecturer Chair for the IEEE Broadcast Technology Society (IEEE BTS). He is currently the Chief Science Officer at Triveni Digital, where he is developing strategic directions for monitoring, content distribution and metadata management for emerging digital television systems and infrastructures. Dr. Chernock is active in many of the ATSC, SMPTE and SCTE standards committees, particularly in the areas of future DTV, monitoring, metadata, and data broadcast. He is chairman of the ATSC Technology Group on ATSC 3.0 (TG3) and chairs the AHG on service delivery and synchronization for ATSC 3.0. He was previously chairman of the ATSC Technology and Standards Group (TG1). Earlier, he was a Research Staff Member at IBM Research, investigating digital broadcast technologies.
