Media Over IP Enables Transformation in Audio Production and Experience

(Image credit: Rafael Fonseca)

With the advent of media over IP and the quest for better entertainment experiences, audio has finally taken on the prominent role in broadcast it deserves. You might even call it a next-generation audio revolution.

The industry’s embrace of immersive audio will have a tremendous impact on live broadcasts—and it’s already transforming the music, theater, movie, studio and recording industries.


Extending surround sound, which operates in a horizontal plane, to a three-dimensional audio experience (spatial audio), immersive audio gives creatives and engineers more freedom in conveying emotion and more powerful tools for adding perspective to what is being watched or heard. The intent is to immerse the audience from any—if not every—direction.

Immersive audio uses ambisonic signals that capture sound in the audio field from one or several points in space, creating “audio objects” that allow for mix adaptation for different types of sound systems. Enabling manipulation of the sound field or the selection of specific streams, audio objects open the door to creation of unique experiences, both by the production team and by listeners themselves. Flexibility and choice are the key.

With immersive audio, the viewer-listener can select specific audio objects for inclusion in the version of the broadcast they see and hear. Much like an additional camera feed option, such as a “goal cam” or “referee cam,” different audio feeds can be chosen to bring the listener into different parts of a game or other live experience. Viewer-listeners can alter their spatial position and thus gain a different perspective that adds greater realism and a sense of being in the midst of the action. Additional sound might be the feed from a mic’d-up player, or communications between a Formula 1 driver and crew, or just about any other audio element that can be captured during production.


Traditional sound fields today are stereo and surround sound: 5.1 and 7.1, with left, center, right, left surround, right surround and subwoofer/low-frequency effects channels. Immersive audio introduces two more layers—a height layer and a top layer—in addition to the surround sound base layer.
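The channel-count notation used for these formats follows a simple convention that can be sketched in code. In the sketch below (the helper function is illustrative, not part of any standard API), the first digit counts ear-level channels, the second the LFE channel(s), and an optional third digit the height/top channels:

```python
def speaker_count(layout: str) -> int:
    """Total speakers in a layout string such as '5.1.4'.

    Convention assumed here: base.LFE[.height], so '5.1.4' means
    5 ear-level channels + 1 LFE + 4 height channels.
    """
    parts = [int(p) for p in layout.split(".")]
    # A plain surround layout like '5.1' has no height layer.
    while len(parts) < 3:
        parts.append(0)
    base, lfe, height = parts
    return base + lfe + height

print(speaker_count("5.1.4"))  # 10 speakers
print(speaker_count("7.1"))    # 8 speakers
```

The same arithmetic applies to the 5.1.2 and 7.1.4 trial configurations discussed later in this article.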

Audio objects have associated metadata that describes how the audio is going to be reproduced within the sound field. Describing the position, spread, motion characteristics and other rendering information, this data gives objects the independence essential to flexible placement according to position or via a specific speaker. 
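To illustrate the kind of per-object metadata involved, the sketch below models a single audio object in Python. The field names are hypothetical and chosen for readability; they do not follow the exact attribute names of any specific metadata standard:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class AudioObjectMetadata:
    """Illustrative rendering metadata for one audio object (field
    names are hypothetical, not taken from a specific standard)."""
    object_id: str
    # Position in the sound field: (azimuth deg, elevation deg, distance m).
    position: Tuple[float, float, float]
    # 0.0 = point source, 1.0 = fully diffuse across the field.
    spread: float = 0.0
    # Motion characteristics, e.g. a velocity vector for moving sources.
    velocity: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    # If set, render via a specific speaker rather than by position.
    speaker_lock: Optional[str] = None

# A diffuse crowd-noise object placed to the listener's left.
crowd = AudioObjectMetadata(
    object_id="crowd-left",
    position=(-90.0, 10.0, 15.0),
    spread=0.8,
)
```

Because the metadata travels with the object rather than being baked into a channel, a renderer can place the same object correctly on a 5.1.4 speaker array, a soundbar, or headphones.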

In immersive audio implementations, broadcasters will find more audio flows to contend with and manage. As the format gains a foothold in production workflows, one key challenge will be to scale services, given the much greater potential for complexity than with conventional stereo and surround formats.

The number of audio sources, flows and objects is higher with immersive audio than in a standard surround sound workflow, possibly requiring more comprehensive PTP network practices to maintain timing/synchronization integrity across the workflow. Primary considerations for infrastructure will be traffic and bandwidth management, QoS treatment and proper PTP network hierarchy treatment.
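To get a feel for the bandwidth side of this, a quick back-of-the-envelope calculation for uncompressed PCM flows is useful. The sketch assumes AES67-style 48 kHz/24-bit audio and counts payload only; the function name is illustrative:

```python
def pcm_bandwidth_mbps(channels: int,
                       sample_rate: int = 48_000,
                       bit_depth: int = 24) -> float:
    """Payload bit rate in Mb/s for uncompressed PCM audio.

    Ignores RTP/UDP/IP packet overhead, which adds a further
    margin on the wire depending on packet time.
    """
    return channels * sample_rate * bit_depth / 1e6

stereo = pcm_bandwidth_mbps(2)      # ~2.3 Mb/s
immersive = pcm_bandwidth_mbps(10)  # a 5.1.4 bed: ~11.5 Mb/s
```

The absolute numbers are small by video standards; the scaling challenge comes from the count of flows and objects, and from keeping every one of them PTP-aligned, rather than from raw bandwidth alone.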

Looking at the challenge broadly, one response in terms of switching infrastructure is to integrate with automation, control and orchestration platforms, as these represent the intent of the workflow by means of control. Tackling infrastructure in more specific terms, broadcasters will need to implement PTP-aware managed IP switches that can expose northbound APIs for control and provisioning.

From a networking perspective, broadcasters might want to consider bundling audio sources and objects based on categories, such as location of the source, type of sound being “collected” and specific effects to be captured. With audio flows being managed as objects that can be moved around, it’s helpful when the underlying infrastructure can aggregate data in a way that makes sense to the human brain.

To maintain logical organization of flows and data, it is useful to work with a switching infrastructure that can manage this traffic adeptly, apply groupings by means of VLANs and guarantee that the proper QoS is applied. This is especially important in workflows where other types of traffic (e.g., intercom, metadata) ride the same infrastructure.
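One way to express such a grouping is as a simple traffic-class plan mapping each category to a VLAN and a DSCP value. The specific VLAN IDs and DSCP code points below are illustrative assumptions for a sketch, not values mandated by any standard:

```python
# Illustrative traffic-class plan; VLAN IDs and DSCP values are
# example assumptions to be adapted to the facility's QoS policy.
TRAFFIC_PLAN = {
    "ptp":        {"vlan": 10, "dscp": 56},  # timing: highest priority
    "audio":      {"vlan": 20, "dscp": 46},  # EF: media essence flows
    "intercom":   {"vlan": 30, "dscp": 34},  # AF41
    "metadata":   {"vlan": 40, "dscp": 18},  # AF21
    "management": {"vlan": 99, "dscp": 0},   # best effort
}

def dscp_for(category: str) -> int:
    """Look up the DSCP marking for a traffic category."""
    return TRAFFIC_PLAN[category]["dscp"]
```

Keeping such a plan in one place makes it straightforward to provision switch QoS policies consistently—and to verify that intercom or metadata traffic sharing the fabric can never starve the audio essence or PTP flows.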

Artel’s AoIP-capable, managed and PTP-enabled switches provide the level of traffic control/management and PTP functionality needed to support immersive workflows. The boundary clock function within the company’s Quarra family of switches provides a tool for sectionalizing the PTP network to achieve better performance. (Smart sectionalizing of the audio sources can prevent PTP grandmaster overload conditions.) For live events in particular, broadcasters might consider planning their network deployments around which switches serve which sources.


Immersive audio is still in relatively early stages of development, and no standard configuration has yet been adopted by the industry. Broadcasters have performed trials with 5.1.2, 5.1.4 and 7.1.4, with 5.1.4 apparently offering a superior 3D experience while enabling a manageable infrastructure transition. 

Whatever format they choose to adopt, broadcasters working with advanced IP switches can be confident in their ability to address the challenges that come with managing more sources and data efficiently, and with the accuracy essential to real-time media workflows.