Audio Over IP: How to Overcome Its Complexities in an IP-Based Environment

In recent years, as broadcast production has moved to IP, broadcasters have predominantly focused on video transport because of its high bandwidth requirements. However, audio also presents challenges. Compared with video, audio not only involves a substantially greater number of flows, but also uses a diverse range of standards in production.

Professional audio over Ethernet has been around since before the turn of the century, with broadcast radio an early adopter of standardized network technology. Over the years, several competing proprietary approaches and standards for multichannel and networked audio transport have emerged, including Dante, RAVENNA, MADI (AES10) and AES67. However, compatibility between the various approaches, and even between implementations of a specific format, has been a long-standing issue in audio transport and processing. With an increasing number of broadcasters moving their facilities to IP, ensuring audio compatibility has become critical.

As broadcasters look to overcome the complexities presented by audio in the move to IP, they must consider the following key issues:


The streaming plane refers to the basic transport of the audio over the network. In that context, AES67 has become key. First issued in 2013, the AES67 standard has been adopted and integrated by most manufacturers, including providers of products based on proprietary approaches. Crucially, AES67 is also the basis of the recent SMPTE ST 2110-30 standard, which means that compatibility on the streaming plane between most of the popular solutions is largely assured.
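On the streaming plane, AES67 and ST 2110-30 senders describe each stream with an SDP (Session Description Protocol) file that receivers use to join it. The sketch below builds a minimal SDP for a 48 kHz, 24-bit (L24) stream. It is an illustration under stated assumptions, not a complete or validated implementation: the function name, session name, IP addresses, port and PTP grandmaster ID are all hypothetical, and a real sender may signal further attributes.

```python
def build_aes67_sdp(session_name, source_ip, multicast_ip, port,
                    channels, ptime_ms, ptp_gmid, domain=0, payload_type=97):
    """Return a minimal SDP description for a 48 kHz L24 stream (hypothetical helper)."""
    lines = [
        "v=0",
        f"o=- 1 1 IN IP4 {source_ip}",
        f"s={session_name}",
        f"c=IN IP4 {multicast_ip}/32",                     # multicast group, TTL 32
        "t=0 0",
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} L24/48000/{channels}",   # 24-bit PCM, 48 kHz, N channels
        f"a=ptime:{ptime_ms}",                             # packet time in milliseconds
        f"a=ts-refclk:ptp=IEEE1588-2008:{ptp_gmid}:{domain}",  # PTP reference clock
        "a=mediaclk:direct=0",                             # media clock locked to PTP, zero offset
    ]
    return "\r\n".join(lines) + "\r\n"


# Hypothetical stereo stream at the 1 ms packet time
sdp = build_aes67_sdp("Stereo feed", "192.168.1.10", "239.1.1.1", 5004,
                      channels=2, ptime_ms=1, ptp_gmid="00-11-22-FF-FE-33-44-55")
print(sdp)
```

Because every attribute is plain text, interoperability on the streaming plane largely comes down to both ends agreeing on these few parameters.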

That said, SMPTE ST 2110-30 defines three levels of conformance, only some of which are currently supported by vendors. The mandatory Level A covers 48 kHz streams with one to eight audio channels at a packet time of 1 ms. Level B adds support for a packet time of 125 µs. Level C raises the maximum number of audio channels per stream to 64 at the 125 µs packet time. This means that MADI, which remains very popular, can be carried as-is over the audio network.
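The arithmetic behind these levels can be sketched in a few lines. Assuming 24-bit samples, a stream of eight channels at 1 ms and a stream of 64 channels at 125 µs both produce the same 1,152-byte packet payload, which fits comfortably in a standard Ethernet frame; this is why the higher channel count is tied to the shorter packet time. The helper names below are hypothetical, and only the 48 kHz levels are modelled (ST 2110-30 also defines 96 kHz variants, AX, BX and CX).

```python
SAMPLE_RATE = 48_000
BYTES_PER_SAMPLE = 3  # 24-bit linear PCM (L24)


def samples_per_packet(ptime_us, sample_rate=SAMPLE_RATE):
    """Samples per channel in one packet, e.g. 48 at 1 ms or 6 at 125 us."""
    return sample_rate * ptime_us // 1_000_000


def payload_bytes(channels, ptime_us):
    """RTP payload size for one packet of interleaved L24 audio."""
    return channels * samples_per_packet(ptime_us) * BYTES_PER_SAMPLE


def minimum_level(channels, ptime_us, sample_rate=SAMPLE_RATE):
    """Lowest 48 kHz ST 2110-30 level ('A', 'B' or 'C') that covers this stream."""
    if sample_rate != SAMPLE_RATE or channels < 1:
        return None  # 96 kHz levels (AX/BX/CX) are not modelled in this sketch
    if ptime_us == 1000 and channels <= 8:
        return "A"
    if ptime_us == 125 and channels <= 8:
        return "B"
    if ptime_us == 125 and channels <= 64:
        return "C"
    return None  # outside the A/B/C envelope


print(minimum_level(2, 1000), minimum_level(8, 125),
      minimum_level(64, 125), minimum_level(64, 1000))  # A B C None
print(payload_bytes(8, 1000), payload_bytes(64, 125))   # 1152 1152
```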

What broadcasters often overlook as they plan their move to IP is that many audio-over-IP systems can currently handle only the basic Level A. They may also limit the total number of audio network streams supported, and which combinations of channel count and stream count can be used. So, while manufacturers can genuinely claim support for SMPTE ST 2110-30, the limited scope of their compliance should be carefully considered when selecting audio equipment, as it could restrict the flexibility of the overall workflow.
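When comparing data sheets, it can help to check a planned set of subscriptions against a device's stated receive limits. The sketch below assumes a device publishes three figures (maximum streams, maximum total channels and maximum channels per stream); the `ReceiverLimits` class, the `fits` function and all the numbers used are hypothetical, and real products may impose other constraints.

```python
from dataclasses import dataclass


@dataclass
class ReceiverLimits:
    """Hypothetical receive-side limits of an audio-over-IP device."""
    max_streams: int
    max_total_channels: int
    max_channels_per_stream: int  # e.g. 8 for an A/B-only receiver, 64 for Level C


def fits(limits, stream_channel_counts):
    """Return True if the device can subscribe to all of these streams at once."""
    if len(stream_channel_counts) > limits.max_streams:
        return False
    if any(c > limits.max_channels_per_stream for c in stream_channel_counts):
        return False
    return sum(stream_channel_counts) <= limits.max_total_channels


level_a_device = ReceiverLimits(max_streams=16, max_total_channels=128, max_channels_per_stream=8)
level_c_device = ReceiverLimits(max_streams=4, max_total_channels=128, max_channels_per_stream=64)

print(fits(level_a_device, [2] * 16))  # True: 16 stereo streams, 32 channels
print(fits(level_a_device, [2] * 17))  # False: stream count exceeded, despite few channels
print(fits(level_a_device, [64]))      # False: a MADI-sized stream needs Level C
print(fits(level_c_device, [64, 64]))  # True
```

The second case shows why stream count, not just channel count, can be the binding constraint.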


As part of implementing AES67 compatibility, manufacturers can now use Precision Time Protocol (PTP) version 2, or IEEE 1588-2008, to time the network. This also fits with the SMPTE ST 2110-10 standard, which mandates the use of PTP v2. SMPTE has also published the ST 2059 standard, which generalizes the media clock concept of AES67 to any kind of periodic media clock, including video and timecode.
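The media clock idea can be shown with a little arithmetic: every device counts ticks of its media clock from the same PTP time, so audio and video RTP timestamps can be related to a common time base. The sketch assumes a zero offset between media clock and RTP clock (the ST 2110 convention; AES67 also allows a non-zero offset signalled in SDP) and uses the 48 kHz audio and 90 kHz video RTP clock rates. The `rtp_timestamp` function is a hypothetical helper.

```python
NS_PER_SECOND = 1_000_000_000


def rtp_timestamp(ptp_seconds, ptp_nanoseconds, clock_rate, offset=0):
    """Map a PTP time (since the PTP epoch) to a 32-bit RTP timestamp."""
    total_ns = ptp_seconds * NS_PER_SECOND + ptp_nanoseconds
    media_clock_ticks = total_ns * clock_rate // NS_PER_SECOND  # integer maths, no float error
    return (media_clock_ticks + offset) % 2**32                 # RTP timestamps wrap at 32 bits


# The same PTP instant, seen by an audio sender and a video sender
print(rtp_timestamp(1, 0, 48_000))            # 48000
print(rtp_timestamp(1, 0, 90_000))            # 90000
print(rtp_timestamp(0, 500_000_000, 48_000))  # 24000
```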


The common production environment has many more audio sources than video and an even greater number of destinations. A major sports production could have thousands of audio channels travelling across the network, for example. So, while audio may not necessarily place high demands on bandwidth in an IP network compared to video, it certainly creates a challenge in terms of control and orchestration.
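A rough bitrate calculation illustrates the point. Assuming 24-bit samples, IPv4, no VLAN tags or RTP header extensions, and counting Ethernet header and FCS but not preamble or inter-frame gap, a stereo stream at 1 ms needs under 3 Mb/s. Even 2,000 channels, packed as 250 eight-channel streams, total about 2.4 Gb/s, yet that is 250 separate flows to route and control. The function name and the 2,000-channel scenario are illustrative assumptions.

```python
# RTP + UDP + IPv4 + Ethernet header and FCS, in bytes
HEADER_BYTES = 12 + 8 + 20 + 18


def stream_bitrate_bps(channels, ptime_us, sample_rate=48_000, bytes_per_sample=3):
    """Approximate on-the-wire bitrate of one uncompressed PCM audio stream."""
    samples = sample_rate * ptime_us // 1_000_000
    frame_bytes = channels * samples * bytes_per_sample + HEADER_BYTES
    packets_per_second = 1_000_000 // ptime_us
    return frame_bytes * 8 * packets_per_second


stereo = stream_bitrate_bps(2, 1000)
print(stereo)                   # 2768000 (about 2.8 Mb/s)

eight_channel = stream_bitrate_bps(8, 1000)
flows = 2000 // 8               # 250 separate flows
print(flows * eight_channel)    # 2420000000 (about 2.4 Gb/s)
```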

Audio engineers expect to be able to “plug and play” equipment and connect sources and destinations without worrying about protocols and standards. In a broadcast facility, however, inter-studio routing must be centrally controlled, both for signal integrity and for security and access control.

The apparent strength of some of the proprietary approaches is that they include a comprehensive control plane, whereas standards like AES67 or indeed SMPTE ST 2110 do not define how the streams should be controlled.


While the proprietary control planes are effective on their own, they are not compatible with each other. More crucially, they are designed for a local studio environment (LAN) and are therefore not suited to a seamless distributed production environment, such as a large campus, inter-campus links or remote production over a WAN.

Furthermore, these control planes rely on audio being made available to any equipment in the network by default, so no explicit routing of streams is required. This could be a security concern, especially in a distributed, multi-department or multi-organization environment.

In addition, the underlying assumption of this approach, that no explicit bandwidth management is needed, may not hold as the network grows in size and complexity.


One approach to overcoming control plane interoperability issues, and to addressing security and stability concerns, is to bridge different IP audio “islands” using MADI baseband tielines. However, this complicates the management of audio routing across the campus and reduces flexibility and agility. Essentially, this approach largely defeats the purpose and promise of using a converged media network in the first place.


The Networked Media Open Specifications (NMOS), developed by the Advanced Media Workflow Association (AMWA), offer a way to address endpoint control for audio that may deliver the true promise of distributed IP production. NMOS is now gaining traction in the industry, although its uptake among audio equipment manufacturers lags behind that of video equipment vendors.
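As an illustration of NMOS endpoint control, the connection-management specification (IS-05) lets a controller connect a receiver to a sender by sending an HTTP PATCH to the receiver's `staged` resource. The sketch below only builds such a request with Python's standard library and does not send it. It is based on our reading of IS-05 and should be checked against the specification: the node URL, receiver and sender IDs (UUIDs in practice) and SDP text are hypothetical placeholders, and the API version a device supports may differ.

```python
import json
import urllib.request


def build_is05_connect_request(node_url, receiver_id, sender_id, sdp_text,
                               api_version="v1.1"):
    """Build (but do not send) a PATCH that stages and immediately activates a connection."""
    url = (f"{node_url}/x-nmos/connection/{api_version}"
           f"/single/receivers/{receiver_id}/staged")
    body = {
        "sender_id": sender_id,
        "master_enable": True,
        "activation": {"mode": "activate_immediate"},
        "transport_file": {"data": sdp_text, "type": "application/sdp"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Hypothetical node and IDs; urllib.request.urlopen(req) would actually send it
req = build_is05_connect_request("http://audio-node.example:8080",
                                 "rx-1", "tx-1", "v=0\r\n")
print(req.get_method(), req.full_url)
```

The point of the design is that any vendor's controller can issue the same request, rather than relying on a proprietary discovery and routing mechanism.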


In the meantime, the most promising approach to controlling audio flows in an IP network is software-defined networking (SDN). This not only provides an easy way to connect diverse sources and destinations, but also adds a layer of predictability, performance guarantees and security by managing bandwidth and only allowing authorized destinations to access specific audio network flows.
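The two SDN functions described here, authorization and bandwidth management, can be sketched as a simple admission check. This toy model is an assumption for illustration only: it covers a single link, treats each multicast flow as forwarded once regardless of how many destinations use it, and the class, flow names and bitrates are hypothetical. A real SDN controller would program switch forwarding tables across the whole network.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFlowController:
    """Toy SDN-style admission control for one network link (hypothetical model)."""
    link_capacity_bps: int
    flow_bitrates: dict                       # flow_id -> bitrate in bit/s
    authorized: set                           # allowed (flow_id, destination) pairs
    active: set = field(default_factory=set)  # flows currently forwarded on the link

    def used_bps(self):
        return sum(self.flow_bitrates[f] for f in self.active)

    def connect(self, flow_id, destination):
        # 1. Access control: only explicitly authorized destinations get the flow.
        if (flow_id, destination) not in self.authorized:
            return "denied: destination not authorized"
        # 2. A multicast flow already on the link costs no extra bandwidth.
        if flow_id in self.active:
            return "ok: flow already on link"
        # 3. Bandwidth management: refuse rather than oversubscribe the link.
        if self.used_bps() + self.flow_bitrates[flow_id] > self.link_capacity_bps:
            return "denied: insufficient bandwidth"
        self.active.add(flow_id)
        return "ok"


ctrl = AudioFlowController(
    link_capacity_bps=10_000_000,
    flow_bitrates={"mic1": 2_768_000, "mix": 9_680_000},
    authorized={("mic1", "desk"), ("mix", "desk")},
)
print(ctrl.connect("mic1", "recorder"))  # denied: destination not authorized
print(ctrl.connect("mic1", "desk"))      # ok
print(ctrl.connect("mix", "desk"))       # denied: insufficient bandwidth
print(ctrl.connect("mic1", "desk"))      # ok: flow already on link
```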


As production moves from the LAN into the WAN, and IP audio networking converges with video networking, audio signal protection is becoming an issue. The SMPTE ST 2022-7 dual-path (seamless) protection standard has now been extended beyond video to cover any RTP media stream, and provides a good way to ensure audio signal reliability. There may still be compatibility and network addressing issues where different parties need to exchange audio signals, e.g. between different organizations, or simply between an OB van and the live audio system. Broadcasters can address these concerns by using IP Media Edge devices and/or SDN to control which flows can cross the boundary and how, which is a better approach than bridging the gap with MADI tielines.
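The principle behind dual-path protection is that the same RTP packets are sent over two independent network paths, and the receiver keeps whichever copy of each packet arrives first. The sketch below shows only that de-duplication step, keyed on the 16-bit RTP sequence number with a bounded memory window. It is a simplification under stated assumptions: the class name and window size are hypothetical, and a real ST 2022-7 receiver also buffers and reorders packets to absorb the delay difference between the paths.

```python
from collections import deque


class SeamlessMerger:
    """Keep the first copy of each RTP packet arriving on either path (simplified sketch)."""

    def __init__(self, window=1024):
        self.window = window  # how many recent sequence numbers to remember
        self.seen = set()
        self.order = deque()

    def accept(self, seq):
        seq &= 0xFFFF  # RTP sequence numbers are 16-bit and wrap around
        if seq in self.seen:
            return False  # duplicate already received via the other path
        self.seen.add(seq)
        self.order.append(seq)
        if len(self.order) > self.window:
            # Forget old numbers so they can be reused after the counter wraps.
            self.seen.discard(self.order.popleft())
        return True


# Path A loses packet 3 and path B loses packet 4; the merged output is complete.
arrivals = [("A", 1), ("B", 1), ("A", 2), ("B", 2),
            ("B", 3), ("A", 4), ("A", 5), ("B", 5)]
merger = SeamlessMerger()
merged = [seq for _path, seq in arrivals if merger.accept(seq)]
print(merged)  # [1, 2, 3, 4, 5]
```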


The move to audio over IP will make it easier for broadcasters to keep up with the latest developments. For example, immersive audio authoring typically requires up to 127 dedicated audio object channels, in addition to a base surround signal containing up to 22+2 audio channels. Traditionally, MADI has been used to interface this high channel count in the production stage, but IP networked audio has higher capacity and can fit all these channels on one cable. Audio over IP is also more flexible in routing and does not require expensive, dedicated MADI routers when topologies more complex than point-to-point links are needed. Because audio over IP reduces the need for dedicated or custom hardware, it enables virtualized and flexible workflows. With greater capacity to adopt new trends such as immersive audio, broadcasters will be able to offer audiences an enhanced viewing experience.
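The channel counts above translate into a simple capacity comparison. Taking the text's figures of 127 object channels plus a 24-channel (22+2) bed, and assuming 64 channels per MADI link at 48 kHz and 64-channel ST 2110-30 Level C streams at 125 µs with 24-bit samples, the whole mix needs three MADI links but only about 185 Mb/s on a single network connection. Header overhead counts RTP, UDP, IPv4 and Ethernet header/FCS only; the helper names are hypothetical.

```python
import math

MADI_CHANNELS = 64               # channels per MADI link at 48 kHz
MAX_LEVEL_C_CHANNELS = 64        # ST 2110-30 Level C, 125 us packet time
HEADER_BYTES = 12 + 8 + 20 + 18  # RTP + UDP + IPv4 + Ethernet header/FCS


def split_into_streams(total_channels, per_stream=MAX_LEVEL_C_CHANNELS):
    """Pack channels into as few streams as possible, e.g. 151 -> [64, 64, 23]."""
    full, rest = divmod(total_channels, per_stream)
    return [per_stream] * full + ([rest] if rest else [])


def bitrate_125us(channels):
    """Bitrate of one L24 stream at 125 us: 6 samples x 3 bytes per channel per packet."""
    frame_bytes = channels * 6 * 3 + HEADER_BYTES
    return frame_bytes * 8 * 8000  # 8,000 packets per second


objects, bed = 127, 24           # figures from the text (22+2 bed)
total = objects + bed
madi_links = math.ceil(total / MADI_CHANNELS)
streams = split_into_streams(total)
total_bps = sum(bitrate_125us(c) for c in streams)
print(total, madi_links, streams, total_bps)  # 151 3 [64, 64, 23] 185088000
```

At roughly 185 Mb/s, the full immersive mix uses under a fifth of a single 1 Gb/s link.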

Olivier Suard is the vice president of Marketing for Nevion.