Digital Audio Networking

In the ’80s, organizations realized that computer networks improve productivity. In the ’90s, the same became true for the public at large. The emergence of computer networks has allowed organizations and individuals to communicate and share information with unprecedented ease and speed.

Calrec Audio’s Alpha 100 digital console, shown here in Studio 3A at NBC Television Network Operations in New York, is the largest of the three Calrec models that can be networked through the Hydra networking system.

This fact is not lost on those who design and operate production facilities with multiple studios and control rooms. For years, mixing consoles have shared sound sources using distribution amplifiers and miles of expensive cables that offered reliable, if rudimentary, connectivity.

More recently, digital audio routers have allowed more elegant audio networking, with greater capacity and computer control. Many console manufacturers believe the next logical step is to make the network part of the mixing console, so that sharing inputs and outputs across any number of mixing consoles becomes an easy and natural process — no more complicated than plugging a microphone into a stand-alone console, and just as reliable. Figure 1 shows a block diagram of a practical audio network.

Using Ethernet

You can simplify the task of building a data network by designing it around one of the many existing communication standards. Gigabit Ethernet is a natural choice because it provides the best balance between bandwidth and cost-effectiveness. Inexpensive chip sets are available for making network interfaces, and low-cost components are available for constructing the network fabric.

Figure 1. An example of a number of mixing consoles and remote I/O units connected, in star formation, around a gigabit switch.

The Ethernet standard defines a frame of information comprising a header (containing source and destination addresses) and a variable-size data payload (see Figure 2). There are no restrictions on the nature of the payload, the frequency of the packets, or the time they take to reach their destination.
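The frame layout just described can be sketched in a few lines of Python. This is a minimal illustration, not the format any particular product uses: the function names are hypothetical, the EtherType is the IEEE "local experimental" value chosen only as a placeholder, and the preamble and frame check sequence are assumed to be added by the network hardware.

```python
import struct

# Assumption for illustration: an IEEE "local experimental" EtherType,
# not the value any real audio network uses.
ETHERTYPE_EXPERIMENTAL = 0x88B5

HEADER_FORMAT = "!6s6sH"  # destination MAC, source MAC, EtherType
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 14 bytes
MIN_PAYLOAD = 46
MAX_PAYLOAD = 1500


def build_frame(dst: bytes, src: bytes, payload: bytes,
                ethertype: int = ETHERTYPE_EXPERIMENTAL) -> bytes:
    """Return header plus payload; preamble and FCS are left to the hardware."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds 1500 bytes")
    padded = payload.ljust(MIN_PAYLOAD, b"\x00")  # pad short payloads
    return struct.pack(HEADER_FORMAT, dst, src, ethertype) + padded


def parse_frame(frame: bytes):
    """Split a frame back into (dst, src, ethertype, payload)."""
    dst, src, ethertype = struct.unpack(HEADER_FORMAT, frame[:HEADER_SIZE])
    return dst, src, ethertype, frame[HEADER_SIZE:]
```

Nothing in the header says what the payload means or when it must arrive, which is why any timing guarantee has to come from how the network is designed and loaded.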

Ethernet networks are sometimes compared to the postal service — you drop a letter in at one end and, sometime later, it emerges at the destination address. You can’t always predict how the letter gets there and, more crucially, how long it takes. But this unpredictability is not intrinsic to Ethernet — it is due more to the way that higher protocols deal with the dynamic loads of large networks.

To use Ethernet for communicating real-time audio information, you must either eliminate the causes of unpredictable behavior or mitigate them with buffering and retransmission strategies. The nature of audio networking and its context directs us very firmly to the former approach. More on this later, but first some details on how Ethernet can move audio around.


Any scheme for moving digital audio over a packet-based network must pack audio data into a frame, transmit it and then unpack it into its original form. But a packetization strategy involves a number of trade-offs. To make the best use of the bandwidth, you must maximize the ratio of payload data to header data. You can accomplish this by using the largest possible payload of 1500 bytes. However, a single channel’s worth of 32-bit audio data packed into such a frame would contain nearly 8 milliseconds of material (375 samples at a 48kHz sample rate). This introduces a granularity that, given the inevitability of buffering, possibly at multiple points in the transmission chain, would impose a delay on the audio path amounting to many tens of milliseconds. This may be acceptable in some applications, but certainly not in live broadcast.
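The trade-off can be checked with a little arithmetic. The sketch below assumes a 48kHz sample rate (the rate the article uses elsewhere) and counts only the 14-byte header and 4-byte frame check sequence as overhead; the function names are illustrative.

```python
MAX_PAYLOAD_BYTES = 1500   # largest standard Ethernet payload
BYTES_PER_SAMPLE = 4       # 32-bit audio word
SAMPLE_RATE_HZ = 48_000    # assumption: the 48kHz rate used elsewhere in the article
HEADER_AND_FCS_BYTES = 18  # 14-byte header plus 4-byte frame check sequence


def audio_per_frame_ms(channels: int = 1) -> float:
    """Milliseconds of audio held by one full frame shared by `channels`."""
    samples = MAX_PAYLOAD_BYTES // (BYTES_PER_SAMPLE * channels)
    return 1000.0 * samples / SAMPLE_RATE_HZ


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of the frame (header + payload + FCS) that is payload."""
    return payload_bytes / (payload_bytes + HEADER_AND_FCS_BYTES)


if __name__ == "__main__":
    print(f"{audio_per_frame_ms(1):.2f} ms of audio in a full one-channel frame")
    print(f"{payload_efficiency(1500):.1%} efficient at 1500 bytes")
    print(f"{payload_efficiency(46):.1%} efficient at 46 bytes")
```

A full frame is highly efficient but holds several milliseconds of one channel, while a minimum-size frame is quick to fill but spends a large share of its bytes on overhead.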

Figure 2. Maximizing payload by keeping protocol overhead to a minimum. Any Ethernet layer-2 switch can handle this frame.

Payload, granularity and latency

When you network mixing consoles, it is likely that groups of signals, rather than individual signals, will be routed together. This helps ease the packetization problem, since multiple audio signals can be grouped into the same packet, which tends to maximize the payload data while minimizing the audio granularity and hence the latency. The network attempts to balance these factors by using a variable frame size (see Figure 2). Such a frame can accommodate eight 32-bit samples of anything from one to 44 audio channels, depending on network demand. At a 48kHz sample rate, this sets the audio granularity to around 160 microseconds. The console-to-console network latency is equal to two frame-buffering delays (one at the transmit end and one at the receive end) amounting to 360 microseconds, plus delays from the network-interface circuit to the console audio backplane. Experimental measurements have shown that the total latency in each direction is approximately 460 microseconds (see Figure 3).
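The figures in this section follow from the frame dimensions. The sketch below is a back-of-envelope check under the article's stated parameters (eight 32-bit samples per channel, one to 44 channels, 48kHz); the function names are illustrative, and the 460-microsecond one-way latency is the measured value quoted above rather than something the code derives.

```python
SAMPLES_PER_FRAME = 8
BYTES_PER_SAMPLE = 4           # 32-bit audio word
SAMPLE_RATE_HZ = 48_000
MAX_CHANNELS_PER_FRAME = 44
ONE_WAY_LATENCY_US = 460       # measured figure quoted in the article


def frame_payload_bytes(channels: int) -> int:
    """Audio payload size for a frame carrying `channels` channels."""
    if not 1 <= channels <= MAX_CHANNELS_PER_FRAME:
        raise ValueError("a frame carries between 1 and 44 channels")
    return channels * SAMPLES_PER_FRAME * BYTES_PER_SAMPLE


def granularity_us() -> float:
    """Time spanned by the samples in one frame, in microseconds."""
    return 1e6 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ


def foldback_round_trip_us() -> int:
    """Two network crossings, as in a foldback mix."""
    return 2 * ONE_WAY_LATENCY_US


if __name__ == "__main__":
    print(frame_payload_bytes(44), "payload bytes at 44 channels")
    print(f"{granularity_us():.1f} us of audio per frame")
    print(foldback_round_trip_us(), "us for a foldback round trip")
```

The exact granularity comes to a little under 170 microseconds, in line with the rounded figure above, and a 44-channel frame stays within the 1500-byte payload limit.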

Figure 3. Low network latency is crucial when a performer is listening to a mix that includes his own voice because the signal makes two trips across the network as part of a foldback mix. The total delay is less than 1ms.

Guaranteed on-time delivery

Having established a mechanism for the efficient movement of audio through the Ethernet fabric, you need to ensure that it will deliver every packet, without fail, under all loading conditions. This brings up the need for deterministic network performance. Those familiar with the origins of Ethernet understand that it was designed to be anything but deterministic. In early Ethernet-based computer networks, multiple nodes connected to a single cable segment. If two nodes attempted to transmit simultaneously, the network detected the collision and both nodes backed off for a random amount of time before trying again. On a particularly busy segment, a frame could collide and be delayed many times over. This resulted in unpredictable link performance. A real-time digital audio application that is intolerant of delay and relies on the timely reception of all frames requires much better performance.
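To see why shared-segment Ethernet is unpredictable, consider the random back-off itself. The sketch below follows the truncated binary exponential back-off used by classic CSMA/CD Ethernet, in which the range of possible waits doubles after each successive collision up to a cap. The function names are illustrative and the model ignores everything except the back-off wait.

```python
import random

MAX_ATTEMPTS = 16    # the frame is abandoned after this many attempts
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Random wait, in slot times, after the given number of collisions."""
    if not 1 <= collisions <= MAX_ATTEMPTS:
        raise ValueError("collision count out of range")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform in 0 .. 2**k - 1


def worst_case_slots(collisions: int) -> int:
    """Largest total wait after `collisions` successive collisions."""
    return sum(2 ** min(i, BACKOFF_LIMIT) - 1 for i in range(1, collisions + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 3, 10):
        print(n, "collisions: wait", backoff_slots(n, rng),
              "slots, worst case so far", worst_case_slots(n))
```

Because the wait is random and its upper bound grows with every collision, delivery time on a busy shared segment cannot be bounded tightly, which is exactly what a real-time audio link cannot accept.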

Star-shaped topology

Since Ethernet’s inception in the early ’80s, things have improved greatly: link speeds have increased from 10Mb/s to 100Mb/s, 1Gb/s and even 10Gb/s. But the biggest gain in predictability has come from the development of cost-effective switches. These devices have allowed network topology to change from shared segments to star-shaped, with a single node on each point of the star. The switch routes traffic directly from source to destination without affecting any other nodes. In non-blocking switches, the hardware can continuously receive data at one port and route it to another at the maximum data rate, irrespective of what traffic other ports are handling. This kind of network provides a suitable fabric with which to build an audio network. To guarantee fully deterministic performance, it is necessary to apply the further restriction that we keep the network private. This means that we must not make it carry any data other than that generated by the audio network.


A useful consequence of designing the network to have deterministic behavior is that it can disregard much of the protocol baggage of standard networks. By abandoning the TCP/IP transport and network layers, we remove the need for 24 extra header bytes, leaving a data payload from 46 to 1500 bytes. Given a 32-bit audio word size and an efficient packing strategy, it follows that it should be possible for a single Gigabit Ethernet connection to transport in excess of 600 channels of 48kHz digital audio. In practice, a somewhat lower maximum load is prudent to allow for some non-audio communication. The Gigabit Ethernet network has successfully carried loads of 585 audio channels in the presence of heavy control (non-audio) traffic.
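The channel-count claim can be checked with a rough estimate. The sketch below assumes every frame carries the 44-channel, eight-sample payload described earlier and charges each frame the standard per-frame cost on the wire (8-byte preamble, 14-byte header, 4-byte frame check sequence and 12-byte interframe gap). It is an estimate under those assumptions, not a description of how any product schedules its traffic, and the function names are illustrative.

```python
LINK_RATE_BPS = 1_000_000_000
SAMPLE_RATE_HZ = 48_000
BITS_PER_SAMPLE = 32
SAMPLES_PER_FRAME = 8
CHANNELS_PER_FRAME = 44
# preamble + header + frame check sequence + interframe gap
WIRE_OVERHEAD_BYTES = 8 + 14 + 4 + 12


def raw_channel_limit() -> int:
    """Channels that fit if the link carried nothing but audio bits."""
    return LINK_RATE_BPS // (BITS_PER_SAMPLE * SAMPLE_RATE_HZ)


def framed_channel_limit() -> int:
    """Channels that fit once per-frame overhead is charged."""
    payload_bytes = CHANNELS_PER_FRAME * SAMPLES_PER_FRAME * BITS_PER_SAMPLE // 8
    frame_bits = (payload_bytes + WIRE_OVERHEAD_BYTES) * 8
    # Bits the link can move in the time it takes to collect one frame of samples.
    bits_per_period = LINK_RATE_BPS * SAMPLES_PER_FRAME // SAMPLE_RATE_HZ
    frames_per_period = bits_per_period // frame_bits
    return frames_per_period * CHANNELS_PER_FRAME


if __name__ == "__main__":
    print("raw limit:", raw_channel_limit())
    print("with framing overhead:", framed_channel_limit())
```

Both results exceed 600 channels, and the framed estimate leaves some headroom above the 585-channel load reported above, which is consistent with reserving capacity for control traffic.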


An audio network differs from typical computer LANs in that there is no central server. In fact, the relationship between network devices is peer-to-peer, although each mixing console has characteristics of both client and server because it is capable of both requesting and responding to requests for audio data. This arrangement has a useful resilience because no single part of the network is essential for control of the network. The network system has been designed for devices to be added or removed (or even to stop working) without affecting anything else. The control software on each mixing console uses the network to constantly audit connected resources, so every device is aware of every other device and the connections it offers. If resources change, the rest of the network becomes aware, very quickly. This information may be used to automatically switch to alternative hardware, should a redundant system be available, as well as to update the list of available resources.
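One way to picture this constant audit is a table, kept by every console, of what each peer last announced. The sketch below is purely hypothetical: the class, the method names, the device names and the one-second timeout are all assumptions made for illustration, and the article does not describe the actual mechanism.

```python
class ResourceTable:
    """Hypothetical per-console view of what the other devices offer."""

    def __init__(self, timeout_s: float = 1.0):
        self.timeout_s = timeout_s  # assumed silence limit before a peer is dropped
        self._entries = {}          # device name -> (last seen, port names)

    def announce(self, device: str, ports, now: float) -> None:
        """Record (or refresh) a peer's advertised connections."""
        self._entries[device] = (now, tuple(ports))

    def expire(self, now: float):
        """Drop peers that have been silent too long; return their names."""
        stale = [d for d, (seen, _) in self._entries.items()
                 if now - seen > self.timeout_s]
        for d in stale:
            del self._entries[d]
        return stale

    def available(self):
        """Current map of device name to advertised ports."""
        return {d: ports for d, (_, ports) in self._entries.items()}


if __name__ == "__main__":
    table = ResourceTable(timeout_s=1.0)
    table.announce("console-A", ["mic 1", "mic 2"], now=0.0)
    table.announce("io-box-1", ["aes 1"], now=0.0)
    table.announce("console-A", ["mic 1", "mic 2"], now=0.9)
    print("dropped:", table.expire(now=1.5))
    print("still available:", table.available())
```

Because each device keeps its own table, no single device has to survive for the rest to stay informed, and the list of dropped peers is the natural trigger for switching to redundant hardware.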

For a network to be truly useful, it must be easy to use and maintain. This requires control software that constantly monitors the network and performs essential administration functions, leaving users free to creatively exploit network resources as easily as if they were locally connected.

For reliability, the network should use proven high-quality switch hardware with redundant power supplies. For belt-and-braces reliability, the network designer may duplicate some or all of the network fabric (cabling and switch hardware) and the network-interface hardware. That way, if the network software detects any failures, it deploys this redundancy automatically.

Photo 1. The front panel of the Hydra interface box, which interfaces the mixing console to the network.

A practical audio network

Calrec Audio has developed a technology based on Gigabit Ethernet for constructing large-scale, low-cost, real-time networks designed for connecting mixing consoles and remote I/O sources. The Hydra audio network allows production facilities to connect any number of digital consoles to share input sources, buss outputs, direct outputs and clean feeds. In fact, you can share every signal on a console with any other console on the network. In addition, you can connect remote I/O units, with up to 96 inputs and 96 outputs, analog or digital, to the same network. This provides remotely located sources and destinations that any or all mixing consoles can use. Photo 1 shows the front panel of the Hydra gigabit console interface.

Photo 2. A remote I/O box with ADCs, DACs and AES I/O modules.

The model shown provides two separate duplex gigabit ports. They can be used as a pair, with one port acting as a “hot” redundant spare for the other, or in parallel to double the interface bandwidth. Photo 2 shows the remote I/O box and its network connections. This independent I/O unit, which may be located remotely, provides general-purpose I/O to the network. It also has two gigabit ports, which provide the option of connecting to a redundant switch fabric.

The audio network is built on Gigabit Ethernet technology (IEEE Std 802.3-1998) and the network fabric is made using low-cost, off-the-shelf hardware. The network topology is similar to an office LAN, using a central switch with connections to each mixing console in a star formation. You can make connections up to 90 meters with Category 5e UTP, or up to several kilometers with optical fiber.

Patrick Warrington is technical director of Calrec Audio.
