
Synchronizing Audio and Video Over Ethernet

Ethernet networks are used throughout modern broadcast facilities for all types of video and data transport except one: real-time play-out of audio and video signals for broadcast. For these signals, familiar coax cable carrying ASI or SDI (often with embedded audio) is still the preferred method for transmitting production-quality content within a facility. This status quo forces broadcasters to install two sets of infrastructure today. One set consists of Ethernet cabling with switches, routers and device interfaces to handle data communications, file transfers, device control, Internet access and, increasingly often, voice communications. A whole separate infrastructure is then installed for video and audio transport, including coaxial cabling, video switches/routers, and ASI and SDI interfaces on equipment.

Part of what is holding broadcasters back from using Ethernet as a universal platform is the unpredictable nature of packet-based networks. Without a way to reserve or allocate bandwidth along a path, it is very difficult to guarantee that signals will flow from their source to their destinations without bottlenecks or delays. And, without a common clock, each receiving device must buffer incoming feeds and re-establish synchronization between different signals. Both of these drawbacks can be overcome with a new set of standards, collectively called IEEE 802.1 Audio Video Bridging (AVB).


Within the IEEE, the 802 organization develops standards for Ethernet, Wi-Fi, routers, switches, and many other technologies used in everyday data communications.

The 802.1 Audio Video Bridging (AVB) Task Group has been working since the middle of the last decade to create standards that significantly improve the ability of Ethernet networks to transport high-quality, uncompressed video and audio streams. There are four main standards that have been released by the AVB group:

Fig. 1: AVB streams can only flow along paths (shown in red) that connect AVB-compliant devices.

IEEE 802.1BA is the AVB Systems specification, which defines the functional requirements for a variety of device types and provides overall performance specifications. For example, the standard sets the maximum latency from a signal source (talker) to a signal destination (listener) at 2 msec for Class A (highest performance) connections, even if multiple hops are involved.

IEEE 802.1Qat Stream Reservation Protocol defines a method for devices to request defined amounts of network bandwidth along a path from a "talker" device to a "listener" device. Once all the devices along the path have granted the request, the bandwidth is reserved for the exclusive use of the requesting application until it is no longer needed. Along with specifying how packets are given priority within the network, this standard also defines mechanisms for dealing with a variety of error conditions.
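
The all-or-nothing nature of a reservation can be illustrated with a minimal Python sketch. This is not the 802.1Qat message exchange itself; the Bridge class, the reserve_stream and release_stream helpers, and the bandwidth figures are hypothetical names and values chosen only to show that a stream is admitted when every hop on the path can grant the request.

```python
from dataclasses import dataclass


@dataclass
class Bridge:
    """Hypothetical AVB bridge port with a fixed pool of reservable bandwidth."""
    name: str
    reservable_mbps: float
    reserved_mbps: float = 0.0

    def can_grant(self, mbps: float) -> bool:
        # A hop can grant the request only if it still has enough headroom.
        return self.reserved_mbps + mbps <= self.reservable_mbps


def reserve_stream(path: list[Bridge], mbps: float) -> bool:
    """Reserve bandwidth on every hop of the path, or on none of them."""
    if not all(bridge.can_grant(mbps) for bridge in path):
        return False  # one refusal anywhere on the path rejects the stream
    for bridge in path:
        bridge.reserved_mbps += mbps
    return True


def release_stream(path: list[Bridge], mbps: float) -> None:
    """Return the bandwidth to each hop once the stream is no longer needed."""
    for bridge in path:
        bridge.reserved_mbps = max(0.0, bridge.reserved_mbps - mbps)
```

With a path of two bridges offering 750 and 100 Mbps of reservable bandwidth, a first 60 Mbps request succeeds and a second is refused, because the smaller bridge would be oversubscribed.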

IEEE 802.1AS Timing and Synchronization, which is based on the IEEE 1588 Precision Time Protocol, specifies a mechanism for distributing a common, highly accurate clock to all of the devices in a network. A common timebase can support functions such as synchronizing audio and video signals, and aligning the phases of audio signals in multichannel sound systems. The clock alignment needs to be very accurate: within 1 microsecond for some professional audio applications.
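
To make the idea of clock alignment concrete, here is a simplified sketch of the two-way timestamp exchange used by PTP-style protocols. It is not the 802.1AS message sequence; the function names are hypothetical, timestamps are assumed to be in nanoseconds, and the path delay is assumed to be the same in both directions.

```python
def offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[float, float]:
    """Estimate clock offset and one-way delay from four timestamps (ns).

    t1: master sends a message (master clock)
    t2: slave receives it (slave clock)
    t3: slave sends a reply (slave clock)
    t4: master receives the reply (master clock)
    Assumes the path delay is symmetric.
    """
    forward = t2 - t1   # equals delay + offset
    reverse = t4 - t3   # equals delay - offset
    offset = (forward - reverse) / 2   # slave clock minus master clock
    delay = (forward + reverse) / 2
    return offset, delay


def within_tolerance(offset_ns: float, tolerance_ns: float = 1_000.0) -> bool:
    """Check the offset against a tolerance (default: 1 microsecond)."""
    return abs(offset_ns) <= tolerance_ns
```

For example, a slave clock running 500 ns ahead of the master over a link with 200 ns of delay yields timestamps such as (1000, 1700, 2000, 1700), from which the function recovers an offset of 500 ns and a delay of 200 ns.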

IEEE 802.1Qav Forwarding and Queuing Enhancements defines how the different elements of the Ethernet network should behave in order to make sure that time-sensitive streams are forwarded quickly and consistently through the network. One area of standardization is traffic shaping, which defines how Ethernet frames are metered out of a network interface in a smooth, regular pattern so that receiving devices see a consistent traffic flow as the frames move across the network.
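
The shaping mechanism in 802.1Qav is a credit-based shaper. The sketch below is a heavily simplified model of that idea, assuming a queue that is always backlogged with equal-size frames and a port with no competing traffic; the function name and parameters are hypothetical. Sending a frame drives the credit negative, and the next frame waits until the credit has climbed back to zero at the reserved rate (the idle slope).

```python
def shaped_send_times(frame_bits: int, count: int,
                      port_rate_bps: float, idle_slope_bps: float) -> list[float]:
    """Return the start time (seconds) of each of `count` queued frames."""
    if not 0 < idle_slope_bps < port_rate_bps:
        raise ValueError("idle slope must be positive and below the port rate")
    send_slope = idle_slope_bps - port_rate_bps   # negative: credit drains while sending
    tx_time = frame_bits / port_rate_bps          # time the frame occupies the wire
    credit = 0.0
    now = 0.0
    times = []
    for _ in range(count):
        if credit < 0:
            # Wait for the credit to recover to zero at the idle slope.
            now += -credit / idle_slope_bps
            credit = 0.0
        times.append(now)
        credit += send_slope * tx_time
        now += tx_time
    return times
```

Under these assumptions, frames leave at a steady spacing of frame_bits / idle_slope_bps rather than back to back, which is the smooth, regular pattern described above.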

Fig. 1 shows a sample network with a mixture of devices that support the AVB protocols and units that don't. This diagram uses thick, red connecting lines where AVB traffic can flow, and thin, blue lines where AVB traffic can't be supported. The basic rule is that each endpoint, as well as each of the devices along a path, must support AVB protocols for traffic to flow.
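
The rule can be expressed as a small reachability check. The sketch below assumes a hypothetical network description (a list of links and a set of AVB-capable device names) and searches for a path from talker to listener that passes only through AVB-capable devices.

```python
from collections import deque


def avb_path_exists(links: list[tuple[str, str]], avb_capable: set[str],
                    talker: str, listener: str) -> bool:
    """Return True if talker and listener are joined by an all-AVB path."""
    if talker not in avb_capable or listener not in avb_capable:
        return False  # both endpoints must support AVB
    neighbors: dict[str, set[str]] = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {talker}
    queue = deque([talker])
    while queue:
        node = queue.popleft()
        if node == listener:
            return True
        for nxt in neighbors.get(node, ()):
            # Only step onto devices that support AVB.
            if nxt in avb_capable and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

A single non-AVB switch between two AVB endpoints is enough to make the function return False, unless an alternative all-AVB route exists.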


From an applications standpoint, it is important to remember that this technology is limited to hard-wired LANs that use Ethernet. With respect to wireless technology (such as 802.11 Wi-Fi), AVB capabilities have (so far) been limited to precision timing. Going outside the LAN via Internet connections is not feasible today, because the necessary precision timing and bandwidth allocation functions have not yet been built into wide area networking equipment or carrier service offerings, but hopefully will be sometime in the future.

Many of the products currently available or in development are focused on high quality audio transport, either in the form of endpoints or switches. This makes sense considering that a single, low-cost GigE connection can support literally hundreds of high-quality audio streams. In contrast, even a single uncompressed HD video signal would require a 10-Gigabit Ethernet connection, which is still quite expensive by datacom networking standards.
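
A rough calculation shows where these numbers come from. The sketch below counts payload only and ignores packet overhead, so real stream counts will be lower. The audio format (48 kHz, 24-bit stereo), the 75 percent usable share of the link, and the nominal 1.5 Gbps HD video rate are assumptions chosen for illustration, and the function names are hypothetical.

```python
def audio_payload_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw payload bit rate of an uncompressed audio stream."""
    return sample_rate_hz * bits_per_sample * channels


def max_streams(link_bps: int, stream_bps: int, usable_fraction: float = 0.75) -> int:
    """How many streams fit in the usable share of a link (payload only)."""
    return int((link_bps * usable_fraction) // stream_bps)


GIGE_BPS = 1_000_000_000
HD_VIDEO_BPS = 1_500_000_000   # nominal uncompressed HD rate, assumed for illustration

stereo_bps = audio_payload_bps(48_000, 24, 2)     # 2,304,000 bps per stereo stream
audio_streams = max_streams(GIGE_BPS, stereo_bps)
hd_fits_on_gige = HD_VIDEO_BPS <= GIGE_BPS
```

Under these assumptions, about 325 stereo streams fit on a single GigE link before overhead, while one uncompressed HD signal already exceeds the full link rate.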

As the price of 10GigE chipsets comes down, it will become more economical to build Ethernet devices and switches that can support 1.5 and 3 Gbps video streams. If these units can be built to support the 802.1 AVB standards, it will someday be possible to use a single, converged network for all the signals inside a broadcast studio: data, voice, audio and video.

Thanks to Pete Gilchriest, technology director of A.R.G ElectroDesign, for his helpful comments on this article. Wes Simpson can be reached at