For AES67, Timing is Everything

NEW YORK—A brilliant decision in developing the recent audio-over-IP standard, AES67-2013, was basing it on existing protocols.

One of the goals for this “AES standard for audio applications of networks—High-performance streaming audio-over-IP interoperability” was to provide low latency (less than 10 milliseconds) between and among AES67 devices on a network. With that in mind, the developers of this standard chose IEEE 1588. The most recent version is from 2008, and the standard is currently being reviewed for possible revisions.

“IEEE 1588 is used for high-speed synchronization over the network. It was the only obvious choice for establishing a synchronization mechanism,” said Greg Shay, a member of the AES task group that developed the standard.

Industrial applications such as automation, control and measurement employ IEEE 1588 where precision timing in the nanosecond range or better is required. IEEE 1588 doesn’t automatically dictate the accuracy of the clock; rather, it allows an implementation to be as tight as required.

IEEE 1588 differs from Network Time Protocol (NTP) in that the latter was not designed for nanosecond-level precision.

Timing in video or digital audio systems is not a new concept. A video system relies on a master sync generator that produces black burst that’s distributed and hardwired to every video and digital audio device in the system. For standalone digital audio systems, it’s a word clock generator that provides the reference. In each of these cases, a separate cable is generally used for the timing reference.

IEEE 1588 system timing also relies upon clocks, as might be expected, but the clocks work a bit differently. Every AES67 device that needs to communicate with the others (such as audio consoles, codecs, processors and interface boxes) is connected to a network via Ethernet switches. It’s important to note that the network we’re talking about for AES67 isn’t the public Internet, but rather a managed network like an enterprise LAN or WAN. And to achieve its low-latency potential, professional-grade (high-performance) Ethernet switches are necessary. (More on switches in a bit.)

Each device connected to the network has its own clock. One clock that is used for reference is designated the “grandmaster.” IEEE 1588 provides a means for the network to automatically designate a clock as a grandmaster based on its stability, accuracy and quality, among other criteria. For example, a clock referenced to GPS would be a likely candidate for grandmaster status.
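The election of a grandmaster can be pictured as an ordered comparison of what each clock announces about itself. The sketch below is a simplified illustration, not the full best master clock algorithm of IEEE 1588-2008: it ranks clocks only on their advertised attributes (lower values win at each step) and ignores network topology. The `ClockAdvert` class, the device names and the attribute values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockAdvert:
    """What a clock announces about itself (hypothetical container)."""
    name: str
    priority1: int       # set by the administrator; lower wins
    clock_class: int     # traceability of the time source; lower wins
    clock_accuracy: int  # coded accuracy; lower wins
    variance: int        # coded stability; lower wins
    priority2: int       # second administrator-set tiebreaker
    identity: str        # unique ID, used as the final tiebreaker

    def rank(self):
        # Tuples compare element by element, which gives the ordered comparison.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.identity)


def elect_grandmaster(clocks):
    """Return the clock with the best (lowest) rank."""
    return min(clocks, key=ClockAdvert.rank)


# Illustrative values only: a GPS-referenced clock advertises a better
# (lower) class and accuracy than free-running devices.
clocks = [
    ClockAdvert("console", 128, 248, 0xFE, 0xFFFF, 128, "id-01"),
    ClockAdvert("gps-clock", 128, 6, 0x21, 0x4E5D, 128, "id-02"),
    ClockAdvert("codec", 128, 248, 0xFE, 0xFFFF, 128, "id-03"),
]
print(elect_grandmaster(clocks).name)  # gps-clock
```

With all priorities left equal, the GPS-referenced clock wins on clock class alone. An administrator who wanted to force a particular device to be grandmaster would lower its `priority1` value.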

While the grandmaster reference can, in theory, be anywhere on a network, Shay said it’s best if it’s located near other devices on the network (such as those within a facility) to avoid long IP connections.

The grandmaster transmits over the network a small packet of data, the PTP message, with a time stamp of the current time. This multicast message is received by all the other devices on the network, the “slaves,” which synchronize to the reference. Or, thinking in video systems terms, the devices are effectively “genlocked” to the reference.

Unlike in a video system, the IEEE 1588 timing reference doesn’t travel on separate cables. Rather, it is transmitted on the same network as the digital audio packets. However, the timing and audio packets are completely separate. When a device on the network receives the PTP message (clock data), it runs an algorithm to adjust its local clock to synchronize with the grandmaster clock.

When the grandmaster sends its PTP messages, they take a finite amount of time to reach all the slave devices on the IP network. To provide greater timing accuracy, IEEE 1588 provides a means to measure the roundtrip time from grandmaster to slave and back to grandmaster, and to compensate for this offset.
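The roundtrip measurement boils down to four time stamps and a little arithmetic. The sketch below assumes the delay request-response exchange and a symmetric path (the same delay in each direction). The function name and the example numbers are illustrative. Here `t1` is when the grandmaster sends its message, `t2` is when the slave receives it, `t3` is when the slave sends its reply, and `t4` is when the grandmaster receives that reply. `t1` and `t4` are read from the grandmaster clock, `t2` and `t3` from the slave clock.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Return (slave clock offset, one-way path delay) in the input units.

    Assumes the network delay is the same in both directions.
    """
    master_to_slave = t2 - t1   # one-way delay plus the slave's offset
    slave_to_master = t4 - t3   # one-way delay minus the slave's offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Example in nanoseconds: the slave clock runs 1,500 ns ahead of the
# grandmaster and each direction of the path takes 400 ns.
t1 = 1_000_000   # grandmaster sends (grandmaster time)
t2 = 1_001_900   # slave receives (slave time)
t3 = 1_010_000   # slave replies (slave time)
t4 = 1_008_900   # grandmaster receives reply (grandmaster time)

offset, delay = offset_and_delay(t1, t2, t3, t4)
print(offset, delay)  # 1500.0 400.0
```

The slave would then subtract the computed offset from its local clock. If the path is not symmetric, half of the asymmetry shows up as an error in the offset, which is one reason switch behavior matters.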

In addition to this roundtrip delay, there are also delays through the Ethernet switches themselves as they buffer and transmit data packets. If a switch could provide a measurement of how long it takes a packet to get through it, that information could also be used to fine-tune timing. The 2008 revision of IEEE 1588 provides for what is called a “transparent clock” that does just that.
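The idea behind a transparent clock can be sketched in a few lines. In this simplified model, each switch notes when a timing message enters and leaves, and adds that residence time to a running correction carried with the message. The receiver then subtracts the accumulated correction. The function names and numbers are hypothetical, and the two clocks are assumed to be already aligned so that only the switch effect is visible.

```python
def forward_through_switch(correction_ns, ingress_ns, egress_ns):
    """Add this switch's residence time to the message's running correction."""
    return correction_ns + (egress_ns - ingress_ns)


def corrected_transit(t1_ns, t2_ns, correction_ns):
    """Transit time seen by the receiver with switch residence time removed."""
    return (t2_ns - t1_ns) - correction_ns


# A timing message crosses two switches. Cable delay totals 400 ns, but
# the switches hold the packet for 3,000 ns and 12,000 ns respectively.
correction = 0
correction = forward_through_switch(correction, ingress_ns=100, egress_ns=3_100)
correction = forward_through_switch(correction, ingress_ns=3_300, egress_ns=15_300)

t1 = 0        # grandmaster send time
t2 = 15_400   # receive time at the slave
print(t2 - t1)                                # 15400: raw transit, dominated by the switches
print(corrected_transit(t1, t2, correction))  # 400: cable delay only
```

Because residence time changes from packet to packet with network load, removing it leaves the receiver with a much steadier delay figure to work from.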

A switch with a transparent clock “tries to correct the time stamp for the time packet,” said Shay, who is also chief science officer for Telos Alliance. “This occurs in the switch itself and reduces the error at each stage of the process.”

A problem for an AES67 system design is that switches with transparent clocks aren’t that common, at least not right now. “There are some IEEE 1588-aware switches,” Shay said. “These switches are not yet ubiquitous, but they are becoming more available. [The system] works better with the special switch, but you don’t have to use them. There’s less error with an IEEE 1588 switch, but you can achieve a certain level of accuracy without [it].”

In addition to the types of clocks already mentioned, IEEE 1588 describes yet another type, a “boundary clock,” which slaves to a grandmaster and, acting like a sub-master, retransmits timing data to slave devices down the line.

Audio over IP is multichannel in the sense that a source connected in some way to the network can be made available to every other destination device. No more need for audio distribution amplifiers to send shared feeds to different studios.

Shay said that a 1 gigabit Ethernet connection can carry 250 stereo audio channels. Of course, redundant cabling would be good system design.
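A back-of-the-envelope calculation shows how that figure fits on a gigabit link. The sketch below rests on assumptions that are not Shay’s: 48 kHz, 24-bit linear PCM, one stereo pair per stream, 1 millisecond of audio per packet, and per-packet overhead for RTP, UDP, IPv4 and Ethernet framing with no VLAN tag. The constant and function names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000
BYTES_PER_SAMPLE = 3       # 24-bit linear PCM (assumed)
CHANNELS = 2               # one stereo pair per stream (assumed)
PACKET_TIME_US = 1_000     # 1 ms of audio per packet (assumed)

# Per-packet overhead in bytes: RTP 12, UDP 8, IPv4 20, Ethernet header 14,
# frame check 4, preamble 8, interframe gap 12.
OVERHEAD_BYTES = 12 + 8 + 20 + 14 + 4 + 8 + 12


def stream_bits_per_second():
    """Bits per second one stereo stream occupies on the wire."""
    samples_per_packet = SAMPLE_RATE_HZ * PACKET_TIME_US // 1_000_000
    payload_bytes = samples_per_packet * CHANNELS * BYTES_PER_SAMPLE
    packets_per_second = 1_000_000 // PACKET_TIME_US
    return (payload_bytes + OVERHEAD_BYTES) * 8 * packets_per_second


per_stream = stream_bits_per_second()
total = 250 * per_stream
print(per_stream)           # 2928000 bits per second per stereo stream
print(total / 1e9)          # 0.732 of a 1 gigabit link
```

Under these assumptions, 250 stereo streams occupy about 732 megabits per second, or roughly 73 percent of a gigabit link, which leaves headroom for timing and control traffic. Shorter packets lower latency but raise the overhead share, so the channel count drops.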

“This is the point where MADI is way obsolete,” he said. “AoIP has many times more capacity than MADI with a smaller cable connection.”

What is AES67?

From the Audio Engineering Society’s Publication of the AES67 Standard:

“High-performance media networks support professional quality audio (16 bit, 44,1 kHz and higher) with low latencies (less than 10 milliseconds) compatible with live sound reinforcement. The level of network performance required to meet these requirements is available on local-area networks and is achievable on enterprise-scale networks.

A number of networked audio systems have been developed to support high-performance media networking but until now there were no recommendations for operating these systems in an interoperable manner. This standard provides comprehensive interoperability recommendations in the areas of synchronization, media clock identification, network transport, encoding and streaming, session description and connection management.”

Another plus for AoIP is that it can, through an IP network, tie together studios or facilities that are remotely separated, such as studios in two different cities, or remote trucks to studio, or even truck to truck or truck to announce booth. Not only could outputs from one location feed others, but sources could be made accessible as well. But how would these systems be synced together?

Shay suggested that each location have a GPS receiver connected to the network to obtain a global time base. He noted that while there are GPS receivers with built-in IEEE 1588, they are not nearly as common as GPS receivers with NTP. So until GPS with IEEE 1588 becomes more available, one way around this situation is to sync an IEEE 1588 grandmaster to NTP that is synced to GPS.

Precision time synchronization, together with a small packet size for the audio payload, has allowed AES67 to achieve its target audio latency for AoIP.

Considering the potential of AES67 as a whole, “we are coming to a crossroads,” Shay said. “When the transformation is complete, there will be no more XLR devices.”

Thanks to Greg Shay, who provided background information for this article.

Mary C. Gruszka is a systems design engineer, project manager, consultant and writer based in the New York metro area. She can be reached via TV Technology.