Mary Gruszka /
08.04.2014 05:00 AM
For AES67, Timing is Everything
Audio-over-IP standard seeks to lessen latency
NEW YORK—A brilliant decision in developing the recent audio-over-IP standard, AES67-2013, was basing it on existing protocols.

One of the goals for this “AES standard for audio applications of networks - High-performance streaming audio-over-IP interoperability” was to provide low latency (less than 10 milliseconds) between and among AES67 devices on a network. With that in mind, the developers of this standard chose IEEE 1588, the Precision Time Protocol (PTP), for synchronization. The most recent version of IEEE 1588 is from 2008, and the standard is currently being reviewed for possible revisions.

“IEEE 1588 is used for high-speed synchronization over the network. It was the only obvious choice for establishing a synchronization mechanism,” said Greg Shay, a member of the AES task group that developed the standard.

Industrial applications such as automation, control and measurement employ IEEE 1588 where precision timing in the nanosecond range or less is required. IEEE 1588 doesn’t automatically dictate the accuracy of the clock—rather it allows implementation to be as tight as required.

IEEE 1588 differs from network time protocol (NTP) in that the latter was not designed to have nanoseconds of precision.

Timing in video or digital audio systems is not a new concept. A video system relies on a master sync generator that produces black burst that’s distributed and hardwired to every video and digital audio device in the system. For standalone digital audio systems, it’s a word clock generator that provides the reference. In each of these cases, a separate cable is generally used for the timing reference.

IEEE 1588 system timing also relies upon clocks, as might be expected, but the clocks work a bit differently. All AES67 devices that need to communicate with one another (such as audio consoles, codecs, processors, interface boxes) are connected to a network via Ethernet switches. It’s important to note that the network we’re talking about for AES67 isn’t the public Internet, but rather a managed network like an enterprise LAN or WAN. And to achieve its low-latency potential, professional-grade (high performance) Ethernet switches are necessary. (More on switches in a bit.)

Each device connected to the network has its own clock. One clock that is used for reference is designated the “grandmaster.” IEEE 1588 provides a means for the network to automatically designate a clock as a grandmaster based on its stability, accuracy and quality, among other criteria. For example, a clock referenced to GPS would be a likely candidate for grandmaster status.
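The election of a grandmaster can be sketched in a few lines of Python. This is a simplified illustration, not the full IEEE 1588 algorithm: it assumes each clock advertises a handful of attributes that are compared in a fixed order with lower values winning, and it ignores the network-topology checks the real protocol also performs. The class name, device names and attribute values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockAnnouncement:
    """Simplified subset of what a PTP clock advertises about itself."""
    name: str             # hypothetical label, for readability only
    priority1: int        # operator-set preference; lower wins
    clock_class: int      # quality/traceability of the time source; lower wins
    clock_accuracy: int   # coded accuracy; lower wins
    variance: int         # coded stability; lower wins
    priority2: int        # operator-set tie-breaker; lower wins
    clock_identity: int   # unique ID; final tie-breaker

    def rank(self):
        # Attributes are compared left to right; the smallest tuple wins.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.clock_identity)


def elect_grandmaster(clocks):
    """Return the clock that would win this simplified comparison."""
    return min(clocks, key=ClockAnnouncement.rank)


# Illustrative values only: a GPS-referenced clock versus two free-running devices.
clocks = [
    ClockAnnouncement("audio-console", 128, 248, 0xFE, 0xFFFF, 128, 0x0002),
    ClockAnnouncement("gps-reference", 128, 6, 0x21, 0x4E5D, 128, 0x0003),
    ClockAnnouncement("codec", 128, 248, 0xFE, 0xFFFF, 128, 0x0001),
]
print(elect_grandmaster(clocks).name)
```

With equal operator priorities, the GPS-referenced clock wins on its better clock class. If it disappeared from the network, the two remaining devices would tie on every quality attribute and the lower identity would decide.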

While the grandmaster reference can, in theory, be anywhere on a network, Shay said it’s best if it’s located near other devices on the network (such as those within a facility) to avoid long IP connections.

The grandmaster transmits over the network a small packet of data, the PTP message, with a time stamp of the current time. This multicast message is received by all the other devices on the network, the “slaves,” which synchronize to the reference. Or, thinking in video-system terms, the devices are effectively “genlocked” to the reference.

Unlike in a video system, the IEEE 1588 timing reference doesn’t travel on separate cables. Rather, it is transmitted on the same network as the digital audio packets. However, the timing and audio packets are completely separate. When the PTP message (clock data) is received by a device on the network, the device runs an algorithm to adjust its local clock to synchronize with the grandmaster clock.

When the grandmaster sends its PTP messages, they take a finite amount of time to reach all the slave devices on the IP network. To provide greater timing accuracy, IEEE 1588 provides a means to measure the round-trip time from grandmaster to slave and back to grandmaster, and to compensate for this delay.
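The round-trip measurement can be illustrated with the four time stamps exchanged in a PTP delay request-response sequence. The sketch below assumes the path delay is the same in both directions, which is the assumption the calculation itself relies on; the function name and sample numbers are illustrative.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay, in nanoseconds.

    t1: grandmaster sends its sync message (grandmaster clock)
    t2: slave receives that message (slave clock)
    t3: slave sends a delay request (slave clock)
    t4: grandmaster receives the delay request (grandmaster clock)
    """
    master_to_slave = t2 - t1   # path delay plus the slave's offset
    slave_to_master = t4 - t3   # path delay minus the slave's offset
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    offset_from_master = (master_to_slave - slave_to_master) / 2
    return offset_from_master, mean_path_delay


# Example: the slave clock runs 500 ns ahead and each direction takes 2,000 ns.
offset, delay = offset_and_delay(t1=1_000_000, t2=1_002_500,
                                 t3=1_010_000, t4=1_011_500)
print(offset, delay)
```

The slave would then subtract the computed offset from its local clock. If the two directions take different amounts of time, half of the difference shows up as an error in the offset estimate.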

In addition to this roundtrip delay, there are also delays through the Ethernet switches themselves as they buffer and transmit data packets. If a switch could provide a measurement of how long it takes a packet to get through it, that information could also be used to fine-tune timing. The 2008 revision of IEEE 1588 provides for what is called a “transparent clock” that does just that.
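As a rough sketch of the idea, assume each transparent-clock switch adds the time a timing packet spent inside it to a running correction value carried along with the packet, and that the receiving device subtracts the accumulated correction from the transit time it measured. The function names and numbers below are hypothetical.

```python
def forward_through_switch(correction_ns, ingress_ns, egress_ns):
    """Add this switch's residence time to the packet's running correction."""
    return correction_ns + (egress_ns - ingress_ns)


def corrected_transit_ns(sent_ns, received_ns, correction_ns):
    """Transit time with the reported switch residence times removed."""
    return (received_ns - sent_ns) - correction_ns


# Example: 600 ns of cable delay in total, plus two switches that hold the
# packet for 4,000 ns and 15,000 ns (clocks assumed already aligned).
correction = 0
correction = forward_through_switch(correction, ingress_ns=200, egress_ns=4_200)
correction = forward_through_switch(correction, ingress_ns=4_400, egress_ns=19_400)
print(corrected_transit_ns(sent_ns=0, received_ns=19_600, correction_ns=correction))
```

Without the correction, the receiving device would see 19,600 ns of transit time that changes with switch load. With it, only the 600 ns of cable delay remains.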

A switch with a transparent clock “tries to correct the time stamp for the time packet,” Shay, who is also chief science officer for Telos Alliance, said. “This occurs in the switch itself and reduces the error at each stage of the process.”

A problem for an AES67 system design is that switches with transparent clocks aren’t that common, at least not right now. “There are some IEEE 1588-aware switches,” Shay said. “These switches are not yet ubiquitous, but they are becoming more available. [The system] works better with the special switch, but you don’t have to use them. There’s less error with an IEEE 1588 switch, but you can achieve a certain level of accuracy without [it].”

In addition to the types of clocks already mentioned, IEEE 1588 describes yet another type, a “boundary clock,” which slaves to a grandmaster and, acting like a sub-master, retransmits timing data to slave devices down the line.

Audio over IP is multichannel in the sense that a source connected in some way to the network can be made available to every other destination device. No more need for audio distribution amplifiers to send shared feeds to different studios.

Shay said that a 1 gigabit Ethernet connection can carry 250 stereo audio channels. Of course, redundant cabling would be good system design.
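A back-of-the-envelope check of the 250-channel figure is sketched below. The article does not state the audio format, so the numbers are assumptions: 48 kHz sampling, 24-bit samples, 48 samples (1 millisecond) per packet, and typical RTP, UDP, IPv4 and Ethernet overhead with no VLAN tag.

```python
SAMPLE_RATE_HZ = 48_000      # assumed sample rate
BYTES_PER_SAMPLE = 3         # assumed 24-bit linear PCM
CHANNELS_PER_STREAM = 2      # one stereo stream
SAMPLES_PER_PACKET = 48      # assumed 1 ms of audio per packet

# Assumed per-packet overhead in bytes: RTP 12, UDP 8, IPv4 20,
# Ethernet header 14, frame check 4, preamble 8, inter-frame gap 12.
OVERHEAD_BYTES = 12 + 8 + 20 + 14 + 4 + 8 + 12


def stream_bitrate_bps():
    """Bits per second one stereo stream occupies on the wire."""
    payload_bytes = SAMPLES_PER_PACKET * CHANNELS_PER_STREAM * BYTES_PER_SAMPLE
    packets_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_PACKET
    return (payload_bytes + OVERHEAD_BYTES) * 8 * packets_per_second


total_bps = 250 * stream_bitrate_bps()
print(f"{stream_bitrate_bps() / 1e6:.3f} Mbit/s per stereo stream")
print(f"{total_bps / 1e6:.0f} Mbit/s for 250 streams")
```

Under these assumptions, 250 stereo streams occupy roughly three quarters of a gigabit link. That leaves room for timing and control traffic.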

“This is the point where MADI is way obsolete,” he said. “AoIP has many times more capacity than MADI with a smaller cable connection.”

What is AES67?

From the Audio Engineering Society’s Publication of the AES67 Standard:

“High-performance media networks support professional quality audio (16 bit, 44,1 kHz and higher) with low latencies (less than 10 milliseconds) compatible with live sound reinforcement. The level of network performance required to meet these requirements is available on local-area networks and is achievable on enterprise-scale networks.

A number of networked audio systems have been developed to support high-performance media networking but until now there were no recommendations for operating these systems in an interoperable manner. This standard provides comprehensive interoperability recommendations in the areas of synchronization, media clock identification, network transport, encoding and streaming, session description and connection management.”

For more information, visit

Another plus for AoIP is that it can, through an IP network, tie together studios or facilities that are remotely separated, such as studios in two different cities, or remote trucks to studio, or even truck to truck or truck to announce booth. Not only could outputs from one location feed others, but sources could be made accessible as well. But how would these systems be synced together?

Shay suggested that each location have a GPS receiver connected to the network to obtain a global time base. He noted that while there are GPS receivers with built-in IEEE 1588, they are not nearly as common as GPS receivers with NTP. So until GPS with IEEE 1588 becomes more available, one way around this situation is to sync an IEEE 1588 grandmaster to NTP that is synced to GPS.

Precision time synchronization and a small packet size for the audio payload have allowed AES67 to achieve its target audio latency for AoIP.
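The link between packet size and latency can be shown with simple arithmetic: a sender has to collect a full packet of samples before it can transmit, and a receiver typically holds a few packets in a buffer before playing them out. The sketch below assumes 48 kHz audio and a hypothetical three-packet receive buffer, and it ignores network transit time.

```python
def packet_time_ms(samples_per_packet, sample_rate_hz=48_000):
    """Time needed to fill one packet with audio, in milliseconds."""
    return 1000 * samples_per_packet / sample_rate_hz


def estimated_latency_ms(samples_per_packet, buffered_packets=3,
                         sample_rate_hz=48_000):
    """Packetization time plus receive buffering (network transit ignored)."""
    per_packet = packet_time_ms(samples_per_packet, sample_rate_hz)
    return per_packet * (1 + buffered_packets)


for samples in (6, 48, 192):
    print(samples, packet_time_ms(samples), estimated_latency_ms(samples))
```

In this simplified model, 48-sample packets give about 4 milliseconds end to end, while 192-sample packets would already exceed the 10-millisecond target.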

Considering the potential of AES67 as a whole, “we are coming to a crossroads,” Shay said. “When the transformation is complete, there will be no more XLR devices.”

Thanks to Greg Shay who provided background information for this article.

Mary C. Gruszka is a systems design engineer, project manager, consultant and writer based in the New York metro area. She can be reached via TV Technology.

Posted by: Anonymous
Mon, 08-04-2014 05:26 PM
This article is very misleading. The author talks about AES67 as if it is a new transport protocol, equivalent to Dante or AVB. It's not. It's just an interoperability protocol. AES67 is an interface that would allow an AVB product to talk to a Dante product, for instance. It's actually rather mundane and boring, and will probably become quickly obsolete once the industry chooses either AVB or Dante as the de facto standard (which will eventually happen, since they are both almost identical to one another, and there is no reason to support two protocols that do the same thing). Secondly, 10ms is not a very low latency transmission. CobraNet could do better than that on 100Mbit networks, 20 years ago. Modern protocols that work on gigabit networks usually have no more than 2ms of latency, and sometimes are quite lower than that. Thirdly, it's no surprise that AES67 chose IEEE 1588 as its clocking protocol, since nearly every other modern Audio Over Ethernet protocol uses it as well. It's rather ridiculous to describe the decision to use IEEE 1588 as "brilliant". It would be more accurate to say that it was a no-brainer, since there was virtually no other option available. For a protocol whose only purpose is to allow other protocols to talk to each other, why would it use a different clocking mechanism than the protocols it is designed to work with? Finally, the whole GPS thing is borderline insane. Suppose you had two buildings that are 50 miles apart, with a dedicated network connecting them. Why on Earth would it be important for the audio playing in Building 1 to be nanosecond-synchronized to the audio playing in Building 2, to the extent that you're going to use an elaborate GPS system to ensure that synchronization? If the people in Building 2 hear the audio as much as a full second later than Building 1, there will be virtually no consequences. 
The only reason you'd need such tight synchronization is for ensuring that different loudspeakers in the same acoustic space remain synchronized. That notwithstanding, why wouldn't PTP work over the dedicated network between the two buildings? Just because the distance is greater doesn't mean that PTP won't work. Since none of these protocols are designed to work over the Internet (or even over a single network router), a dedicated network connection between the two buildings should be stable enough for PTP.
Posted by: Anonymous
Wed, 08-06-2014 10:26 AM
The GPS clock is about synchronization of devices (like word clock in an AES3 infrastructure). This is also not about simply going to another building; it can be about crossing the Atlantic (as for the London 2012 Olympics, for example). In this case (as with all cases), both sides of the Atlantic needed to be able to understand packet order and latencies between various devices, as running streams across IT networks involves packets taking any number of paths. This is not simply about delay between output nodes; it is about latency within the network and compensation to ensure that all the samples play out of each device in the correct order. And about the 10ms latency: AES67 networks provide a much lower true latency, which can be just a few samples, but this is dependent on the size of the network and how many pieces of IT equipment the stream is passing through.