A brilliant decision in developing
the recent audio-over-IP standard, AES67-2013,
was basing it on existing protocols.
One of the goals for this “AES standard for audio
applications of networks—High-performance
streaming audio-over-IP interoperability” was to
provide low latency (less than 10 milliseconds)
between and among AES67 devices on a network.
With that in mind, the developers of this standard
chose IEEE 1588, the Precision Time Protocol (PTP),
for synchronization. The most recent version
is from 2008, and the standard is currently being
reviewed for possible revisions.
“IEEE 1588 is used for high-speed synchronization
over the network. It was the only obvious
choice for establishing a synchronization mechanism,”
said Greg Shay, a member of the AES task
group that developed the standard.
Industrial applications such as automation,
control and measurement employ IEEE 1588
where precision timing in the nanosecond range
or less is required. IEEE 1588 doesn’t automatically
dictate the accuracy of the clock—rather it
allows implementation to be as tight as required.
IEEE 1588 differs from Network Time Protocol (NTP) in that the latter was not designed
for nanosecond-level precision.
RELYING ON THE CLOCK
Timing in video or digital audio systems
is not a new concept. A video system relies
on a master sync generator that produces
black burst that’s distributed and
hardwired to every video and digital audio
device in the system. For standalone digital
audio systems, it’s a word clock generator
that provides the reference. In each of
these cases, a separate cable is generally
used for the timing reference.
IEEE 1588 system timing also relies
upon clocks, as might be expected, but the
clocks work a bit differently. AES67
devices that need to communicate with
one another (such as audio consoles, codecs,
processors and interface boxes) are connected
to a network via Ethernet switches.
It’s important to note that the network
we’re talking about for AES67 isn’t the
public Internet, but rather a managed network
like an enterprise LAN or WAN. And
to achieve its low-latency potential, professional-grade
(high-performance) Ethernet switches are necessary. (More on switches
in a bit.)
Each device connected to the network
has its own clock. One clock that is used
for reference is designated the “grandmaster.”
IEEE 1588 provides a means for
the network to automatically designate a
clock as a grandmaster based on its stability,
accuracy and quality, among other criteria. For example, a clock referenced to
GPS would be a likely candidate for grandmaster.
While the grandmaster reference can,
in theory, be anywhere on a network, Shay
said it’s best if it’s located near other devices
on the network (such as those within
a facility) to avoid long IP connections.
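The automatic election described above can be sketched in a few lines. This is a simplified illustration, not the full best master clock algorithm from IEEE 1588-2008: it only compares the attributes each clock announces, in the order the standard ranks them (priority1, clock class, accuracy, variance, priority2, then identity as a tie-breaker), with lower values winning. The device names and attribute values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    name: str            # label used only in this example
    priority1: int       # user-configurable, lower wins
    clock_class: int     # traceability of the time source, lower is better
    clock_accuracy: int  # announced accuracy code, lower is better
    variance: int        # announced stability figure, lower is better
    priority2: int       # second user-configurable tie-breaker
    identity: str        # unique clock identity, final tie-breaker


def rank(clock):
    # Tuples compare field by field, which mirrors the ordered comparison.
    return (clock.priority1, clock.clock_class, clock.clock_accuracy,
            clock.variance, clock.priority2, clock.identity)


def elect_grandmaster(clocks):
    return min(clocks, key=rank)


# Example values only: a GPS-referenced clock and two free-running devices.
clocks = [
    ClockDataset("console", 128, 248, 0xFE, 0xFFFF, 128, "00:1d:c1:ff:fe:00:00:02"),
    ClockDataset("gps_clock", 128, 6, 0x21, 0x4E5D, 128, "00:1d:c1:ff:fe:00:00:09"),
    ClockDataset("codec", 128, 248, 0xFE, 0xFFFF, 128, "00:1d:c1:ff:fe:00:00:01"),
]
print(elect_grandmaster(clocks).name)
```

With equal priorities, the GPS-referenced clock wins on clock class. If it were unplugged, the two remaining devices would tie on every quality field and the lower identity would decide.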
The grandmaster transmits over the
network a small packet of data, the PTP
message, with a time stamp of the current
time. This multicast message is received
by all the other devices on the network, the “slaves,” which synchronize to the
reference. Or thinking in video systems
terms, the devices are effectively “genlocked”
to the reference.
Unlike in a video system, the IEEE 1588
timing reference doesn’t travel on separate
cables. Rather it is transmitted on the same
network as the digital audio packets. However,
the timing and audio packets are completely separate. When a device on the
network receives the PTP message
(clock data), it runs an algorithm to adjust its
local clock to synchronize with the grandmaster.
When the grandmaster sends its PTP
messages, they take a finite amount of time
to reach all the slave devices on the IP network.
To provide greater timing accuracy,
IEEE 1588 provides a means to measure the
roundtrip time from grandmaster to slave
and back to grandmaster, and to compensate
for this delay.
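As a rough sketch of how that compensation works, the slave can combine four time stamps from one message exchange: when the grandmaster sent its message (t1), when the slave received it (t2), when the slave sent a reply (t3) and when the grandmaster received that reply (t4). The function and variable names here are illustrative, and the calculation assumes the network delay is the same in both directions.

```python
def offset_and_delay(t1, t2, t3, t4):
    """Return (slave offset from master, one-way path delay).

    t1 and t4 are read from the grandmaster's clock,
    t2 and t3 from the slave's clock. Units are nanoseconds.
    """
    master_to_slave = t2 - t1   # path delay plus slave offset
    slave_to_master = t4 - t3   # path delay minus slave offset
    path_delay = (master_to_slave + slave_to_master) / 2
    offset = (master_to_slave - slave_to_master) / 2
    return offset, path_delay


# Example: the slave's clock runs 500 ns ahead and the path takes 2,000 ns.
offset, delay = offset_and_delay(
    t1=1_000_000,   # grandmaster sends
    t2=1_002_500,   # slave receives (its clock reads 500 ns high)
    t3=1_010_500,   # slave replies
    t4=1_012_000,   # grandmaster receives
)
print(offset, delay)  # 500.0 2000.0
```

The slave then subtracts the offset from its local clock. If the path is not symmetric, half of the difference between the two directions shows up as an error in the offset.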
BACK TO THE SWITCHES
In addition to this roundtrip delay,
there are also delays through the Ethernet
switches themselves as they buffer and
transmit data packets. If a switch could provide
a measurement of how long it takes a
packet to get through it, that information
could also be used to fine-tune timing. The
2008 revision of IEEE 1588 provides for
what is called a “transparent clock” that
does just that.
A switch with a transparent clock “tries
to correct the time stamp for the time packet,”
Shay, who is also chief science officer
for Telos Alliance, said. “This occurs in the
switch itself and reduces the error at each
stage of the process.”
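One way to picture the transparent clock: each switch notes when the timing packet arrives and when it leaves, and adds the difference (the residence time) to a correction value carried with the packet. The receiving device subtracts the accumulated correction before working out its offset. The sketch below uses made-up names and numbers and ignores details such as how the correction is encoded on the wire.

```python
def forward_through_switch(correction_ns, ingress_ns, egress_ns):
    """Add this switch's residence time to the running correction."""
    return correction_ns + (egress_ns - ingress_ns)


def corrected_transit(t1, t2, correction_ns):
    """Transit time seen by the slave once switch delays are removed."""
    return t2 - t1 - correction_ns


# Example: two switches hold the packet for 3,000 ns and 12,000 ns,
# and the cables add 600 ns in total.
correction = 0
correction = forward_through_switch(correction, ingress_ns=50_000, egress_ns=53_000)
correction = forward_through_switch(correction, ingress_ns=7_000, egress_ns=19_000)

t1 = 0                     # grandmaster send time
t2 = 600 + 3_000 + 12_000  # slave receive time (clocks aligned in this example)
print(correction, corrected_transit(t1, t2, correction))  # 15000 600
```

Without the correction, the slave would see 15,600 ns of transit time, most of it switch buffering that varies from packet to packet. With it, only the 600 ns of cable delay remains.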
A problem for an AES67 system design
is that switches with transparent clocks
aren’t that common, at least not right now.
“There are some IEEE 1588-aware switches,”
Shay said. “These switches are not yet
ubiquitous, but they are becoming more
available. [The system] works better with
the special switch, but you don’t have to
use them. There’s less error with an IEEE
1588 switch, but you can achieve a certain
level of accuracy without [it].”
In addition to the types of clocks already
mentioned, IEEE 1588 describes yet another type, a “boundary clock,” which slaves to
a grandmaster and, acting like a sub-master,
retransmits timing data to slave devices
down the line.
Audio over IP is multichannel in the
sense that a source connected in some way
to the network can be made available to every
other destination device. No more need
for audio distribution amplifiers to send
shared feeds to different studios.
Shay said that a 1-gigabit Ethernet connection
can carry 250 stereo audio channels.
Of course, redundant cabling would be
good system design.
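A back-of-the-envelope check shows why that figure is plausible. The numbers below are assumptions for illustration, not from Shay: 48 kHz sampling, 24-bit samples, one stream per stereo pair, 1 millisecond of audio per packet, and 78 bytes of per-packet overhead (RTP, UDP, IPv4 and Ethernet framing including preamble and interframe gap).

```python
SAMPLE_RATE_HZ = 48_000      # assumed
BYTES_PER_SAMPLE = 3         # 24-bit audio, assumed
STEREO_STREAMS = 250         # figure quoted in the article
CHANNELS_PER_STREAM = 2
PACKETS_PER_SECOND = 1_000   # 1 ms of audio per packet, assumed
OVERHEAD_BYTES = 12 + 8 + 20 + 38  # RTP + UDP + IPv4 + Ethernet framing

# Raw audio only
payload_bps = (STEREO_STREAMS * CHANNELS_PER_STREAM
               * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8)

# Audio plus packet overhead as it appears on the wire
samples_per_packet = SAMPLE_RATE_HZ // PACKETS_PER_SECOND
payload_bytes = samples_per_packet * CHANNELS_PER_STREAM * BYTES_PER_SAMPLE
wire_bps = (STEREO_STREAMS * PACKETS_PER_SECOND
            * (payload_bytes + OVERHEAD_BYTES) * 8)

print(payload_bps / 1e6, "Mbps of audio")    # 576.0
print(wire_bps / 1e6, "Mbps on the wire")    # 732.0
```

Under these assumptions, 250 stereo channels use roughly three-quarters of a gigabit link, leaving room for timing and control traffic. Different sample rates, bit depths or packet sizes would change the total.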
“This is the point where MADI is way
obsolete,” he said. “AoIP has many times
more capacity than MADI with a smaller
What is AES67?
From the Audio Engineering Society’s Publication
of the AES67 Standard:
“High-performance media networks support
professional quality audio (16 bit, 44,1
kHz and higher) with low latencies (less than
10 milliseconds) compatible
with live sound reinforcement.
The level of network
performance required to
meet these requirements is
available on local-area networks
and is achievable on
A number of networked audio systems have
been developed to support high-performance
media networking but until now there were no
recommendations for operating these systems
in an interoperable manner. This standard provides
comprehensive interoperability recommendations
in the areas of synchronization,
media clock identification, network transport,
encoding and streaming, session description
and connection management.”
For more information, visit www.aes.org.
Another plus for AoIP is that it can,
through an IP network, tie together studios
or facilities that are remotely separated,
such as studios in two different cities, or
remote trucks to studio, or even truck to
truck or truck to announce booth. Not only
could outputs from one location feed others, but sources could be made accessible
as well. But how would these systems
be synced together?
Shay suggested that each location have
a GPS receiver connected to the network
to obtain a global time base. He noted that
while there are GPS receivers with built-in
IEEE 1588, they are not nearly as common
as GPS receivers with NTP. So until GPS with
IEEE 1588 becomes more available, one way
around this situation is to sync an IEEE 1588
grandmaster to NTP that is synced to GPS.
Precision time synchronization, together with
a small packet size for the audio payload, has
allowed AES67 to achieve its target audio latency.
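To see how packet size feeds into latency, consider that a sender must collect a full packet of samples before it can transmit, and a receiver typically buffers a few packets to ride out network jitter. The sketch below assumes 48 samples per packet at 48 kHz and a hypothetical receive buffer of three packets. It leaves out switch and converter delays.

```python
def packet_time_ms(samples_per_packet, sample_rate_hz):
    """Milliseconds of audio carried in one packet."""
    return 1000 * samples_per_packet / sample_rate_hz


def estimated_latency_ms(samples_per_packet, sample_rate_hz, buffered_packets):
    """Time to fill one packet plus the receiver's buffer depth."""
    one_packet = packet_time_ms(samples_per_packet, sample_rate_hz)
    return one_packet * (1 + buffered_packets)


print(packet_time_ms(48, 48_000))           # 1.0
print(estimated_latency_ms(48, 48_000, 3))  # 4.0
```

With these example numbers the packetizing and buffering stages account for about 4 milliseconds, inside the 10-millisecond target. Larger packets would raise that figure in direct proportion.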
Considering the potential of AES67 as
a whole, “we are coming to a crossroads,”
Shay said. “When the transformation is complete,
there will be no more XLR devices.”
Thanks to Greg Shay, who provided
background information for this article.
Mary C. Gruszka is a systems design engineer,
project manager, consultant and
writer based in the New York metro area.
She can be reached via TV Technology.