Digital audio: Inside the AES3-2003 digital audio standard

The Mach 2 in Milano, Italy, is a 5.1 audio surround facility that features three audio studios and an editing suite equipped with Avid Pro Tools HD digital audio production systems. Photo courtesy Walters-Storyk Design Group.

AES3 details

Data structure

The AES3-2003 standard defines the serial transmission of two-channel uncompressed digital audio. Most facilities employ three or four AES3 dual-audio channels to achieve their multichannel audio goals. AES3 multiplexes two channels of digital audio with additional bits for status, error detection and user data. Transmission is one-way, from one transmitter to one receiver. The standard specifies bi-phase-mark channel coding, so AES3 audio is not sensitive to polarity reversals in cabling and connectors. The synchronous coding scheme also facilitates clock-signal recovery from the AES3 bit stream.
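The polarity immunity of bi-phase-mark coding is easy to see in a few lines of code. The following is a minimal Python sketch, not taken from the standard: the function names are hypothetical, and each bit cell is modeled as a pair of half-cell levels. Every cell begins with a transition, and a logic 1 adds a second transition in mid-cell, so the decoder only has to ask whether the two halves differ.

```python
def bmc_encode(bits, start_level=0):
    """Bi-phase-mark encode: a transition starts every bit cell,
    and a 1 adds an extra transition in the middle of the cell."""
    level = start_level
    cells = []
    for bit in bits:
        level ^= 1              # transition at the start of every cell
        first_half = level
        if bit:
            level ^= 1          # mid-cell transition marks a logic 1
        cells.append((first_half, level))
    return cells


def bmc_decode(cells):
    """A cell whose halves differ is a 1; equal halves are a 0."""
    return [1 if a != b else 0 for a, b in cells]


bits = [1, 0, 0, 1, 1, 0, 1, 0]
normal = bmc_encode(bits)
# Swapping the two conductors inverts every level on the line.
inverted = [(a ^ 1, b ^ 1) for a, b in normal]

print(bmc_decode(normal) == bits)    # True
print(bmc_decode(inverted) == bits)  # True
```

Because the information lives in the transitions rather than in the absolute levels, the inverted stream decodes to the same bits.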

Figure 1 and Figure 2 illustrate AES3's block/frame/subframe structure. Figure 1a shows that the sub-frame consists of 32 time slots. The first four time slots contain a preamble. The remaining 28 time slots contain one audio sample up to 24 bits, plus four additional bits labeled “validity,” “user,” “channel status” and “parity.” Figure 1b shows that for sample word sizes of 20 bits or less, the standard reserves four time slots for auxiliary data.
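As a rough illustration of the subframe layout, the sketch below fills time slots 4 through 31 for one sample. The function name `build_payload` and its parameters are hypothetical. It also assumes details the text above does not spell out: the audio word is sent least significant bit first with its most significant bit in time slot 27, and shorter words are justified toward that MSB position. The four preamble slots are left out because they are not ordinary data bits.

```python
def build_payload(sample, word_bits=24, aux=0, validity=0, user=0, status=0):
    """Return time slots 4..31 of one subframe as a list of 28 bits.

    Assumed ordering (not stated in the article): LSB first,
    audio MSB always lands in time slot 27.
    """
    if not 16 <= word_bits <= 24:
        raise ValueError("word length must be 16 to 24 bits")
    sample &= (1 << word_bits) - 1          # keep the two's-complement field

    if word_bits <= 20:
        # Slots 4..7 carry auxiliary data, slots 8..27 carry audio.
        aux_bits = [(aux >> i) & 1 for i in range(4)]
        audio = sample << (20 - word_bits)  # justify toward the MSB
        slots = aux_bits + [(audio >> i) & 1 for i in range(20)]
    else:
        # Slots 4..27 all carry audio.
        audio = sample << (24 - word_bits)
        slots = [(audio >> i) & 1 for i in range(24)]

    slots += [validity & 1, user & 1, status & 1]   # V, U and C bits
    slots.append(sum(slots) % 2)                    # P: make the ones count even
    return slots


payload = build_payload(0x80000, word_bits=20, aux=0b1010)
print(len(payload))   # 28
print(payload[:4])    # [0, 1, 0, 1]  (auxiliary bits, LSB first)
print(payload[23])    # 1  (audio MSB in time slot 27)
```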

Figure 1a: An AES3 subframe consists of 32 time slots. Figure 1b: For sample word sizes of 20 bits or less, the standard reserves four time slots for auxiliary data.

Payload data in a subframe can assume any bit pattern, so the preamble must be uniquely distinguishable. For this reason, preambles violate the bi-phase-mark rules used for audio data. Three different preamble patterns are defined: X, Y and Z. (See Figure 2.) X delineates the start of subframe 1 (left channel/channel 1); Y delineates the start of subframe 2 (right channel/channel 2); and Z is sent after every 192 frames, in place of X, to signify the start of a new block.

Two subframes make up one frame, and 192 consecutive frames make up one block of AES3 data. AES3 audio equipment transmits frames at the audio sampling rate; audio sampled at 48kHz is transmitted at 48,000 frames per second.
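The preamble sequence and the block timing follow directly from these rules. Here is a short Python sketch (the names are illustrative only) that labels the preamble for each subframe and works out how long one block lasts at 48kHz.

```python
FRAMES_PER_BLOCK = 192


def preamble_sequence(num_frames):
    """Yield (frame, subframe, preamble) for consecutive frames.

    Z replaces X on the first frame of every 192-frame block.
    """
    for frame in range(num_frames):
        first = "Z" if frame % FRAMES_PER_BLOCK == 0 else "X"
        yield frame, 1, first
        yield frame, 2, "Y"


labels = [p for _, _, p in preamble_sequence(194)]
print(labels[:4])       # ['Z', 'Y', 'X', 'Y']
print(labels[384:386])  # ['Z', 'Y']  (frame 192 starts the next block)

fs = 48_000                               # frames per second at 48kHz
block_ms = 1000 * FRAMES_PER_BLOCK / fs
print(block_ms)         # 4.0
```

At 48kHz, then, a receiver sees a Z preamble every 4ms.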

Sampling rates

The AES3 bit stream derives its timing directly from the audio sampling clock of the source equipment. Receivers automatically synchronize to the incoming bit streams. AES3 transparently supports all of the common sampling rates (32kHz, 44.1kHz and 48kHz).

Validity and parity bits

Figure 2. One block of AES3 data comprises 192 consecutive frames.

The validity bit indicates whether the preceding sample word contains audio data suitable for conversion to an analog audio signal. Usually, this bit is set to logic 0. If it is set to logic 1, the sample word contains non-audio data, and the receiving equipment should not convert it to audio.

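The parity bit implements an even-parity check on the subframe payload data (every time slot after the preamble). The transmitter sets the parity bit so that the payload, including the parity bit itself, contains an even number of ones. When a subframe arrives, the receiver counts the ones in the payload. If the count is odd, an error has occurred, and the receiver knows that it must take some corrective action.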
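The receiver's side of the parity check can be sketched in a few lines of Python. The helper names are hypothetical, and the payload is modeled as a plain list of bits for time slots 4 through 31. Note what the check cannot do: a single flipped bit is caught, but two flipped bits cancel out and pass unnoticed.

```python
def parity_bit(bits_4_to_30):
    """Transmitter side: choose P so the total number of ones is even."""
    return sum(bits_4_to_30) % 2


def parity_ok(bits_4_to_31):
    """Receiver side: the subframe passes if the ones count is even."""
    return sum(bits_4_to_31) % 2 == 0


data = [1, 0, 1, 1] + [0] * 23          # 27 bits: time slots 4..30
sent = data + [parity_bit(data)]        # 28 bits: time slots 4..31

one_error = sent.copy()
one_error[5] ^= 1                       # flip one bit in transit

two_errors = one_error.copy()
two_errors[9] ^= 1                      # flip a second bit

print(parity_ok(sent))        # True
print(parity_ok(one_error))   # False
print(parity_ok(two_errors))  # True  (the double error goes undetected)
```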

Channel status data

Each subframe contains one channel status bit. For a block of data, there are 192 status bits per audio channel. These 192 bits are organized as 24 eight-bit bytes. The bytes (numbered 0 to 23) and the bit positions within those bytes determine numerous parameters.

Receivers use channel status information to configure themselves automatically to match a received AES3 bit stream. The receiver reads the status bits and sets its operating modes appropriately. Some examples of information carried in the channel status data are: sampling frequency, word length (16 to 24 bits), audio mode (stereo pair, mono, two separate channels, etc.), special operating modes for sampling frequencies above 48kHz, time of day, pre-emphasis indicator and synchronization (locked to reference/unlocked).
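To make the 192-bit channel status block concrete, the sketch below gathers one channel's status bits, one per frame, into 24 bytes. The function name is hypothetical, and the packing order (the first bit received becomes bit 0 of byte 0) is an assumption on my part; consult the standard for the meaning of the individual fields.

```python
def pack_channel_status(bits):
    """Pack one block of channel status bits (one per frame) into 24 bytes.

    Assumed ordering: the bit from frame 0 is bit 0 of byte 0.
    """
    if len(bits) != 192:
        raise ValueError("a channel status block is exactly 192 bits")
    block = bytearray(24)
    for n, bit in enumerate(bits):
        block[n // 8] |= (bit & 1) << (n % 8)
    return bytes(block)


status_bits = [0] * 192
status_bits[0] = 1      # first bit of the block
status_bits[9] = 1      # second bit of the second byte

block = pack_channel_status(status_bits)
print(len(block))           # 24
print(block[0], block[1])   # 1 2
```

A receiver would start filling this buffer when it sees a Z preamble and act on it once all 192 frames have arrived.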

User data

The AES3 standard provides a user-defined, non-audio-data capability at a rate twice the sampling frequency (e.g., 48kHz = 96kb/s). One user data bit is available in every subframe, so there are two utility data channels of 48kb/s. AES3 does not specify uses of this data capacity, so you are free to develop your own applications.

Some potential uses come to mind: audio source identification, asset management information, intercom channels, embedded device controls, etc. In most cases, you'll need external interfaces to insert and read user data. An additional document, AES18-1996, makes recommendations for implementing an HDLC packet structure for user data to provide a bridge to IT environments.

Philippemoritz Production Studio in Zürich, Switzerland, has two control rooms equipped with Pro Tools recording systems and Genelec 5.1 monitoring. Photo courtesy Walters-Storyk Design Group.

AES3 describes an impedance-matched, terminated transmission system using twisted-pair (AES3-2003) or coaxial cable (AES3-id). AES3 audio has a baseline data rate of approximately 3Mb/s. Frequency components of the signal can extend as high as 30MHz. Due to wide tolerances allowed in the AES3-2003 waveform, cable bandwidths of 6MHz to 10MHz work reliably.
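The data rates quoted in this article come from simple arithmetic on the frame structure: 32 time slots per subframe, two subframes per frame, and one frame per sample period. The sketch below (function and key names are illustrative) works the numbers for the common sampling rates. Because bi-phase-mark coding splits each bit into two half-cells, the line can change state at twice the bit rate.

```python
SLOTS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2


def aes3_rates(fs_hz):
    """Return interface rates, in units per second, for a sampling rate."""
    bit_rate = fs_hz * SLOTS_PER_SUBFRAME * SUBFRAMES_PER_FRAME
    return {
        "bit_rate": bit_rate,               # time slots per second
        "half_cell_rate": 2 * bit_rate,     # bi-phase-mark half-cells per second
        "user_data_rate": fs_hz * SUBFRAMES_PER_FRAME,  # one U bit per subframe
    }


for fs in (32_000, 44_100, 48_000):
    r = aes3_rates(fs)
    print(fs, r["bit_rate"] / 1e6, "Mb/s,", r["user_data_rate"] / 1e3, "kb/s user data")

print(aes3_rates(48_000)["bit_rate"])   # 3072000
```

At 48kHz, that is 3.072Mb/s on the interface and 96kb/s of user data, matching the figures given in the text.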

Applying AES/EBU in media facilities

Cabling plant

Typically, AES3-2003 provides reliable operation on 110Ω twisted-pair cable up to 100 meters long without equalization or amplification. Peak-to-peak signal voltages on the cable are between 2V and 7V. The standard specifies XLR connectors with pin 1 wired to cable shield and pins 2 and 3 carrying the signal. In practice, AES3-2003 is reasonably tolerant of cabling characteristics. Short lengths (less than 15 meters) of conventional audio cable usually work without difficulty. Patch cables and temporary installations are acceptable, but you may find conditions that won't work. Standard audio cable is not suitable for permanent AES3 installations in large plants where significant cable lengths are common. Induced jitter and propagation delays associated with audio cable can render AES3-2003 unusable in complex systems.

London post-production house Uncle features two Pro Tools audio dubbing suites. Photo courtesy ATG Broadcast.

When upgrading to a digital installation, you can reuse some audio cabling components such as patch bays and terminal blocks, but use caution. For example, consider the length and type of cable used for pigtails on patch bays. Type-66 blocks and other telco-style terminals generally work well for AES3-2003. Audio routers using hard contacts usually work because there are no electronics to limit bandwidth. But don't expect to reuse your analog, DA-based audio routers. The limited bandwidth of the amplifier modules typically will not pass the AES3 signal.

AES3-2003 over CAT-5 data cable works well, and the standard recommends it. CAT-5, CAT-5e, CAT-6, CAT-6e and CAT-7 cables have 110Ω characteristic impedance and support runs significantly longer than 100 meters without additional equalization or amplification. RJ-45 connectors are recommended, so the full range of structured wiring components is available for AES3-2003 use. For any future facility upgrade, it's worthwhile to consider using structured data cabling systems and abandoning all conventional audio cabling. Also, you can save a lot in time and equipment by simply reusing any existing data-networking cable.

Video installations frequently implement AES3 on 75Ω coaxial cable because it is familiar to video engineers. With good coax, amplifiers and equalization, cable lengths up to 1000 meters are possible. Generally, cable lengths up to 300 meters offer trouble-free operation without any equalization or amplification. Nominal peak-to-peak voltage on the coax is 1V, and standard BNC connectors are fine. You can frequently reuse existing analog video routing switches, amplifiers and equalizers for digital audio, but use caution. Bandwidth, delay, ringing, oscillation, crosstalk and stability characteristics of active electronics can disrupt proper operation. If the current equipment requires constant attention to maintain video performance, don't try to reuse it for audio. Also, any device that attempts to insert video-sync pulses is not suitable for AES3-id use. Converting between balanced and unbalanced feeds is easy. Use simple resistive networks and commercially available baluns for the connections. This allows you to repurpose older equipment creatively.

Slight variations of sampling frequency between devices can lead to jitter and framing errors. Failure to observe proper AES3 audio frame timing can result in problems ranging from relatively minor pops and clicks to a complete loss of audio as the interfaces attempt to resynchronize themselves. The solution is to force all AES3 equipment to use a common sampling frequency. Generally, there are two methods available for synchronizing multiple digital audio devices: word clock and digital audio reference signal (DARS).

Figure 3. In a typical word clock application, equipment uses a TTL-level square wave referenced to the sampling frequency at coaxial loop-through connectors. Word clock is separate and distinct from AES3 and is not widely supported in modern video equipment.

Word clock is separate and distinct from AES3. Equipment supporting word clock uses a sampling-frequency-referenced TTL-level square wave at coaxial loop-through connectors. In master mode, the equipment's internal sampling clock is output at the word clock connectors. In slave mode, equipment will sync its sample clock to the signal present at the word clock input connector. Additional equipment is daisy-chained and finally terminated with a 75Ω resistor. Figure 3 shows a typical word clock application. Word clock is not widely supported in modern video equipment.

Sampling-rate standardization

DARS is recommended in AES11-2003 for synchronizing digital audio equipment. It is simply a standard AES3 bit stream derived from a master reference. DARS is fully compliant with AES3 standards, so all cabling options apply to DARS. You can distribute DARS throughout a facility to deliver a sync reference to AES3 devices, similar to house reference for video. DARS will function with just the preamble components of the AES3 signal, but can also carry audio, channel status and user data. This allows distribution of digital silence and/or reference tones over DARS.

AES11-2003 recommends a separate input connector for the DARS reference signal. Newer equipment provides external AES3 reference loop-through connectors, making sync distribution relatively simple. Older AES3-compliant equipment frequently lacks a separate DARS input or loop-through. If you can get a DARS feed to the first device in a studio, such as a digital audio workstation, the remaining downstream devices will automatically synchronize through their regular AES3 inputs.

Playback-only devices such as CD players pose a different problem. Few have DARS inputs, so they cannot sync to a reference. In this case, the solution is to use a sample-rate converter (SRC) that can be synced to a reference. The SRC reclocks the playback stream and interpolates the audio samples to provide a new AES3 bit stream at the synchronized sample rate. This is illustrated as the alternate DARS path in Figure 4. Many digital audio workstations and mixing consoles provide sample-rate converters on every digital input to accommodate a mixture of sample rates and devices.
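To show what "interpolates the audio samples" means in the simplest possible terms, here is a toy Python resampler that uses straight-line interpolation between neighboring samples. This is a deliberately simplified stand-in: it is not how commercial sample-rate converters work (they use far better filtering to control aliasing and distortion), and the function name is hypothetical.

```python
def resample_linear(samples, fs_in, fs_out):
    """Toy sample-rate conversion by linear interpolation.

    Each output sample is read from a fractional position in the
    input and blended from the two nearest input samples.
    """
    n_out = int((len(samples) - 1) * fs_out / fs_in) + 1
    out = []
    for m in range(n_out):
        pos = m * fs_in / fs_out            # position in input samples
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


# A CD-rate ramp (44.1kHz) converted to the 48kHz house rate.
cd_audio = [float(n) for n in range(441)]       # 10ms of samples
house_audio = resample_linear(cd_audio, 44_100, 48_000)

print(len(cd_audio), len(house_audio))          # 441 479
print(resample_linear([0.0, 1.0, 2.0], 1, 2))   # [0.0, 0.5, 1.0, 1.5, 2.0]
```

The same 10ms of audio now occupies more samples, each one timed to the reference clock rather than to the player's own oscillator.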

Figure 4. In a simple DARS system, the DARS generator provides an AES3 bit stream locked to the house sync generator.

Figure 4 illustrates a simple DARS system. The DARS generator provides an AES3 bit stream locked to the house sync generator. DARS is distributed over the cabling to a studio or control room. A digital audio workstation is the master device because it has multiple AES3 inputs. The workstation locks to the reference, and its AES3 outputs are also locked. All the devices downstream from the workstation will synchronize in a daisy-chain fashion. Many other approaches to DARS distribution are possible.

Even a properly synchronized digital video and audio plant does not guarantee lip sync. Lip sync is a book-length subject in itself, and proper synchronization is just the first chapter.

When a router switches audio with video, the vertical interval should coincide with the start of subframe 1. If the switch point occurs within the audio sample data, a loud pop or click usually results. Additionally, other AES3 devices downstream of the switch may lose sync on the bit stream. Interesting things may occur as the devices attempt to stabilize on the new reference. Even if the AES3 bit stream is properly synchronized to the video, objectionable artifacts can still occur. Any discontinuity between the last audio sample from the pre-switch source and the first audio sample of the post-switch source can result in a pop or click in the audio. Modern DSP-based routing switches digitally fade the pre-switch bit stream audio level to zero, wait for a subframe 1 boundary, switch to the post-switch bit stream and fade the new audio up from zero. This “V-fade” approach guarantees a silent switch.
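The V-fade idea can be sketched on plain lists of samples. The Python below is only an illustration under stated assumptions: the function name and parameters are hypothetical, the fades are linear ramps (real routers may use other curves), and each list index stands for one frame, so switching at an index is equivalent to switching on a subframe 1 boundary.

```python
def v_fade_switch(old, new, switch_at, fade_len):
    """Fade the old source to zero, switch, then fade the new source up.

    old, new  : sample lists for the pre-switch and post-switch sources
    switch_at : index of the first sample taken from the new source
    fade_len  : length of each linear ramp, in samples
    """
    out = []
    for n, (a, b) in enumerate(zip(old, new)):
        if n < switch_at:
            gain = min(1.0, (switch_at - 1 - n) / fade_len)   # ramp down to 0
            out.append(a * gain)
        else:
            gain = min(1.0, (n - switch_at) / fade_len)       # ramp up from 0
            out.append(b * gain)
    return out


old = [1.0] * 20     # pre-switch source sitting at a positive level
new = [-1.0] * 20    # post-switch source sitting at a negative level

hard_cut = old[:10] + new[10:]
faded = v_fade_switch(old, new, switch_at=10, fade_len=4)


def biggest_step(x):
    return max(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))


print(biggest_step(hard_cut))   # 2.0
print(biggest_step(faded))      # 0.25
```

The hard cut jumps the full distance between the two sources in a single sample, which is what you hear as a click. With the V-fade, both sides of the switch point are at zero, so the largest step is set by the ramp slope instead.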

Some SMPTE standards implement AC-3 and Dolby E compressed audio streams over AES3 channels. Dis-embedding, decoding, recompression and re-embedding are additional complexities to consider to support discrete, multiplexed, multichannel, compressed and embedded audio tiers in systems design.

Integration of AES3 digital audio with digital video is not trivial. Armed with a basic understanding of the standard, you should begin planning at your plant's earliest design stages.

Routing and switching

Note: The author would like to thank WTTW's director of engineering operations Fred Engel and engineer Michael Tompary for their assistance in preparing this article.

Compressed/embedded audio, AES3 and future trends

John Freberg is the principal of The Freberg Engineering Company.

Just as SDI has become the primary interconnection for digital video, AES3 has become the primary interconnection for digital audio. A practical understanding of its capabilities and limitations has become a required part of the engineer's knowledge base.

Additional information on the AES standard is downloadable for no charge from the Audio Engineering Society Web site at www.aes.org:

AES3-2003; The core standard
www.aes.org/standards/b_pub/aes3-2003.pdf
AES-3id; Implementation using 75Ω coaxial cable
www.aes.org/standards/b_pub/aes-3id-2001.pdf
AES-2id; Guidelines for use of the AES3 interface
www.aes.org/standards/b_pub/aes-2id-1996.pdf
AES5-2003; Preferred sampling frequencies
www.aes.org/standards/b_pub/aes5-2003.pdf
AES11-2003; Synchronization of audio devices
www.aes.org/standards/b_pub/aes11-2003.pdf
AES18-1996; Recommendations for the user data channel
www.aes.org/standards/b_pub/aes18-1996.pdf

Need more information?

You can find additional valuable information on system design in the following references:

The Video Engineer's Guide to Digital Audio, John Watkinson, NVISION, 1995. ISBN 0-9640361-3-4
www.nvision1.com/Sales/pdfs/theguide.pdf
The Book, An Engineer's Guide to the Digital Transition, NVISION, 1996. ISBN 0-96-40361.
www.nvision1.com/Serv/RefLib/thebook.pdf
The Book II, More Engineering Guidance for the Digital Transition, NVISION, 1999. ISBN 0-9640361-7-7.
www.nvision1.com/Serv/RefLib/thebook2.pdf
Digital Audio Plant Integration: A Tutorial for Designing Digital Audio/Video Broadcast and Production Facilities, NVISION white paper, 2000
www.nvision1.com/Serv/Support/bulletins/digaudio.pdf