The AES/EBU digital audio signal distribution standard

Figure 1. The AES/EBU digital audio data structure.

Figure 2. Conceptual block diagram of an AES/EBU channel encoder.

Figure 3. Formation of a bi-phase mark encoded signal.

Figure 4. Spectrum of an AES/EBU digital audio signal.

Figure 5. Conceptual block diagram of an AES/EBU decoder.

Table 1. Interface electrical characteristics.

The Audio Engineering Society (AES), together with the European Broadcasting Union (EBU), developed a digital audio transmission standard known as the AES/EBU standard and published as AES3-1992, ANSI S4.40-1992 or IEC 958. The transmission medium is wire, which has a wide bandwidth capability and allows for the bit-serial transmission of the digital audio data. The interface is primarily designed to carry monophonic or stereophonic signals in a studio environment at a 48kHz sampling frequency and with a resolution of 20 or 24 bits per sample. The bit-parallel data words are serialized by sending the least significant bit (LSB) first. Word clock data is added to the bit stream to identify the start of each sample in the decoding process.

The bit-serial data stream initially uses non-return-to-zero (NRZ) coding. This means that a low voltage indicates binary zero (0) and a high voltage indicates binary one (1). With NRZ, the signal voltage remains constant for the duration of each data bit and does not return to zero between bits. As a consequence, the receiver must know the signal polarity to interpret the message correctly. Because a single NRZ serial data stream contains no information about the signal polarity, another coding format is required for transmission. The format chosen is the Bi-phase Mark Code (BPM).

General structure of the AES/EBU interface protocol

The AES/EBU digital audio interface is designed to transmit two channels of digital audio, each using between 16 and 24 bits per sample, over a single cable. The original AES/EBU standard specified the use of shielded twisted-pair cable. A more recent version specifies the use of 75Ω coaxial cable.

The AES/EBU signal format has the structure shown in Figure 1. The signal is transmitted as a succession of audio blocks. Each block is made up of 192 frames numbered 0 to 191. Each frame is made up of two subframes, subframe A and subframe B. Each subframe is divided into 32 time slots numbered 0 to 31 and combines sample data from one audio source or channel, auxiliary data, sync data and associated data.

Time slots 0 to 3: These time slots carry one of the sync words denoted as X, Y or Z.

  • Sync word Z: This bit sequence indicates the start of the first frame of an audio block.
  • Sync word X: This bit sequence indicates the start of all remaining frames.
  • Sync word Y: This bit sequence indicates the start of every B subframe.
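The sync word rules above can be expressed as a small lookup. The following is a minimal sketch, not part of the standard: the function names `sync_word` and `block_sync_sequence` are illustrative, and the sketch returns only the sync word label (X, Y or Z), not the actual bit pattern sent in time slots 0 to 3.

```python
def sync_word(frame, subframe):
    """Return the sync word label for a frame index (0..191) and subframe ('A' or 'B')."""
    if not 0 <= frame <= 191:
        raise ValueError("frame must be in 0..191")
    if subframe == "B":
        return "Y"                           # every B subframe starts with Y
    if subframe == "A":
        return "Z" if frame == 0 else "X"    # Z marks the first frame of a block
    raise ValueError("subframe must be 'A' or 'B'")


def block_sync_sequence():
    """List the sync word labels of one audio block, in transmission order."""
    return [sync_word(f, s) for f in range(192) for s in ("A", "B")]


if __name__ == "__main__":
    seq = block_sync_sequence()
    print(seq[:6])                                        # first three frames
    print(len(seq), seq.count("Z"), seq.count("X"), seq.count("Y"))
```

One audio block therefore contains 384 subframes, of which exactly one begins with Z. A receiver can use that single occurrence to locate the start of the block.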

The sync words are not BPM encoded. Their structure minimizes the DC component on the transmission line and facilitates clock recovery and subframe identification as they are unique in the data stream.

Time slots 4 to 7: These time slots can carry auxiliary information such as a low-quality auxiliary audio channel for producer talkback or studio-to-studio communication. Alternatively, they can be used to extend the audio word length to 24 bits.

Time slots 8 to 27: These time slots carry 20 bits of audio information starting with the LSB and ending with the MSB. If the source provides fewer than 20 bits, the unused LSBs are set to logical “0.”

Time slots 28 to 31: These time slots carry associated bits as follows:

  • Validity bit (V): The V bit is set to zero if the audio sample word data are correct and suitable for D/A conversion. Otherwise, the receiving equipment is instructed to mute its output during the presence of defective samples. This capability has not been implemented by all manufacturers, and some equipment may not generate or verify the sample word validity.
  • User bit (U): The U bit in each subframe is sent to a memory array. The AES18-1992 recommended practice specifies the format of the user data channel of the interface.
  • Channel status bit (C): The C bit carries, in a fixed format, information associated with each audio channel that is decodable by any interface user. Examples of information to be carried are length of audio sample words, pre-emphasis, sampling frequency and time codes.
  • Parity bit (P): A parity bit is provided to permit the detection of an odd number of errors resulting from malfunctions in the interface. The P bit is always set to indicate an even parity.
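To make the time slot layout concrete, the sketch below assembles time slots 4 to 31 of one subframe. It rests on several assumptions: the function name `pack_subframe` is hypothetical, the sample is passed as an already-masked 20-bit field, and the parity bit is computed over time slots 4 to 31 so that they carry an even number of ones. The sync word in time slots 0 to 3 is left out.

```python
def pack_subframe(sample, aux=0, v=0, u=0, c=0):
    """Return time slots 4..31 as a list of 28 bits (index 0 is time slot 4)."""
    if not 0 <= sample < (1 << 20):
        raise ValueError("sample must fit in 20 bits")
    if not 0 <= aux < (1 << 4):
        raise ValueError("aux must fit in 4 bits")
    bits = [(aux >> i) & 1 for i in range(4)]         # slots 4-7, LSB first
    bits += [(sample >> i) & 1 for i in range(20)]    # slots 8-27, LSB first
    bits += [v & 1, u & 1, c & 1]                     # slots 28-30: V, U, C
    bits.append(sum(bits) % 2)                        # slot 31: even parity (assumed range)
    return bits


if __name__ == "__main__":
    # A 16-bit source word is shifted up so the four unused LSBs are zero.
    slots = pack_subframe(0x1234 << 4)
    print(slots)
    print("ones in slots 4..31:", sum(slots))
```

A receiver can apply the same sum to the 28 received bits. An odd result indicates that an odd number of bits was corrupted in transit.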

Figure 2 shows a conceptual block diagram of an AES/EBU encoder. Figure 3 shows how the BPM-encoded signal waveform is obtained from an NRZ data stream. The NRZ signal is characterized by “ones” having a determined high value and “zeros” having a determined low value. This means that long strings of zeros or ones have no transitions, which makes clock recovery in the receiver difficult. BPM alleviates this condition by introducing a transition at every bit boundary and an additional transition in the middle of each “one” bit interval. Because the information is carried by transitions rather than by absolute levels, the decoded data does not depend on signal polarity. At a 48kHz sampling rate, the total data rate is 32 × 2 × 48000 = 3.072Mb/s. The BPM encoding doubles the data stream rate to 6.144Mb/s. Figure 4 shows the respective spectra. Figure 5 shows a conceptual block diagram of an AES/EBU decoder.
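The BPM rule and the data rate arithmetic can be checked with a short sketch. The function names and the `level` parameter (the line level before the first bit) are illustrative assumptions. Each data bit is represented as two half-bit cells, and the starting level convention may differ from the one drawn in Figure 3.

```python
def bpm_encode(bits, level=0):
    """Encode NRZ data bits as bi-phase mark half-bit cells (two cells per bit)."""
    cells = []
    for bit in bits:
        level ^= 1              # transition at the start of every bit
        cells.append(level)
        if bit:
            level ^= 1          # extra mid-bit transition for a one
        cells.append(level)
    return cells


def bpm_decode(cells):
    """Recover data bits: a one has two different half-cells, a zero two equal ones."""
    return [a ^ b for a, b in zip(cells[0::2], cells[1::2])]


SLOTS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2
SAMPLE_RATE_HZ = 48_000

data_rate = SLOTS_PER_SUBFRAME * SUBFRAMES_PER_FRAME * SAMPLE_RATE_HZ  # bits per second
cell_rate = 2 * data_rate                                              # half-bit cells per second

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0]
    cells = bpm_encode(data)
    inverted = [c ^ 1 for c in cells]       # same signal with swapped polarity
    print(cells)
    print(bpm_decode(cells), bpm_decode(inverted))
    print(data_rate, cell_rate)
```

Decoding the inverted cells yields the same data, which illustrates why a BPM link tolerates a polarity reversal in the wiring.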

Interface characteristics

Table 1 lists the characteristics of two types of interfaces: balanced and unbalanced. The original AES3-1985 standard defined the distribution of AES/EBU signals through a twisted-pair shielded audio cable. It specified a transmitter source impedance of 110Ω and a receiver input impedance of 250Ω, and it stipulated that up to four receivers could be connected in parallel across the audio cable. However, it gave no guidance on the precautions to be taken by the user or systems integrator. This resulted in difficulties with reflections and standing waves, as the performance of the distribution link was unpredictable and depended on the wide variety of installation conditions encountered in practice. The unpredictability was compounded by the loose specification of the output signal amplitude, which put additional stress on the receiver.

The standard was revised and reissued as AES3-1992. This second version specifies a receiver input impedance of 110Ω and warns against the use of more than one receiver across the feeding cable.

The AES3id-1996 standard defines the unbalanced 75Ω impedance interface. This version recognizes the need to narrowly specify impedance tolerances in terms of “return loss” and transmitter output signal levels and, if properly implemented, results in more predictable performance because it is based on well-known SDTV video signal distribution concepts. However, most digital audio equipment is fitted with XLR connectors, so conversion to BNC connectors, including the use of 110Ω-to-75Ω balun transformers and signal amplitude normalizers, must be considered.
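The effect of the receiver impedance on reflections can be estimated with the usual transmission-line formulas for reflection coefficient and return loss. The sketch below is a rough illustration only. It assumes purely resistive terminations and a cable characteristic impedance of 110Ω, which is an assumption made here for the example. The function names are illustrative.

```python
import math


def parallel(*impedances):
    """Combined resistance of loads connected in parallel."""
    return 1.0 / sum(1.0 / z for z in impedances)


def reflection_coefficient(z_load, z0):
    """Voltage reflection coefficient of a resistive load on a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)


def return_loss_db(z_load, z0):
    """Return loss in dB; infinite for a perfectly matched load."""
    gamma = abs(reflection_coefficient(z_load, z0))
    return math.inf if gamma == 0 else -20.0 * math.log10(gamma)


if __name__ == "__main__":
    Z0 = 110.0  # assumed cable characteristic impedance, ohms
    cases = {
        "one 250-ohm receiver": 250.0,
        "four 250-ohm receivers": parallel(250.0, 250.0, 250.0, 250.0),
        "one 110-ohm receiver": 110.0,
    }
    for name, z_load in cases.items():
        print(f"{name}: load {z_load:.1f} ohm, return loss {return_loss_db(z_load, Z0):.1f} dB")
```

Under these assumptions, a single 250Ω receiver gives a return loss of roughly 8dB and four receivers in parallel (62.5Ω) roughly 11dB, while a single 110Ω receiver is matched. The load seen by the cable thus changes with the number of receivers connected.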

Michael Robin, a fellow of the SMPTE and former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal, Canada. He is co-author of Digital Television Fundamentals, published by McGraw-Hill and translated into Chinese and Japanese.
