Audio interfaces

Digital audio has not been spared from the usual “alphabet soup” of interfacing and distribution formats. Now relatively mature, AES3 (and the related S/PDIF), AES10 (Multichannel Audio Digital Interface, a.k.a. MADI) and SDI provide point-to-point digital audio connection methods. Distributing audio over data networks, however, while not as mature, is emerging as a way to provide more flexibility and less physical complexity in a broadcast plant. The ubiquity of Ethernet wiring would seem to make it an ideal choice for distribution of audio, but some practical concerns must be addressed.

Ethernet LANs are characterized by ease of installation and expansion, together with many solutions for routing and distribution. Originally designed around the concept of multiple computers communicating over a shared coaxial cable bus, the networking scheme handles contention for the medium with carrier sensing and collision detection: stations that transmit at the same time detect the collision and retransmit after a random delay. Coax was later replaced by Cat 3 and Cat 5 twisted-pair cable with hubs or switches, greatly reducing installation costs. As the Internet emerged, Transmission Control Protocol/Internet Protocol (TCP/IP) became the predominant communications protocol for PC and data device interconnection, again using a packetization scheme for the data, with Ethernet as the usual LAN medium.

But IP provides only best-effort delivery of data; there is no guarantee that all packets of data are delivered to their destination in a timely and synchronous fashion (or, without TCP's retransmissions, at all), and so the quality of service (QoS) is not likely to satisfy live audio needs. For real-time delivery of video and audio, the protocol is inadequate by itself. However, the ease of use of Cat 5 cable and connectors has inspired several manufacturers to develop proprietary schemes for transporting digital video and audio over the physical medium, in point-to-point connections as well as in LANs and even WANs. Such nonstandard schemes, while fulfilling some needs, do not allow different equipment to interoperate without multiple conversions.

To address this situation, AES50 was developed, providing a bidirectional, point-to-point connection for multichannel audio and system control over a single Cat 5 or Cat 6 cable. By using this standard with dedicated routers, a small network can be built that has low and consistent latency, with nonblocking routing, i.e., 100-percent QoS. However, while AES50 can use Ethernet wiring and routing components, it does not use the Ethernet frame structure and could therefore be incompatible with some devices. Also, it was not designed for large, widely distributed networks.

ATM provides compatibility and reliability

The ATM network protocol can be set up to provide the QoS needed for a real-time service. This is because ATM is connection-oriented, meaning that communications are set up as a call: no data flows until both ends of the communication have established a connection. (Think of telephone voice calls, for example.) In general, packetization delays and resynchronization delays are small for ATM, whose cells are short and of fixed length, and larger for IP.

ATM uses asynchronous time division multiplexing (TDM) to encode data into fixed-length packets called cells, each containing 53 8-bit bytes: a 5-byte header and a 48-byte information payload. (The ATSC standards specify MPEG transport carrying packets of 188 bytes, where the first byte is a synchronization byte and the remaining 187 carry the rest of the header and the payload. This packet approach was specifically defined with ATM switching in mind, as each 188-byte MPEG-2 packet maps into the payload of four ATM cells, 4 × 48 = 192 bytes, with only 4 bytes of padding required.) The data streams within an ATM connection travel over virtual paths from source to destination; within these paths, multiple virtual channels can be defined. (See Figure 1.)
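
The cell arithmetic above can be checked with a short sketch. It assumes the simple mapping described in this article, in which a packet is split across 48-byte cell payloads and the last cell is padded. The function and constant names are illustrative only, and any overhead added by a real ATM adaptation layer is not modeled.

```python
ATM_CELL_BYTES = 53                                      # total cell size, from the text
ATM_HEADER_BYTES = 5                                     # cell header, from the text
ATM_PAYLOAD_BYTES = ATM_CELL_BYTES - ATM_HEADER_BYTES    # 48-byte payload


def cells_for_packet(packet_bytes: int) -> tuple[int, int]:
    """Return (cells needed, padding bytes) to carry one packet in ATM cells."""
    if packet_bytes <= 0:
        raise ValueError("packet_bytes must be positive")
    # Integer ceiling division: round up to a whole number of cells.
    cells = (packet_bytes + ATM_PAYLOAD_BYTES - 1) // ATM_PAYLOAD_BYTES
    padding = cells * ATM_PAYLOAD_BYTES - packet_bytes
    return cells, padding


if __name__ == "__main__":
    mpeg_ts_packet = 188                                 # MPEG-2 transport packet size
    cells, padding = cells_for_packet(mpeg_ts_packet)
    wire_bytes = cells * ATM_CELL_BYTES
    print(f"cells: {cells}, padding: {padding} bytes")   # cells: 4, padding: 4 bytes
    print(f"bytes on the wire: {wire_bytes}")            # 212
    print(f"efficiency: {mpeg_ts_packet / wire_bytes:.1%}")
```

The last line shows the cost of the fixed 5-byte header per cell: roughly 11 percent of the link carries headers and padding rather than transport-stream data.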

ATM was once envisioned as a key element of emerging broadband integrated services digital networks (B-ISDN), but these were supplanted by the Internet. The ATM protocol, however, still provides a useful method for digital audio interconnection. ATM can run on top of different transport mechanisms and can adapt to new transport technologies and greater speeds. It can be set up with guaranteed bandwidth and low latency, making it well suited to LAN and even WAN distribution of audio. Running at 155Mb/s over Cat 5 cable, ATM can support up to about 80 channels of audio, if there are no other calls to be supported. Permanent connections can be established, and ATM circuits can be provided by your favorite telco.
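
As a rough plausibility check on the channel count, the sketch below estimates an upper bound from the 155Mb/s link rate and the 48-of-53-byte cell payload ratio. The 48kHz sample rate and the two sample formats (32-bit AES3 subframes and 24-bit linear PCM) are assumptions chosen for illustration, and the names are hypothetical. Physical-layer framing and adaptation-layer overhead are ignored, so a practical figure such as the roughly 80 channels quoted above will be lower than these bounds.

```python
LINK_RATE_BPS = 155_000_000      # link rate quoted in the text
CELL_BYTES = 53                  # ATM cell size
PAYLOAD_BYTES = 48               # ATM cell payload


def max_audio_channels(bits_per_sample: int, sample_rate_hz: int = 48_000) -> int:
    """Upper bound on mono channels, counting only ATM cell-header overhead."""
    # Bit rate left for payload after the 5-byte header of every cell.
    payload_rate_bps = LINK_RATE_BPS * PAYLOAD_BYTES // CELL_BYTES
    channel_rate_bps = bits_per_sample * sample_rate_hz
    return payload_rate_bps // channel_rate_bps


if __name__ == "__main__":
    # Assumption: each sample travels as a full 32-bit AES3 subframe.
    print(max_audio_channels(32))    # 91
    # Assumption: each sample travels as bare 24-bit linear PCM.
    print(max_audio_channels(24))    # 121
```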

Within a plant, AES47 specifies the means to carry multiple channels of audio in linear PCM or AES3 format as calls across an ATM network. The standard allows for point-to-point and point-to-multipoint networking, and there is no theoretical limit to the maximum size of an AES47 network, given the required transport bandwidth. Audio is supported at resolutions from eight to 40 bits per sample, and the ancillary data specified in AES3 can be relayed, providing information about channel status (pre-emphasis, sample rate, channel numbers, etc.), private data, and the validity of samples for conversion to analog. AES51, published in 2006, specifies how an ATM service can be provided over Ethernet hardware (ATM-E), combining the benefits of both technologies. The performance of this interconnection is such that real-time audio is viable. On a 100m point-to-point link, the signal latency from end to end will typically be less than 10µs; this includes the buffers at each end, the duration of each ATM cell and the delay over the physical medium (usually a cable). The standard also adds the ability to relay timing information to a resolution of less than 0.1µs, providing a tool to help resolve A/V synchronization issues.

The distinction must be made, however, that efficient carriage over Ethernet hardware does not automatically mean the same performance over an Ethernet-based WAN, which may not be able to guarantee the QoS needed for a real-time audio connection. While Ethernet hardware can support ATM-based audio, the networks are a different matter. For this reason, AES51 is primarily intended to be used in one of two ways: as a point-to-point connection between two pieces of equipment, such as an ATM switch and a piece of audio equipment, or as a way to connect a PC-based audio processor (or similar device) to an ATM network.

If service over an IT-based Ethernet network is desired, one would ordinarily expect larger delays, greater jitter in the arrival of packets, and even dropped or out-of-sequence packets. However, prioritization processing can improve the QoS, and embedding packet timing information allows the receiver to put packets that arrive in the wrong order back in sequence, at a slight increase in processing delay.
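
The trade-off between reordering and delay can be illustrated with a minimal sketch of a receive-side reorder buffer. It does not represent any particular standard or product; the AudioPacket fields, the ReorderBuffer class, and the fixed depth are all hypothetical. Packets are held until the buffer exceeds a chosen depth, then released in timestamp order. A deeper buffer tolerates more disorder but adds more delay.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class AudioPacket:
    timestamp: int                           # hypothetical sender-side timing stamp
    payload: bytes = field(compare=False)    # audio data, ignored when ordering


class ReorderBuffer:
    """Hold up to `depth` packets and release them in timestamp order."""

    def __init__(self, depth: int):
        self.depth = depth
        self._heap: list[AudioPacket] = []
        self._last_released = None           # timestamp of the newest packet released

    def push(self, packet: AudioPacket) -> list[AudioPacket]:
        # A packet older than one already released is too late to use: drop it.
        if self._last_released is not None and packet.timestamp <= self._last_released:
            return []
        heapq.heappush(self._heap, packet)
        released = []
        # Release the oldest packets once more than `depth` are waiting.
        while len(self._heap) > self.depth:
            oldest = heapq.heappop(self._heap)
            self._last_released = oldest.timestamp
            released.append(oldest)
        return released

    def flush(self) -> list[AudioPacket]:
        """Release everything still held, oldest first (for end of stream)."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap))
        if released:
            self._last_released = released[-1].timestamp
        return released


if __name__ == "__main__":
    buffer = ReorderBuffer(depth=2)
    output = []
    for ts in [0, 2, 1, 3, 5, 4]:            # packets 1 and 4 arrive late
        output += buffer.push(AudioPacket(ts, b""))
    output += buffer.flush()
    print([p.timestamp for p in output])     # [0, 1, 2, 3, 4, 5]
```

With a depth of two packets, any packet displaced by up to two positions is restored to its place; one that arrives later than that is discarded, which the audio decoder would have to conceal.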

Of course, interconnection to a LAN will mean that multiple virtual connections can be made to equipment over the same network, a versatile feature. As before, however, the QoS will be constrained by the operation of the network. For this reason, audio and IT networks should be independent when full QoS is demanded from the audio network.

Aldo Cugnini is a consultant in the digital television industry.

Send questions and comments to: aldo.cugnini@penton.com