Audio multiplexing

Combining multiple audio channels into a single stream is called multiplexing. It is complex and requires careful planning, but many solutions are available, and the author reviews them here.

Delivering television to private homes requires the transmission of both video and audio signals, so the two must be multiplexed onto a common communications path (a medium, circuit or channel). Early developments resulted in the concept of frequency-division multiplexing (FDM). The advent of digital technology brought with it the concept of time-division multiplexing (TDM).

Figure 1. Details of an NTSC transmission channel.

Typical FDM applications

In 1941, when television came of age in the USA, the VHF and, later, the UHF spectra were organized into 6MHz channels carrying separate video and audio carriers spaced 4.5MHz apart. The video carrier uses negative amplitude modulation, with a vestigial lower sideband of 0.75MHz and a full upper sideband of 4.2MHz. The audio carrier is frequency modulated, with a peak deviation of ±25kHz.
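The channel arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of any standard document: the 1.25MHz offset of the video carrier above the lower channel edge is an assumption not stated in the text, and channel 2 (54MHz to 60MHz) is used purely as an illustration.

```python
# Figures from the text: 6MHz channel, 4.5MHz carrier spacing,
# 0.75MHz vestigial lower sideband, 4.2MHz upper sideband.
CHANNEL_WIDTH_MHZ = 6.0
CARRIER_SPACING_MHZ = 4.5
LOWER_SIDEBAND_MHZ = 0.75
UPPER_SIDEBAND_MHZ = 4.2
# Assumption (not in the text): video carrier is 1.25MHz above the lower edge.
VIDEO_CARRIER_OFFSET_MHZ = 1.25


def channel_layout(lower_edge_mhz):
    """Return the main frequencies (in MHz) of one NTSC channel."""
    video = lower_edge_mhz + VIDEO_CARRIER_OFFSET_MHZ
    return {
        "lower_edge": lower_edge_mhz,
        "video_band_low": video - LOWER_SIDEBAND_MHZ,
        "video_carrier": video,
        "video_band_high": video + UPPER_SIDEBAND_MHZ,
        "audio_carrier": video + CARRIER_SPACING_MHZ,
        "upper_edge": lower_edge_mhz + CHANNEL_WIDTH_MHZ,
    }


layout = channel_layout(54.0)  # channel 2, used only as an example
for name, mhz in layout.items():
    print(f"{name:16s} {mhz:7.2f} MHz")
```

Both carriers and both video sidebands fall inside the 6MHz allocation, which is the point of the vestigial lower sideband.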

The receiver amplifies the two carriers in a single intermediate frequency (IF) amplifier. The video detector recovers the original 4.2MHz video spectrum and creates a 4.5MHz beat (intercarrier), amplitude modulated by the video information and frequency modulated by the audio information. A limiter removes the amplitude modulation of the 4.5MHz intercarrier, and an FM detector recovers the original audio information.

When color was added in 1953, the color information was transmitted by amplitude and phase modulation of a suppressed chrominance subcarrier with a center frequency of about 3.58MHz, an odd multiple of half the horizontal scanning frequency. This choice of subcarrier frequency interleaves the luminance and chrominance spectra in the frequency domain, which minimizes crosstalk between them.
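The "odd multiple of half the horizontal scanning frequency" relationship can be verified numerically. The sketch below assumes the commonly quoted NTSC color ratios, which are not given in the text: the horizontal frequency is taken as the 4.5MHz intercarrier divided by 286, and the subcarrier as 455/2 times the horizontal frequency.

```python
from fractions import Fraction

# Assumed NTSC color relationships (not stated in the article):
#   fH  = 4.5MHz / 286
#   fsc = (455 / 2) * fH
INTERCARRIER_HZ = Fraction(4_500_000)
F_H = INTERCARRIER_HZ / 286
F_SC = Fraction(455, 2) * F_H

# How many half-line frequencies fit into the subcarrier frequency.
half_line_multiple = F_SC / (F_H / 2)

print(f"fH  = {float(F_H):.3f} Hz")
print(f"fsc = {float(F_SC) / 1e6:.6f} MHz")
print(f"fsc is {half_line_multiple} x fH/2")
```

Because the multiple is odd, the chrominance energy clusters halfway between the harmonics of the line frequency, where the luminance spectrum is weakest.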

Figure 1 shows the FDM spectrum of an NTSC channel. In 1984, the FCC adopted the BTSC Stereo TV format, which is similar to FM stereo but can carry two additional audio channels: the secondary audio program (SAP) and a low-fidelity professional audio channel.

Figure 2. BTSC signal baseband spectrum.

Left plus right mono information is transmitted in the same way as in stereo FM. The pilot is placed at the horizontal scanning frequency (fH), nominally 15.734kHz, which allows it to be phase-locked to the horizontal sync. A double sideband suppressed carrier, at twice the frequency of the pilot, transmits the left minus right stereo information. The SAP channel is located at five times the pilot frequency and is frequency modulated. It may be used to carry a second language or an independent program source.

Finally, a professional audio channel may be added at 6.5 times the pilot frequency. This is a low-fidelity channel used for data or voice messages. Figure 2 shows the BTSC baseband spectrum. FDM, in short, has a long history in television broadcasting.
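The BTSC subcarrier positions follow directly from the pilot multiples given above. The sketch below takes fH as 4.5MHz/286, which is an assumption of this example; the names in the dictionary are illustrative labels only.

```python
# Assumption: fH = 4.5MHz / 286 (nominally 15.734kHz).
F_H_HZ = 4_500_000 / 286

# Multiples of the pilot frequency, as described in the text.
BTSC_MULTIPLES = {
    "pilot": 1.0,
    "L-R subcarrier (DSB-SC)": 2.0,
    "SAP subcarrier (FM)": 5.0,
    "professional channel": 6.5,
}

btsc_hz = {name: mult * F_H_HZ for name, mult in BTSC_MULTIPLES.items()}

for name, hz in btsc_hz.items():
    print(f"{name:26s} {hz / 1000:8.3f} kHz")
```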

Figure 3. Ancillary data packet structure for 4:2:2 525/59.94 format.

An important TDM application is the insertion of digital audio signals into a digital video data stream. This is possible because digital video data streams carry a considerable amount of overhead: the horizontal and vertical blanking intervals are empty, except for four words of data identifying the end of the active line (EAV) and another set of four words identifying the start of the next active line (SAV). The rest of the horizontal blanking interval is available for the insertion of ancillary data such as digital audio.

The 4:2:2 component digital standard can easily accommodate eight AES/EBU signals (eight stereo pairs or 16 individual audio channels), still leaving a considerable amount of overhead for other uses. The ANSI/SMPTE 272M document defines the manner in which AES/EBU digital audio data, AES/EBU auxiliary data and associated control information are embedded into the ancillary data space of a bit-serial digital video signal conforming to the ANSI/SMPTE 259M standard.

Typical TDM applications

The 4:2:2 525/59.94 component digital signal can accommodate 268 ancillary data words in the unused data space between the EAV timing reference and SAV timing reference.
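The 268-word figure can be reproduced from the line structure. The sketch below assumes 1716 words per total line and 1440 words per active line for the 525/59.94 4:2:2 signal; those two counts are not given in the text and are stated here as assumptions.

```python
# Assumptions (not in the text): 858 luminance sample periods per total
# line and 720 per active line, each carrying two multiplexed words.
WORDS_PER_TOTAL_LINE = 858 * 2   # 1716
WORDS_PER_ACTIVE_LINE = 720 * 2  # 1440
# From the text: EAV and SAV occupy four words each.
EAV_WORDS = 4
SAV_WORDS = 4

blanking_words = WORDS_PER_TOTAL_LINE - WORDS_PER_ACTIVE_LINE
ancillary_words = blanking_words - EAV_WORDS - SAV_WORDS

print(f"horizontal blanking: {blanking_words} words")
print(f"available for ancillary data: {ancillary_words} words")
```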

Figure 3 shows the ancillary data packet structure for the 4:2:2 component digital interface. Each packet can carry a maximum of 262 10-bit parallel words. A six-word header precedes the ancillary data and contains a three-word ancillary data flag (ADF), marking the beginning of the ancillary data packet (values are 000, 3FF and 3FF, respectively), an optional data identification (DID) word identifying the user data, an optional data block number (DBN) word and a data count (DC) word. A variable number of data words, not exceeding 255, follows. The packet is closed by a checksum (CS) word, allowing the receiver to determine the validity of the packet.
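The packet layout can be sketched as a small builder and parser. This is a simplified illustration, not an implementation of the standard: the function names are hypothetical, the DID, DBN and DC words are stored as plain values without any protection bits, and the checksum rule (a 9-bit sum of the words from DID through the last data word, with b9 set to the inverse of b8) is an assumption that should be checked against the standard.

```python
ADF = (0x000, 0x3FF, 0x3FF)  # ancillary data flag values from the text
MAX_DATA_WORDS = 255         # limit on data words stated in the text


def checksum(words):
    """Assumed rule: 9-bit sum of the low nine bits, b9 = NOT b8."""
    low9 = sum(w & 0x1FF for w in words) & 0x1FF
    b9 = ((low9 >> 8) & 1) ^ 1
    return low9 | (b9 << 9)


def build_packet(did, dbn, data_words):
    """Return ADF + DID + DBN + DC + data + CS as a list of 10-bit words."""
    if len(data_words) > MAX_DATA_WORDS:
        raise ValueError("at most 255 data words per packet")
    body = [did, dbn, len(data_words), *data_words]
    return [*ADF, *body, checksum(body)]


def parse_packet(words):
    """Return (did, dbn, data) or None if no ADF starts the sequence."""
    if len(words) < 7 or tuple(words[:3]) != ADF:
        return None
    did, dbn, count = words[3:6]
    end = 6 + count
    if len(words) <= end:
        raise ValueError("packet is truncated")
    if words[end] != checksum(words[3:end]):
        raise ValueError("checksum mismatch")
    return did, dbn, list(words[6:end])


# 0x155 is an arbitrary illustrative DID, not a registered value.
packet = build_packet(0x155, 0x001, list(range(12)))
print(len(packet), parse_packet(packet))
```

A packet filled to the 255-word limit comes out at 262 words, matching the maximum quoted above.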

Figure 4. Audio data packet formatting from two AES/EBU data streams.

Multiple contiguous ancillary data packets may be inserted in any ancillary data space. The first packet must follow immediately after the EAV, where its ADF indicates the presence of ancillary data and the start of a packet. If there is no ADF in the first three words of an ancillary data space, it is assumed that no ancillary data packets are present.

Figure 4 shows an example in which two data streams (AES/EBU data stream 1 and AES/EBU data stream 2) are formatted for embedding into a 4:2:2 525/59.94 component digital signal:

  • A six-word header starts the audio data packet.
  • To begin the embedding sequence, frame 0 of AES/EBU data stream 1 provides data from its subframe 1 and subframe 2. Each of these subframes is stripped of the four sync bits, the four auxiliary bits and the P bit. The remaining 20 bits of audio and the V, U and C bits (a total of 23 bits for subframe 1) are mapped into three consecutive 10-bit words identified as X, X+1 and X+2 of AES1/CH1.
  • The 23 bits of subframe 2 are similarly mapped into three consecutive 10-bit words identified as X, X+1 and X+2 of AES1/CH2.
  • AES1/CH1 and AES1/CH2 form a sample pair.
  • To continue the embedding sequence, frame 0 of AES/EBU data stream 2 provides data from its subframe 1 and subframe 2. These data are similarly reduced to 23 bits per subframe and mapped into AES2/CH1 and AES2/CH2, which form a second sample pair.
  • The two consecutive sample pairs form an audio group.
  • The 19-word audio data packet closes with a CS word.
  • Subsequent horizontal blanking intervals will accommodate frame 1 of AES/EBU data stream 1 and data stream 2, frame 2 of AES/EBU data stream 1 and data stream 2, and so on until the 192 frames (each constituting one AES/EBU block) of each of the two AES/EBU data streams are embedded.
  • From then on, a new block of 192 frames coming from the two AES/EBU data streams will be embedded, and the process will continue.
  • At the receiving end, the packets are extracted and fill a 64-sample buffer from which the original data are extracted at a constant bit rate and reformatted.
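The word counts in the sequence above can be tallied in a few lines. The sketch below uses only figures from the text, plus the assumption that an AES/EBU subframe is 32 bits long; the function and label names are illustrative.

```python
AES_SUBFRAME_BITS = 32   # assumption: length of one AES/EBU subframe
SYNC_BITS = 4
AUX_BITS = 4
PARITY_BITS = 1          # the P bit, dropped before embedding

HEADER_WORDS = 6         # ADF (3) + DID + DBN + DC
WORDS_PER_SUBFRAME = 3   # X, X+1, X+2
SUBFRAMES_PER_FRAME = 2
CHECKSUM_WORDS = 1

bits_kept = AES_SUBFRAME_BITS - SYNC_BITS - AUX_BITS - PARITY_BITS


def audio_packet(n_streams):
    """Return (channel labels in embedding order, total packet words)."""
    labels = [
        f"AES{stream}/CH{channel}"
        for stream in range(1, n_streams + 1)
        for channel in range(1, SUBFRAMES_PER_FRAME + 1)
    ]
    total = HEADER_WORDS + len(labels) * WORDS_PER_SUBFRAME + CHECKSUM_WORDS
    return labels, total


labels, total_words = audio_packet(n_streams=2)
print(bits_kept, labels, total_words)
```

With two AES/EBU streams, the four channels occupy 12 words, which together with the header and checksum gives the 19-word packet described in the list.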

Table 1. Formatted audio data structure.

Table 1 shows the audio data structure represented by the three 10-bit data words. Two bits indicate the channel number, and a parity bit is calculated over the other 26 bits of the three words, excluding the b9 bit of each word.
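Since Table 1 is not reproduced here, the following sketch shows one plausible bit layout for the three words. The exact bit positions, the Z (block start) flag in b0 of word X, the use of even parity and the rule that b9 is the inverse of b8 are all assumptions of this example and should be checked against the standard; the function names are hypothetical.

```python
def _ones(value):
    """Count the set bits in an integer."""
    return bin(value).count("1")


def _add_b9(low9):
    """Assumed rule: b9 is the inverse of b8."""
    return low9 | ((((low9 >> 8) & 1) ^ 1) << 9)


def pack_sample(audio20, v, u, c, channel, z=0):
    """Map 20 audio bits plus V, U, C into words X, X+1 and X+2."""
    if not 0 <= audio20 < (1 << 20):
        raise ValueError("audio sample must fit in 20 bits")
    if not 0 <= channel < 4:
        raise ValueError("channel number must fit in two bits")
    # Word X: Z flag, two channel bits, audio bits 0-5 (assumed layout).
    w0 = (z & 1) | (channel << 1) | ((audio20 & 0x3F) << 3)
    # Word X+1: audio bits 6-14.
    w1 = (audio20 >> 6) & 0x1FF
    # Word X+2: audio bits 15-19, then V, U, C; parity goes in b8.
    w2 = (audio20 >> 15) | ((v & 1) << 5) | ((u & 1) << 6) | ((c & 1) << 7)
    # Even parity over the 26 bits assembled so far.
    parity = (_ones(w0) + _ones(w1) + _ones(w2)) & 1
    w2 |= parity << 8
    return [_add_b9(w0), _add_b9(w1), _add_b9(w2)]


def unpack_sample(words):
    """Reverse pack_sample; returns a dict including a parity check."""
    w0, w1, w2 = (w & 0x1FF for w in words)
    return {
        "z": w0 & 1,
        "channel": (w0 >> 1) & 0b11,
        "audio20": (w0 >> 3) | (w1 << 6) | ((w2 & 0x1F) << 15),
        "v": (w2 >> 5) & 1,
        "u": (w2 >> 6) & 1,
        "c": (w2 >> 7) & 1,
        "parity_ok": (_ones(w0) + _ones(w1) + _ones(w2)) % 2 == 0,
    }


words = pack_sample(0xABCDE, v=0, u=1, c=1, channel=2)
decoded = unpack_sample(words)
print([f"{w:03X}" for w in words], decoded)
```

The layout accounts for every bit: 1 + 2 + 6 in word X, 9 in word X+1 and 5 + 3 in word X+2 give the 26 bits covered by the parity bit.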

The distribution of digital audio and video signals inside a TV studio complex using a single coaxial cable is advantageous if the multiplexed signal does not have to be processed separately — in other words, if the product is ready for distribution or transmission. However, if the video signal has to feed a production switcher for further processing, the audio has to be demultiplexed and processed separately. This may prove to be awkward and costly.

If the signal has to exit the production area for distribution or transmission to other locations, the data stream has to be demultiplexed and encoded into an MPEG data stream in order to fit the capabilities of the chosen common carrier. To embed or not to embed is a decision that requires a clear understanding of the predictable and unpredictable operational and distribution requirements.

Michael Robin, fellow of the SMPTE and former engineer with the Canadian Broadcasting Corp.'s engineering headquarters, is an independent broadcast consultant located in Montreal. He is co-author of “Digital Television Fundamentals,” published by McGraw-Hill and translated into Chinese and Japanese.
