HDTV data multiplexing, Part 2 - TvTechnology

This article is a continuation of our March HDTV data multiplexing article. It will deal with audio data multiplexing in the horizontal ancillary data space (HANC).

General considerations

The most important use of the HANC data space is for the insertion of audio signals. SMPTE Standard 299M, 24-Bit Digital Audio Format for HDTV Bit-Serial Interface defines the mapping of 24-bit AES digital audio data and associated control information into the ancillary data space of HDTV signals conforming to the SMPTE 292M standard.

For intrastudio applications the preferred implementation is an audio sampling rate of 48kHz synchronous to video. As an option, the standard supports AES audio sampled synchronously or asynchronously at frequencies from 32kHz to 48kHz. Audio is synchronous with the associated video if an integer number of video frames contains a constant integer number of audio samples.
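The synchronous-audio definition above can be checked with a short sketch (the function name is my own, introduced for illustration): the smallest integer frame count whose span holds an exact integer number of audio samples falls out of the reduced sample-rate/frame-rate ratio.

```python
from fractions import Fraction

def samples_per_frame_cycle(audio_rate, frame_rate):
    """Smallest (frames, samples) pair such that `frames` video frames
    contain exactly `samples` audio samples."""
    ratio = Fraction(audio_rate) / Fraction(frame_rate)
    return ratio.denominator, ratio.numerator

# 48kHz audio against the 30/1.001Hz (29.97Hz) frame rate of 1080i:
print(samples_per_frame_cycle(48000, Fraction(30000, 1001)))  # (5, 8008)
```

Because the ratio reduces to a rational number, 48kHz audio is synchronous with 29.97Hz video: every five frames carry exactly 8008 samples.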

The standard provides for the embedding of a minimum of two audio channels (one audio pair) to a maximum of sixteen audio channels (eight audio pairs). The audio channels are combined, where appropriate, into groups of four (two audio pairs). Each group is identified by an ancillary data ID.

The horizontal blanking interval

Figure 1 shows details of the horizontal blanking interval of the CB/CR bit-parallel datastream of source format D (1920×1080i). In this format the total line duration is 2200 samples, of which 1920 are active line samples, leaving 280 samples in the horizontal blanking interval. Excluding the TRS (EAV and SAV), the line number and the error detection codes leaves 268 words available for transporting ancillary data. The structure of the horizontal blanking interval is as follows:

  • EAV: End of active video TRS consisting of four words: 3FF,000,000,XYZ;
  • The line number data: Two words LN0 and LN1;
  • The error detection codes: Two words CCR0 and CCR1;
  • Ancillary data or blanking data: 268 words identified as CA0 to CA267;
  • SAV: Start of active video TRS consisting of four words: 3FF,000,000,XYZ.
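The word budget listed above can be tallied directly (the constant names are informal labels introduced here, not identifiers from the standard):

```python
# HANC word budget for one line of source format D (1920x1080i),
# following the structure listed above; all numbers are from the article.
TOTAL_LINE = 2200   # total samples per line
ACTIVE = 1920       # active picture samples
EAV = 4             # 3FF,000,000,XYZ
LINE_NUM = 2        # LN0, LN1
CRC = 2             # CCR0, CCR1
SAV = 4             # 3FF,000,000,XYZ

blanking = TOTAL_LINE - ACTIVE                     # 280 words
ancillary = blanking - EAV - LINE_NUM - CRC - SAV  # 268 words for HANC
print(blanking, ancillary)  # 280 268
```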

The audio data packet

Figure 2 shows the structure of an audio data packet. Audio data derived from two channel pairs (two AES data streams or four individual audio channels) are configured as an audio data packet consisting of:

  • ADF: Ancillary data flag consisting of three words 000,3FF,3FF;
  • DID: One data identification word. Value depends on the respective audio group;
  • DBN: One data block number word;
  • DC: One data count word;
  • UDW: 24 user data words UDW0 to UDW23;
  • CS: One checksum word.

The UDW consists of three types of data:

  • CLK: Two audio clock and phase data words UDW0 and UDW1;
  • CHn: 16 words UDW2 to UDW17 identifying the four audio channels of a group;
  • ECCs: Error correction codes consisting of six data words UDW18 to UDW23.
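The packet structure above sums to a fixed size, which a short sketch makes explicit (the dictionary keys simply mirror the component names listed in the article):

```python
# Word counts for one audio data packet, as listed above.
packet = {
    "ADF": 3,   # 000,3FF,3FF
    "DID": 1,
    "DBN": 1,
    "DC": 1,
    "UDW": 24,  # CLK (2) + CHn (16) + ECC (6)
    "CS": 1,
}
assert packet["UDW"] == 2 + 16 + 6  # the three UDW data types
total = sum(packet.values())
print(total)  # 31 words per audio data packet
```

Four such packets (the full 16-channel case) occupy 124 of the 268 HANC words available on a format D line.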

Embedding the audio packets

Audio data packets are multiplexed into the HANC data space of the CB/CR bit-parallel datastream, excluding the line immediately following the switching line in order to avoid audible clicks. In the 1125/60 format the switching points are on lines 7 and 569, so audio data packets are not permitted on lines 8 and 570. Figure 3 shows how an audio packet is formatted from two AES datastreams (e.g. 1 and 2). The audio of each channel consists of 29 bits as follows:

  • AES audio: 24 bits;
  • V: One validity bit;
  • U: One user bit;
  • C: One channel status bit;
  • P: One parity bit;
  • Z flag: One bit common to the two channels of the pair derived from the preamble of the AES datastream.

As shown, the 29 bits of each channel are embedded into four 10-bit words. Table 1 shows the typical manner in which the 29 channel bits are embedded in four 10-bit words, in this example UDW2, UDW3, UDW4 and UDW5. This representation is for one audio data packet carrying four audio channels from AES/EBU datastreams 1 and 2. Carrying 16 audio channels requires three additional audio data packets containing, respectively, data from AES/EBU datastreams 3 and 4, 5 and 6, and 7 and 8 (four channels each). Each group of four channels is identified by its DID word.
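As an illustration only, the following sketch shows that the 29 channel bits fit into four 10-bit words. Note that SMPTE 299M's Table 1 fixes the actual bit positions, which are not reproduced here; the byte-aligned layout below is hypothetical.

```python
# Hypothetical packing of one channel's 29 bits (24 audio bits plus
# V, U, C, P and Z) into four words, using only the low 8 bits of each.
# This is NOT the SMPTE 299M Table 1 layout, just a capacity demo.
def pack_channel(audio24, v, u, c, p, z):
    bits = ((audio24 & 0xFFFFFF) | (v << 24) | (u << 25) |
            (c << 26) | (p << 27) | (z << 28))
    return [(bits >> (8 * i)) & 0xFF for i in range(4)]

def unpack_channel(words):
    bits = sum((w & 0xFF) << (8 * i) for i, w in enumerate(words))
    return (bits & 0xFFFFFF, (bits >> 24) & 1, (bits >> 25) & 1,
            (bits >> 26) & 1, (bits >> 27) & 1, (bits >> 28) & 1)
```

A round trip through `pack_channel` and `unpack_channel` recovers the audio sample and all five flag bits, confirming that four words comfortably hold the 29 bits.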

Embedding of audio control packets

Audio control packets are multiplexed into the HANC data space of the Y bit-parallel datastream. They are transmitted once per field on the second line after the switching line, that is, on lines 9 and 571. The audio control packet carries data used in decoding the audio datastream. Figure 4 shows the structure of the audio control packet. It consists of:

  • ADF: Ancillary data flag consisting of three words 000,3FF,3FF;
  • DID: One data identification word. Value depends on the respective audio group;
  • DBN: One data block number word;
  • DC: One data count word;
  • UDW: Eleven user data words UDW0 to UDW10;
  • CS: One checksum word.
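The CS word that closes both packet types can be sketched as follows. For SMPTE ancillary packets the checksum is commonly formed as the nine-bit sum of the words from DID through the last UDW, with bit 9 set to the inverse of bit 8; treat this as an assumption and consult SMPTE 299M for the normative definition.

```python
# Assumed ancillary-data checksum: 9-bit sum of DID..last UDW,
# with bit 9 carrying the complement of bit 8 (common SMPTE ANC form).
def anc_checksum(words):
    s = sum(w & 0x1FF for w in words) & 0x1FF  # 9-bit truncated sum
    b8 = (s >> 8) & 1
    return s | ((b8 ^ 1) << 9)                 # bit 9 = NOT bit 8
```

For example, a packet whose payload words sum to zero yields a checksum word of 0x200, since bit 8 of the sum is 0 and bit 9 is therefore set.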

The UDW consists of five types of data:

  • AF: Audio frame number data;
  • RATE: Sampling rate;
  • ACT: Active channels;
  • DEL m-n: Indication of the amount of accumulated audio vs. video delay;
  • RSRV: Reserved for future use.

Conclusion

HDTV audio embedding as defined in SMPTE 299M sets the standard for future studio applications requiring at least six audio channels (5.1). This article has described how audio is embedded in the 1920×1080i HDTV format; other formats described in the SMPTE 292M standard have different horizontal blanking structures, so the audio embedding will differ somewhat from the example discussed here. As in SDTV embedded-audio applications, there is always the possibility of audio clicks with live source switching, so an informed decision needs to be made whether or not to embed.

Michael Robin, former engineer with the Canadian Broadcasting Corporation's engineering headquarters, is an independent broadcasting consultant in Montreal, Canada. He is the co-author of Digital Television Fundamentals published by McGraw-Hill.