Digital audio details III

This week’s tutorial continues the discussion of AES/EBU digital audio. The relationship between AES3 data and video is an important part of maintaining a solid sound and picture lock.

Before discussing how AES/EBU and video can be synchronized, the structure of the digital audio signal must be explained. AES digital audio is composed of blocks; each block is made up of 192 frames, and each frame is composed of two subframes, one for each audio channel. Each subframe is identified by a preamble, a unique series of pulses, at its beginning. The preamble for Channel 1 is labeled X, while the preamble for Channel 2 is labeled Y. To identify the beginning of a block, the preamble for the first Channel 1 subframe is structured differently and labeled Z. (See Figure 1.)
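The block layout described above can be sketched in a few lines of Python. This is only an illustration of the preamble labeling, not a bit-accurate AES3 encoder; the function name preamble_sequence is a hypothetical helper invented for this example.

```python
FRAMES_PER_BLOCK = 192  # an AES block is 192 frames, as described in the text


def preamble_sequence(num_frames):
    """Yield (frame_in_block, channel, preamble) for every subframe.

    Hypothetical helper for illustration: it only labels subframes and
    does not generate the actual preamble pulse patterns.
    """
    for n in range(num_frames):
        frame_in_block = n % FRAMES_PER_BLOCK
        # The Channel 1 subframe carries Z at the start of a block, X otherwise.
        ch1_label = "Z" if frame_in_block == 0 else "X"
        yield (frame_in_block, 1, ch1_label)
        # The Channel 2 subframe always carries Y.
        yield (frame_in_block, 2, "Y")


# Labels for two full blocks (2 x 192 frames, two subframes per frame).
labels = [label for _, _, label in preamble_sequence(2 * FRAMES_PER_BLOCK)]
print(labels[:6])
```

Over any whole number of blocks, Z appears exactly once per block, and every other Channel 1 subframe is labeled X.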

The start of an AES subframe X (Channel 1) or Z (Channel 1 and start of AES block) is the synchronizing point for AES digital audio; it coincides with the leading sync edge of Line 1/Field 1. The two align again five video frames later, after 8008 AES frames (at a 48kHz sample rate, each NTSC video frame spans 1601.6 AES frames). Synchronization only deals with the start of AES frames; AES blocks are not used in synchronizing. (See Figure 2.)
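The five-frame figure follows from simple arithmetic, sketched below. The sketch assumes a 48kHz sample rate and the NTSC frame rate of 30000/1001 frames per second; the function name alignment_cycle is a hypothetical helper, not part of any standard.

```python
from fractions import Fraction

SAMPLE_RATE = 48000                 # assumed AES3 sample rate, in Hz
FRAME_RATE = Fraction(30000, 1001)  # assumed NTSC frame rate (about 29.97 fps)


def alignment_cycle(sample_rate, frame_rate):
    """Return (video_frames, aes_frames) for the shortest repeat cycle.

    One AES frame carries one sample period, so the number of AES frames
    per video frame is sample_rate / frame_rate. The cycle ends at the
    first whole number of video frames that also holds a whole number of
    AES frames, which is the denominator of that ratio in lowest terms.
    """
    aes_per_video_frame = Fraction(sample_rate) / Fraction(frame_rate)
    video_frames = aes_per_video_frame.denominator
    aes_frames = aes_per_video_frame * video_frames
    return video_frames, int(aes_frames)


print(float(Fraction(SAMPLE_RATE) / FRAME_RATE))  # AES frames per video frame
print(alignment_cycle(SAMPLE_RATE, FRAME_RATE))   # (video frames, AES frames)
```

Exact fractions are used instead of floating point so that the 1001 in the NTSC frame rate does not introduce rounding error.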

Because of this five-frame cycle, AES3 sources that are independently locked to a common video reference have little chance of locking on to the same video frame when powered on. This leads to a synchronized but out-of-phase condition in which the AES frames do not line up between sources. The only way to guarantee proper synchronization and phasing is to distribute a common digital audio reference signal (DARS), locked to video black, to all AES sources. Remote AES feeds require the use of digital audio synchronizers. Ideally, every AES3 source, including video servers and digital VTRs, would be fed with DARS to make sure its AES3 output is properly synchronized, but few, if any, have a DARS input. Most digital equipment today genlocks to analog video black, which does not guarantee a proper lock for digital audio.
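To put a rough number on "little chance," the sketch below assumes each source picks one of the five video frames in the cycle independently and with equal probability at power-on. That assumption is made for illustration only and is not a measured property of any equipment; chance_all_aligned is a hypothetical helper name.

```python
from fractions import Fraction

CYCLE_FRAMES = 5  # length of the AES/video alignment cycle, in video frames


def chance_all_aligned(num_sources, cycle_frames=CYCLE_FRAMES):
    """Probability that every source lands on the same frame of the cycle.

    Assumes (for illustration) that each source chooses its frame
    uniformly and independently. The first source may land anywhere;
    each additional source must match it, at odds of 1/cycle_frames.
    """
    if num_sources < 1:
        raise ValueError("need at least one source")
    return Fraction(1, cycle_frames) ** (num_sources - 1)


for n in (2, 3, 4):
    print(n, "sources:", chance_all_aligned(n))
```

Under that assumption, two sources agree one time in five, and three sources agree only one time in 25.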

The Society of Motion Picture and Television Engineers (SMPTE) recognized this problem and in 1999 developed the SMPTE 318M-1999 standard, which addresses the issue of accurately and repeatably locking digital audio to analog video black. The standard added a 10-field (five-frame) ID to video black: the first field identified is always field one, the next nine fields are numbered in ascending order, and then the sequence repeats. This allows AES3 sources to align to an SMPTE 318M-1999 video black source so that all of their outputs are properly synchronized and phased. Unfortunately, this standard has never been widely adopted and is implemented in only a few sync generators.

Currently, SMPTE is working on a new video synchronizing standard that would address the shortcomings of analog video black and trilevel sync. This new standard is intended to work with all current and future digital video standards as well as NTSC and AES3 digital audio. TFTS RFT-270208 was published as a call for user requirements in February of this year by the “Joint EBU-SMPTE Task Force on Time Labeling and Synchronizing.” The work is in the very early stages of standards development, and it will take some time before any equipment is available to use it. As the authors of the paper point out, current digital video systems are using a synchronizing signal that was developed more than 50 years ago.

What all this means is that there are only two ways to accurately lock AES3 digital audio today: use AES11 DARS (the more common and likely option), or use analog video black that conforms to SMPTE 318M-1999 together with AES3 sources that recognize it. Synchronization and phasing are a much larger issue when multichannel audio is used; currently, three AES3 sources are used when 5.1 sound is transported within a plant. If the sources are not properly locked, the resulting audio phasing problems will be quite noticeable between the front and rear channels.

The only way to monitor and test for proper AES3 synchronizing and phasing is to use a multichannel oscilloscope with delayed sweep — first locking the scope to vertical sync and then examining the AES3 signal for the correct preamble and observing if it is locked to field one.

The main difficulty with timing AES3 to video is the lack of any time markers in either format. AES3 was designed for real-time transportation of a pair of audio signals; there is no error correction or timing information built into the standard. When digital audio is processed or encoded, delay is introduced and the video must be delayed to match the audio; today, this is a manual process. If timestamps were incorporated into both the SDI video and AES3 signals, realigning them would be simple and automatic. As the digital TV plant evolves, so do the requirements placed on its infrastructure.

Digital audio levels

In the analog world, a level representing 0dB was picked and a certain number of decibels of headroom was expected above this point before distortion would be encountered. In the digital world, there is no such thing as headroom; once all the bits turn to 1s, clipping of the audio signal occurs. And unlike in the analog world, there is only one meaningful digital audio level scale: dBFS (decibels full scale). On this scale, 0dBFS represents the maximum level no matter what bit depth is used (16, 20 or 24 bits), so all levels are expressed as negative values below 0dBFS. In a 16-bit signal, the lowest level is –96dBFS; a 20-bit signal’s lowest level is –120dBFS; and a 24-bit signal’s lowest level is –144dBFS. Normally, the average audio level is kept well below 0dBFS; in fact, most sources have a peak-to-average ratio of 12dB to 16dB, and a good average level to aim for in digital audio is –20dBFS. This accommodates peaks up to 16dB above the average level and leaves another 4dB before actual clipping occurs. Following these guidelines, when sending an analog tone at +4dBm to a digital system, the digital meter should read –20dBFS.
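The figures above can be checked with a short calculation. This sketch assumes the usual rule that the span of an n-bit signal is 20·log10(2^n) dB, roughly 6.02dB per bit, and it assumes the +4dBm to –20dBFS alignment given in the text. Both function names are hypothetical helpers for this example.

```python
import math


def dynamic_range_db(bits):
    """Span in dB between full scale and one least-significant bit."""
    return 20 * math.log10(2 ** bits)


def analog_to_dbfs(level_dbm, ref_dbm=4.0, ref_dbfs=-20.0):
    """Map an analog level to dBFS using the alignment from the text.

    Assumes a +4dBm tone reads -20dBFS; other plants may align differently.
    """
    return ref_dbfs + (level_dbm - ref_dbm)


for bits in (16, 20, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")

print(analog_to_dbfs(4.0))   # reference tone
print(analog_to_dbfs(24.0))  # analog level that would reach full scale
```

The exact spans come out slightly above the rounded values of 96dB, 120dB and 144dB. With this alignment, an analog level 20dB above the +4dBm reference lands at 0dBFS, which matches the 16dB plus 4dB margin described above.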

This has been a quick introduction to professional digital audio, covering just the basics and pointing out some of the issues to keep an eye out for. If you experience clicks, pops, front/rear phasing errors or distortion in your digital audio, these tutorials should give you a better idea of where to look to remedy these problems.