Establishing an audio reference and timing system

Two of the most critical jobs a broadcaster needs to tackle when transitioning from analog to digital technology are understanding and establishing a reference and timing system for digital audio. This audio reference is necessary as a foundation for clean digital sound, whether for a simple system or a complete plant.

Delivering high-quality sound on a daily basis requires attention to detail, both in producing the audio and in timing it to the rest of the equipment in the video environment. Locking the reference to a stable source and distributing that signal properly will help eliminate the pops, clicks and mutes that plague an improperly designed system. In addition, an engineer should become familiar with the devices available to maintain clean audio during switching and with what to do when a source has no external reference capability.

Let's investigate the practices necessary for clean audio that require the attention of today's broadcast engineer. For our purposes here, the digital audio referred to is the pulse code modulation (PCM) type usually transported as discrete AES3 audio pairs or embedded in a serial-digital audio-video bit stream.

Start at the beginning: Time

Contemporary broadcast reference designs use the Global Positioning System (GPS) as the origin of plant timing. Familiar GPS navigation accuracy is based on acquisition of a precise timing signal established by the atomic clocks in the 24 active GPS satellites orbiting the earth. These signals remotely reference a quartz clock in all GPS receivers. With the satellite signal acquired, a land-based receiver can easily and inexpensively create a clock as accurate as the satellite's atomic one.

Used not for navigation but for its timing accuracy, a GPS-capable signal pulse generator and master clock outputs several precise reference signals. These include color black, tri-level sync, digital audio reference signal (DARS), longitudinal time code and even a 10MHz clock as a timing source for an external signal generator.

However fascinating, the GPS element of this benchmark system is optional. The most important — and essential — practice for all installations is timing audio and video to a common, stable source.


Figure 1. The X or Y preamble of DARS aligned to the half-amplitude point of the leading edge of the sync pulse of the TV signal. For NTSC’s 29.97, this happens on the fourth line of every fifth frame. (AES11-2003)

As described, a broadcast signal generator creates a color black video reference signal (in this example at GPS accuracy) and may also supply a synchronous 48kHz DARS that is in time with the video reference. (More on DARS later.)

SMPTE 272M states “audio is clock synchronous with video when the sampling rate of the audio is such that the number of audio samples occurring within an integer number of video frames is itself a constant integer.” For NTSC's 29.97 frame rate, that ratio is 8008/5: exactly 8008 samples of audio fit in five synchronous frames of video. (See Figure 1.) Once synchronized using DARS, the digital audio and video should stay properly timed and aligned.
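The arithmetic behind that ratio is easy to verify. The short Python sketch below is an illustration, not part of the standard; it uses exact fractions to show where 8008/5 comes from.

```python
from fractions import Fraction

AUDIO_RATE = 48000                    # audio samples per second
FRAME_RATE = Fraction(30000, 1001)    # NTSC frame rate, exactly "29.97"

# Audio samples per video frame is not a whole number at 29.97...
samples_per_frame = AUDIO_RATE / FRAME_RATE
print(samples_per_frame)              # prints 8008/5 (1601.6 samples)

# ...but the reduced fraction shows the shortest whole-frame sequence
# that contains a whole number of audio samples.
frames = samples_per_frame.denominator    # 5 frames
samples = samples_per_frame.numerator     # 8008 samples
print(f"{samples} audio samples fit exactly in {frames} video frames")
```

At 25 frames/s the same calculation gives a whole 1920 samples in every frame, which is why the five-frame sequence is peculiar to 29.97.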

Synchronous equipment relies on aligned audio and video frame boundaries for proper operation. When all of this works, switching and processing can occur with minimal audible artifacts.

Clocking requirements

Whether it is the familiar color black video signal, DARS or the less used Word Clock, one of these signals will usually fulfill most clocking requirements. However, be prepared to provide one of these three signals as a slave to the primary reference you've chosen. (An example is a digital audio mixing console using Word Clock that can't accept DARS or color black.)

  • Color black: Color black (black and burst) is frequently used to reference TV audio equipment because of its common availability and established video clock accuracy in the broadcast plant.
  • DARS: A DARS (AES11-2003) signal is an AES3-formatted audio signal capable of being referenced, locked and distributed at the precise audio frame rate. It may contain only the preamble portion of the signal without active audio and, if so, is often referred to as AES silence or AES black. It is usually distributed on coax to the appropriate audio device.
  • Word Clock: As described in AES11-2003, Word Clock is a square wave at the basic sampling rate. This signal is not standardized, may be looped and is commonly carried on coaxial cable. Word Clock is infrequently used to reference broadcast and audio recording equipment; however, it is required often enough that it's important to understand how and when to use it.

Careful distribution

Regardless of the type of timing signal required, the proper reference must be available and properly connected. The series of cable runs and distribution amplifiers (DAs) should be thought out and implemented. Cable lengths should be observed so long distances don't introduce timing errors, and reclocking DAs should be used when cable lengths mandate them.
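For a rough feel of what a long run costs in time, the sketch below estimates the propagation delay of a coaxial reference cable. The velocity factor is an assumed typical value, not a figure from any standard, so check the cable datasheet for real installations.

```python
# Rough propagation-delay estimate for a coaxial reference run.
# Assumption: velocity factor ~0.66 (typical of solid-dielectric coax);
# foam-dielectric cable is closer to 0.8.
C = 299_792_458            # speed of light, m/s
VELOCITY_FACTOR = 0.66     # assumed; check the cable datasheet

def coax_delay_ns(length_m: float) -> float:
    """Return one-way propagation delay in nanoseconds."""
    return length_m / (C * VELOCITY_FACTOR) * 1e9

for run in (10, 100, 300):
    print(f"{run:>4} m -> {coax_delay_ns(run):7.1f} ns")
# Even 300 m is only about 1.5 us -- small next to a 20.8 us audio sample
# period, but tens of 27MHz video-clock periods, which is why long runs
# and reclocking DAs deserve attention.
```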

Discrete digital audio is frequently transported on 75Ω coaxial cable. When embedding audio per SMPTE 259M and 272M for SD, and SMPTE 292M and 299M for HD, the audio is also distributed on coax. Sixteen channels can be multiplexed in a single stream on a single cable with video and data. However, the same timing rules apply whether the audio is distributed as discrete AES pairs or is embedded. Synchronization must be maintained for clean signal routing and processing.

Using V-fade

Unfortunately, synchronizing the audio may not always be enough to ensure clean switches between sources. Even if the audio and video frames are aligned and a switch happens exactly between audio samples, the transition may occur at extreme opposite polarities of each signal. Switching at this point may cause a sharp transient that yields an audible click in the sound.


The new NBC Universal Studio 8H production control room in New York City’s Rockefeller Center is equipped with the latest HD video and digital audio technology. These systems get their timing reference from redundant master signal pulse generators locked to GPS receivers that are fed from dual antennas atop NBC Headquarters. Photo by Jim Starzynski.

One solution is to use a process known as a V-fade. A V-fade is a function of an audio router or switcher that fades down the old source and fades up the new one around the switch point. It reduces the chance of switching at a polarity extreme and cures the problem.
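The behavior is simple enough to sketch. The fade length and linear shape in the Python below are arbitrary choices for illustration, not taken from any particular router; real switchers apply their own fade profile, typically over a few milliseconds.

```python
import numpy as np

def v_fade(old: np.ndarray, new: np.ndarray, fade_samples: int = 480) -> np.ndarray:
    """Switch from `old` to `new` with a V-shaped fade (10 ms at 48 kHz).

    `old` holds the samples up to the switch point and `new` the samples
    from the switch point onward. The tail of the old source ramps down to
    silence and the head of the new source ramps up from silence, so the
    output passes through zero at the cut instead of jumping between
    extreme sample values.
    """
    down = np.linspace(1.0, 0.0, fade_samples)   # old source ramps out
    up = np.linspace(0.0, 1.0, fade_samples)     # new source ramps in
    return np.concatenate([old[:-fade_samples],
                           old[-fade_samples:] * down,
                           new[:fade_samples] * up,
                           new[fade_samples:]])
```

An equal-power crossfade is a common alternative; the V shape shown here simply guarantees the output reaches silence at the cut itself.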

Sample rate conversion

Sometimes supplying a timing reference to all the digital audio gear installed in the video plant isn't possible. Not every audio source component is designed to work in a video environment. For instance, CD players work at 44.1kHz and must interface properly with the 48kHz audio gear used in most video facilities. When the numbers don't match up, sample rate conversion (SRC) solves the problem.

As the out-of-sync source is processed at the input of an SRC-capable mixer, DVTR or frame sync, the SRC realigns the incoming digital audio and times it to the reference of the receiving device. The formerly asynchronous source is now properly timed to the rest of the system.

SRCs in distribution gear can also clean up some artifacts left over from the digital audio switching process. Think of SRCs as the audio equivalent of a video standards converter or frame sync.
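A production SRC uses band-limited polyphase filters, but the basic re-timing idea can be sketched with plain linear interpolation. The Python below is illustrative only (it would alias audibly in real use) and assumes NumPy is available.

```python
import numpy as np

def naive_resample(samples: np.ndarray, rate_in: int = 44_100,
                   rate_out: int = 48_000) -> np.ndarray:
    """Re-time `samples` from rate_in to rate_out by linear interpolation.

    Real sample rate converters use polyphase, band-limited filters; this
    sketch only shows the re-timing step that locks incoming audio to the
    receiving device's clock.
    """
    duration = len(samples) / rate_in             # seconds of audio
    n_out = int(duration * rate_out)              # output sample count
    t_in = np.arange(len(samples)) / rate_in      # input sample times
    t_out = np.arange(n_out) / rate_out           # output sample times
    return np.interp(t_out, t_in, samples)        # resampled audio
```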

Implementing synchronization

Establishing plant reference, managing its distribution and applying the techniques necessary to solve specific problems are important procedures that need to be performed by the video engineer working in a digital audio world. The standards and practices explained here, along with the many other digital audio references that are available, are a good start to a working understanding of how clean digital sound is produced, distributed and maintained in a modern broadcast facility.

Jim Starzynski is principal engineer in advanced technology for NBC-Universal.