Keeping It All 'In Sync'

After tackling the "loud ad problem" last time, we are ready to jump into what I believe is the second most popular television audio complaint: audio and video synchronization, better known as "lip sync," and as we will refer to it for the sake of brevity, A/V sync.

It would seem with today's advanced digital technology that this relatively old problem would be solved. Unfortunately, the same hand that giveth taketh away as they say, and this advanced digital technology is making the problem worse. Audio and video now have the capability of being delivered far more cleanly and clearly than in the past, and as such, synchronization errors are much more noticeable. Have no fear. There are now some advanced measuring technologies that can characterize the problem and help to solve it, possibly even automatically.


Let's explore some long-held beliefs about the perceptibility of A/V sync. Most film editors can easily detect sync errors of plus or minus one-half of a film frame. As the film frame rate is 24fps in the U.S. and 25fps in Europe, this equates to approximately +/- 20 msec. Other figures abound, such as plus or minus one video frame (+/- 33 msec at 30fps or +/- 40 msec at 25fps), and a curiously lopsided figure of +5/-15 msec. This last one comes from a specification set by Dolby Laboratories for Dolby Digital (AC-3) decoder performance. The requirement is that audio cannot lead video by more than 5 msec and it cannot lag by more than 15 msec. Why, you might ask, is it tighter in one direction than the other?
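Before getting to that question, the frame-based figures above are easy to check with a little arithmetic. The short Python sketch below converts frame rates into frame and half-frame durations. The function name and the use of the nominal NTSC rate of 30000/1001 fps are my own assumptions for illustration, not something taken from any specification quoted here.

```python
def frame_period_ms(fps: float) -> float:
    """Duration of one frame in milliseconds at the given frame rate."""
    return 1000.0 / fps


# Frame rates mentioned in the text. The NTSC value is an assumption:
# the nominal 30000/1001 (about 29.97) fps rather than an even 30 fps.
RATES = {
    "film, U.S. (24 fps)": 24.0,
    "film, Europe (25 fps)": 25.0,
    "NTSC video (30000/1001 fps)": 30000 / 1001,
}

for label, fps in RATES.items():
    period = frame_period_ms(fps)
    print(f"{label}: one frame = {period:.1f} ms, half frame = {period / 2:.1f} ms")
```

Half a film frame comes out at roughly 20 to 21 msec at either film rate, which is where the +/- 20 msec figure comes from.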

It is a fact that light travels much faster than sound, and we are all used to seeing this proven, even if we don't notice it. Here is an example: a basketball hitting the court in a large sports venue will look and sound relatively correct to the first few rows. With the tickets I usually end up with (much, much farther back), the sound lags behind the sight of the ball hitting the floor. If it were possible to get any farther back from the court, the sound would lag even more, but no one seems to complain because it all seems correct. Imagine for a moment if the timing was reversed. As you are watching the game, the sound of the ball hitting the floor arrives before the ball hits. This would be a very unnatural sight and would likely seem wrong even if you were in the first few rows. Human perception of A/V sync is far more sensitive to the unnatural occurrence of sound before action.

The International Telecommunication Union (ITU) released a specification called BT.1359-1 in 1998. It was based on research showing that A/V sync errors become reliably detectable at about 45 msec when audio leads video and at about 125 msec when audio lags video. Remember, this is just the detectability region; the acceptability region is an even wider +90 to -185 msec (positive meaning audio leads, negative meaning audio lags). This study used "normal" people (i.e., no film editors), which helps to explain why the range is so wide. These are worst-case numbers and the goal should always be +/- 0 msec.
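To make those two regions concrete, here is a minimal Python sketch that sorts a measured offset into one of three bins using only the figures quoted above. The sign convention (positive means audio leads video), the function name and the three labels are my own choices for illustration; the recommendation itself should be consulted for the exact definitions.

```python
# Thresholds as quoted in the text, in milliseconds.
# Assumed sign convention: positive = audio leads video, negative = audio lags.
DETECT_LEAD_MS = 45.0
DETECT_LAG_MS = -125.0
ACCEPT_LEAD_MS = 90.0
ACCEPT_LAG_MS = -185.0


def classify_av_offset(offset_ms: float) -> str:
    """Place a measured A/V offset into one of three illustrative bins."""
    if offset_ms > ACCEPT_LEAD_MS or offset_ms < ACCEPT_LAG_MS:
        return "unacceptable"
    if offset_ms > DETECT_LEAD_MS or offset_ms < DETECT_LAG_MS:
        return "detectable"
    return "undetectable"


for offset in (0.0, 60.0, -100.0, -150.0, 100.0, -200.0):
    print(f"{offset:+7.1f} ms -> {classify_av_offset(offset)}")
```

Note the asymmetry: a 100 msec lag falls in the undetectable bin, while a 100 msec lead is already outside the acceptability region.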


A/V sync errors within the TV plant are not new to digital television; they have plagued NTSC for decades. Some basic guidelines to keep in mind are that audio operations, digital or analog, are generally very low latency. Dynamic range compression, equalization, mixing, etc. can all be accomplished in only a couple of milliseconds in the digital domain, falling to microseconds in the analog domain. Rarely, if ever, are compensating video delays required as the latency is so low.

Video processing, on the other hand, takes far more time to accomplish. This is due to both the high bandwidth and the frame-based structure of video signals. Similar to audio, any time a video signal is digitized, operations on that signal will take longer. Unlike audio, however, most video effects do not have the ability to be accomplished in the analog domain, so some video processing delay is inevitable and compensating audio delays are required. It is imperative that compensation be provided for any video device that has delay in excess of a few milliseconds; otherwise A/V sync will become variable and will change as the signal path is modified by routing or patching. To help with situations like this, the ITU made an additional, very logical recommendation called ITU-R BT.1377, which suggests that video and audio equipment be labeled to indicate processing delay - if it is variable delay, indicate the range - and that any delays are noted in milliseconds to avoid discrepancies due to differing frame rates.

One very common device to watch out for is the video frame synchronizer. Due to its very nature, a frame synchronizer typically produces a variable delay of between one and two frames. Getting video and audio from a remote production to the final point of emission (i.e., the local television station) typically requires that the signals pass through multiple frame synchronizers. If the audio is not suitably delayed to match, the results will be predictably ugly. Suitable audio delays are available, and some will even track the variable video delay and guarantee perfect sync.

Another group of devices to watch out for is digital video effects (DVE) systems, which can add many, many frames of delay. Since the delay of a DVE is generally a fixed value, a commonly available fixed-value audio delay can also be used. Keeping these systems permanently inline, if possible, will also help to reduce variable A/V sync.
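The bookkeeping for a signal path can be sketched in a few lines of Python. The example below adds up the video delay of each device, expressed in milliseconds as the labeling recommendation suggests, and reports the range of compensating audio delay the path needs. The device list, the three-frame DVE figure and all of the names are hypothetical; real values should come from the equipment labels or from measurement.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VideoDevice:
    name: str
    min_delay_ms: float
    max_delay_ms: float  # equal to min_delay_ms for a fixed-delay device


def frames_to_ms(frames: float, fps: float) -> float:
    """Convert a delay in frames to milliseconds for a given frame rate."""
    return frames * 1000.0 / fps


def audio_delay_range_ms(path: List[VideoDevice]) -> Tuple[float, float]:
    """Minimum and maximum audio delay needed to match the video path."""
    low = sum(device.min_delay_ms for device in path)
    high = sum(device.max_delay_ms for device in path)
    return low, high


FPS = 30000 / 1001  # assumption: nominal NTSC frame rate

# Hypothetical path: two frame synchronizers (one to two frames each)
# followed by a DVE with an assumed fixed delay of three frames.
path = [
    VideoDevice("frame sync (remote)", frames_to_ms(1, FPS), frames_to_ms(2, FPS)),
    VideoDevice("frame sync (station)", frames_to_ms(1, FPS), frames_to_ms(2, FPS)),
    VideoDevice("DVE", frames_to_ms(3, FPS), frames_to_ms(3, FPS)),
]

low, high = audio_delay_range_ms(path)
print(f"Audio delay needed: {low:.1f} to {high:.1f} ms")
```

A spread between the two numbers means a fixed audio delay cannot hold sync on its own, which is the case for a tracking audio delay on the frame synchronizers.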


With the in-plant situations described above, it is easy to measure A/V sync with an oscilloscope and a known good reference signal, such as the standard "beep flash," and adjust the system for minimal offset. What about transmitting the signals to consumers?

In NTSC, if the audio and video are synchronized at the input terminals of the transmitter, the program will be reproduced properly when received. ATSC is not so simple. The audio and video signals are encoded separately and then multiplexed together into a transport stream that is sent off to the modulator and RF sections of the transmitter. The encoding process takes time and it is different for audio and video. The multiplexer must know this timing exactly so that it can generate the proper Presentation Time Stamp (PTS) values, which are used by the decoder to - you guessed it - present the audio and video in sync. If the ATSC encoder is self-contained, meaning the Dolby Digital (AC-3) and MPEG video encoders are built-in, A/V sync calibration is very simple and has likely been preset. If the Dolby Digital (AC-3) encoder is external, as is presently the case if you wish to transmit 5.1 channel audio, then the situation is slightly more complex.
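For readers who want to see what the time stamps look like numerically, here is a minimal Python sketch. It relies on the MPEG-2 Systems convention that a PTS is a 33-bit count of a 90 kHz clock, and it assumes you already have the PTS of the video frame and of the audio frame that carry the same test event. The function names and the sign convention (positive means audio leads) are my own.

```python
PTS_CLOCK_HZ = 90_000   # PTS values count ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33   # PTS is a 33-bit field, so it wraps around


def pts_delta_ticks(later: int, earlier: int) -> int:
    """Signed difference (later - earlier), allowing for 33-bit wraparound."""
    diff = (later - earlier) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS
    return diff


def ticks_to_ms(ticks: int) -> float:
    """Convert 90 kHz clock ticks to milliseconds."""
    return ticks * 1000.0 / PTS_CLOCK_HZ


def av_offset_ms(video_pts: int, audio_pts: int) -> float:
    """Offset for one event: positive means the audio is presented first."""
    return ticks_to_ms(pts_delta_ticks(video_pts, audio_pts))


# Hypothetical values: the audio is stamped 1350 ticks later than the video.
print(f"{av_offset_ms(video_pts=900_000, audio_pts=901_350):+.1f} ms")
```

The wraparound handling matters in practice because a 33-bit counter at 90 kHz rolls over roughly every 26.5 hours.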

Fig. 1
In either case, the most accurate way to make the measurement is to analyze the transport stream. I know it is tempting to use a consumer (or professional) receiver and look at the audio and video outputs to check sync. Unfortunately, this is very unreliable and probably caused much of the initial A/V sync trouble in DTV.

Interra Digital Video Technology has a software application called SyncCheck that will take a transport stream, demultiplex the audio and video streams, decode them and display them as shown in Fig. 1.

The decoded audio waveform is shown for a selected channel (all 5.1 are available), and an actual frame-by-frame display of the video can also be seen. As SyncCheck is decoding the transport stream using reference software decoders, very precise measurement is possible without nonstandard or incorrect DTV decoders giving false indications of A/V sync.

You might be wondering how to get the transport stream into SyncCheck, and I was wondering the same thing. It turns out that a relatively inexpensive (<$400) HDTV receiver is available on a PCI card and it turns a PC into a DTV receiver. Importantly, the card and its included software allow the received transport stream to be stored to the computer's hard drive. SyncCheck can import this transport stream, demultiplex it, decode it and display it. Simply supply the "beep flash" test signal to your encoder (in sync, of course), then analyze the recorded transport stream and make any required adjustments. This is a very cost-effective way to ensure that your station's HDTV emission encoder and multiplexer are set correctly.
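The idea behind a beep-flash measurement can be illustrated with a short Python sketch. To be clear, this is not how SyncCheck works internally, which I have no details on; it is only a simplified illustration that assumes the decoded audio samples and a per-frame brightness value are already available on a common timeline. All names and thresholds are hypothetical.

```python
from typing import Optional, Sequence


def first_index_above(values: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first value whose magnitude exceeds the threshold."""
    for index, value in enumerate(values):
        if abs(value) > threshold:
            return index
    return None


def beep_flash_offset_ms(
    audio: Sequence[float],
    sample_rate_hz: float,
    frame_luma: Sequence[float],
    frame_rate_hz: float,
    audio_threshold: float = 0.1,
    luma_threshold: float = 0.5,
) -> Optional[float]:
    """Offset between flash and beep onsets; positive means audio leads."""
    beep = first_index_above(audio, audio_threshold)
    flash = first_index_above(frame_luma, luma_threshold)
    if beep is None or flash is None:
        return None  # test signal not found in one of the streams
    beep_ms = beep * 1000.0 / sample_rate_hz
    flash_ms = flash * 1000.0 / frame_rate_hz
    return flash_ms - beep_ms


# Synthetic example: beep starts at 100 ms, flash appears on frame 3 at 25 fps.
audio = [0.0] * 4800 + [0.8] * 480
luma = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(beep_flash_offset_ms(audio, 48_000, luma, 25.0))
```

Keep in mind that the flash can only be located to the nearest video frame, so a sketch like this cannot resolve the video side more finely than one frame period.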

Once you have aligned the transport stream for perfect A/V sync, you can then look at the outputs of DTV receivers and know with confidence that the signal you are sending is correct.

Next time we will wrap up our discussion of A/V sync and look at a system that allows constant monitoring and adjustment of A/V sync with no special test signals required - a trick only possible in the digital age and only necessary because of it. Thanks for your continued support and the steady flow of e-mails, and special thanks to Paul Collins and Leanna Nguyen of Interra.