A/V Sync: No Simple Solutions

Last time we discussed A/V sync and how to measure it accurately using commonly available tools that directly analyze the encoded audio and video streams, thereby removing measurement errors caused by differing decoder implementations. This time we will look at another approach to solving the problem. By the miracle of modern digital electronics, this one does it automatically.


Tektronix has introduced a system called the AVDC100 which, through sophisticated audio analysis and a video watermark, can automatically detect and correct A/V sync errors. In simplified terms, the system creates an envelope representation (an average of sorts) of the audio signal and encodes this envelope into a robust watermark that is inserted into the corresponding video signal. After the audio and video signals have passed through a plant, storage device, or distribution path, the watermark is detected and decoded back into the envelope of the audio signal. This recovered envelope is then compared to the actual audio signal. If the A/V sync is found to be incorrect, a variable audio delay is automatically adjusted to correct the problem.
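The envelope-and-compare idea can be sketched in a few lines of code. To be clear, this is my own illustration of the general technique, not Tektronix's actual algorithm; the `envelope` and `estimate_offset` functions, the frame size, and the cross-correlation approach are all assumptions made for the sake of the example.

```python
import numpy as np

def envelope(audio, frame=1024):
    """Coarse amplitude envelope: mean absolute level per frame.
    This stands in for the compact representation carried by the watermark."""
    n = len(audio) // frame
    return np.array([np.abs(audio[i * frame:(i + 1) * frame]).mean()
                     for i in range(n)])

def estimate_offset(ref_env, audio, frame=1024):
    """Estimate the A/V offset (in frames) by cross-correlating the
    envelope recovered from the video watermark (ref_env) against the
    envelope of the audio actually arriving alongside that video."""
    env = envelope(audio, frame)
    # Remove the means so the correlation peak reflects envelope shape,
    # then find the lag at which the two envelopes line up best.
    corr = np.correlate(env - env.mean(), ref_env - ref_env.mean(), mode="full")
    lag = int(np.argmax(corr)) - (len(ref_env) - 1)
    return lag  # positive lag: the live audio lags the watermark reference
```

A corrector would then feed `lag * frame` samples of delay (or a request to reduce existing delay) to a variable audio delay line, closing the loop the way the article describes.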

Although the watermark can be inserted at any point, the earlier in the chain the better. Likewise, the detection and correction should be done as far along in the signal path as possible - just prior to final emission encoding is ideal. The watermark has been designed to withstand normal video operations, including MPEG compression, and its robustness can be adjusted so that it can be inserted into services with differing data rates (see Fig. 1).

Fig. 1 Tektronix AVDC100 Signal Flow (courtesy of Tektronix)
This system is very sophisticated and can work quite well; however, there are some considerations worth bearing in mind. One is that, as of this writing, correction can only be done in one direction - namely, by delaying the audio to match the picture. Luckily, this is the better direction, as it fixes the very noticeable audio-leads-video problem. Another issue is that, although quite innocuous, the video watermark does take some bandwidth. In a relatively high data rate system such as ATSC this need not be a concern, but systems with tight data budgets may not be able to spare enough bandwidth for the system to be effective. Even with these two issues, the benefits can far outweigh the challenges.

I think our discussion of A/V sync can be summarized into two main areas: measurement and correction. The goal should always be perfect A/V sync, meaning 0 msec; although this is rarely achieved, it should never stop being the goal. The path there is paved with measurement and correction. To paraphrase an old real estate saying: measure, measure, measure. Proper A/V sync should be checked and corrected at every step in the signal path. Do not fall into the deadly trap of relying on an adjustment in the emission encoder or multiplexer to tune out plant timing problems. Whether it is done manually or automatically, the tools are available, and it is absolutely possible to do this right and do it now - there just is no good excuse for letting these problems continue.

One additional area that has been brought to my attention involves certain master control switchers, and it seems to have no simple solution. In the last column I said that a fixed delay equal to the sum of all of the Digital Video Effect (DVE) delays should be inserted into the audio path, and that the switcher should maintain this same total delay on the video path regardless of how many DVE devices are active at any given time. Unfortunately, not all switchers are capable of doing this. The result is that an audio delay is required to either advance or delay the audio silently, depending on which effects are active. Presently, there is no way to do this in an artifact-free manner. Audio will either be lost or stretched, and in either case it will not sound good during the transition. There are two possible solutions.

One is to design a sophisticated audio processor that allows silent delay changes (in both directions); the other is to keep the video delay constant. It might seem logical to choose the latter, as the former is a large and complex challenge - especially since there are many different master control switchers out there and just as many different control protocols.
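To see why a silent delay change is hard, consider what shortening an audio delay actually requires: samples must be removed. A short crossfade can mask the click at the splice point, but the content is still gone - exactly the "audio will be lost" artifact described above. Here is a minimal sketch of such a crossfaded splice; `splice_out` and its parameters are hypothetical names invented for this illustration, and real artifact-free processors use far more elaborate time-scale modification.

```python
import numpy as np

def splice_out(audio, start, count, fade=128):
    """Shorten `audio` by `count` samples beginning at `start`, using a
    short linear crossfade across the cut. The crossfade masks the
    discontinuity (the click), but `count` samples of program content
    are still discarded - the edit is hidden, not artifact-free."""
    ramp = np.linspace(1.0, 0.0, fade)
    # Fade out the material before the cut while fading in the material
    # after it, so the waveform stays continuous at the splice.
    xf = (audio[start:start + fade] * ramp
          + audio[start + count:start + count + fade] * (1.0 - ramp))
    return np.concatenate([audio[:start], xf, audio[start + count + fade:]])
```

Lengthening a delay has the mirror-image problem (samples must be repeated or synthesized), which is why a processor that can do both directions transparently, across a 5.1 mix, is such a large undertaking.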

However, I recently received a very nice and detailed note from J. Carl Cooper of Pixel Instruments. His company has developed an audio delay that may allow rapid delay adjustments without the common side effects of doing so. I am interested to see and hear this work, and I am especially interested in how it will deal with 5.1 channels of audio. I will investigate this further and report back with my findings.


By now we have covered some of the basic (and not so basic) concepts necessary to understand audio for digital television, and we can begin a multi-part series of articles that will bring many of these topics together. We will attempt to describe the real-world audio path from program creation through storage and distribution to affiliates, and take it right up to the input terminals of the Dolby Digital (AC-3) encoder.

There are many roadblocks currently preventing multichannel audio and metadata from making it all the way from Hollywood (or New York) to consumers via terrestrial broadcast. These include limitations on the number of audio channels available on VTRs and servers (the standard still seems to be four channels!), limitations on the number of audio channels that can be passed through satellite distribution systems, and very few ways to easily carry audio metadata.

The largest issue that seems to have been overlooked is the fact that most plants out there in the real world are, dare I break the news, analog. There is a whole lot of wiring required to pass 5.1 channels of main audio plus an SAP and a visual descriptive program through an analog plant. There are some creative ways to make it all work - even in an analog plant.

In order for it all to make sense, we have spent the last few articles working the signal flow backwards from the consumer. This allowed us to see the goal of the entire process through the eyes (and ears) of the target audience. Next time we will start at the real beginning and discuss what happens during program creation. It begins with proper monitoring, checking downmixes, and creating appropriate metadata. It also means getting the audio onto tape, either with picture or without, in a standard manner - which, of course, varies from format to format. This will also start us down the road that will eventually bring us back to where we began. Got all that? I promise it will all work out in the end!

As always, thanks for your excellent support. I have been a bit overwhelmed due to conferences and the like and have a small backlog of e-mails that require (and will receive) responses. In the meantime, thanks for your time and keep the feedback coming!