Addressing Audio-Video Lip Sync at the Baseband Level

Part 1: Make an Inventory of Video Delays

While some audio-video lip sync issues may be out of a broadcaster's hands once the signal reaches transmission, distribution and reception, there is still plenty that broadcasters can do to keep audio and video in sync at the baseband level.

Most of the time it's the video signal that is subjected to various processes that the audio isn't. The result is that video gets delayed with respect to audio. If no correction is applied to the audio signal and the delay is large enough, the audio is heard before a person's lips are seen to move. This never occurs naturally, because sound travels more slowly than light, so viewers tend to notice it very quickly.

A first step in determining the delay differential is to take an inventory of all video delays from the studio to the input of any recording device or transmission encoder (such as one feeding an STL or ATSC transmitter). Look at CCD cameras, frame synchronizers, digital video effects units, etc. Check manufacturers' specification sheets or contact the manufacturers directly for the amount of video delay in each device.

Many of these devices get cascaded together in a complete system, so be sure to add up the total delay along the whole video path. With this type of equipment, delays tend to be fixed, which makes them easier to compensate for on the audio side.
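As a rough illustration of the inventory idea, the sketch below adds up the fixed delays of a cascaded video chain and converts the total to the audio delay needed to match it. The device names, the per-device delay values and the 29.97 fps frame rate are all placeholders assumed for the example; real figures have to come from the manufacturers' specification sheets.

```python
from fractions import Fraction

# Assumed frame rate for this example (29.97 fps, NTSC-family).
FRAME_RATE = Fraction(30000, 1001)

# Hypothetical inventory: (device, fixed video delay in frames).
# These values are placeholders, not manufacturer figures.
VIDEO_CHAIN = [
    ("camera", Fraction(1)),
    ("frame_synchronizer", Fraction(1)),
    ("dve", Fraction(2)),
]

def total_delay_frames(chain):
    """Sum the fixed delays of every device in the cascade."""
    return sum((frames for _, frames in chain), Fraction(0))

def frames_to_ms(frames, frame_rate=FRAME_RATE):
    """Convert a delay in frames to milliseconds at the given frame rate."""
    return float(frames / frame_rate * 1000)

total_frames = total_delay_frames(VIDEO_CHAIN)
audio_delay_ms = frames_to_ms(total_frames)
print(f"Total video delay: {total_frames} frames = {audio_delay_ms:.1f} ms")
```

The resulting millisecond figure is the amount by which the audio would need to be delayed to line up with the video at the end of this hypothetical chain.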

Modern digital video switchers are another story, however. Each generation of switchers seems to pack in more built-in processing and effects, each with its own inherent delay. That means the delay produced in a video switcher isn't fixed, but varies depending on how many mix-effects busses, DVEs, keys, mattes, aux busses, etc. are used for each effect. Fortunately, everyday productions such as news typically settle on a select group of effects, and the delay for each group can be calculated.
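One way to organize that calculation is a small lookup table of delay per effect group. The sketch below is only an illustration: the effect names, the stages each one uses and the per-stage delays are invented, and it assumes that the delays of the stages an effect passes through simply add, which may not be true of a particular switcher.

```python
# One frame at an assumed 29.97 fps, in milliseconds.
FRAME_MS = 1001 / 30

# Hypothetical per-stage delays in frames (placeholders, not real specs).
STAGE_DELAY_FRAMES = {"mix_effects_bus": 1.0, "dve": 1.0, "key": 0.5}

# Hypothetical effect groups for a news show, listed by the stages they use.
EFFECT_GROUPS = {
    "straight_cut": [],
    "over_the_shoulder": ["mix_effects_bus", "key"],
    "double_box": ["mix_effects_bus", "dve", "dve"],
}

def group_delay_frames(stages):
    """Total delay of one effect, assuming the stage delays are additive."""
    return sum(STAGE_DELAY_FRAMES[stage] for stage in stages)

# Delay in frames for each effect group.
delay_table = {
    name: group_delay_frames(stages) for name, stages in EFFECT_GROUPS.items()
}

for name, frames in delay_table.items():
    print(f"{name}: {frames} frames = {frames * FRAME_MS:.1f} ms")
```

A table like this makes it clear which audio delay goes with which effect group.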

Stay tuned, there's more next time.