Updating Lip Sync Issues

If some friends invite you over to check out their favorite TV program on their new wide-screen flat-panel display, you may not want to mention how badly the lip sync is off.

Even if they are tech savvy, your soon-to-be, not-so-friendly friends may get upset that they will no longer be able to enjoy their program now that they see the lip-sync problem.

It would seem that what's glaringly obvious to some of us can be totally unnoticeable to others. And that's one of the frustrations in defining and correcting lip-sync errors.

Lip-sync errors can occur just about anywhere as audio and video signals wend their way through acquisition, production, post production, plant routing, master control switching, distribution, and home reception.

CCD cameras, microphone placement and technique, frame synchronizers, standards converters, digital audio sample rate converters, audio multiplexers, audio embedders and de-embedders, film-to-tape transfers, digital video effects processors, post-production editing processes, server and hard-drive storage systems, switching and routing systems, bit-rate reduction encoders and decoders, handling of presentation time stamps, and LCD and plasma displays can all contribute to audio and video synchronization errors.

How bad does a lip-sync error need to be to be noticed? The ITU found in 1998 that for the error to be detectable, the audio would need to lead the video by 45 msec or lag it by 125 msec. These findings were reported in ITU-R BT.1359-1. The same document also gives an end-to-end tolerance for an entire system, ranging from 90 msec of audio leading video to 185 msec of audio lagging. The tests that produced these recommendations were done with standard definition video.

The ATSC Implementation Subcommittee in 2003 found the ITU recommendations to have too wide a tolerance. The ATSC document IS-191 recommends that audio shouldn't lead video by more than 15 msec, nor lag by more than 45 msec at the input of a DTV encoder.
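To make the figures above concrete, here is a minimal Python sketch that checks a measured offset against the three windows quoted from ITU-R BT.1359-1 and IS-191. The sign convention (positive means audio leads video), the function names, and the choice to treat a value sitting exactly on a limit as inside the window are assumptions made for illustration; none of them comes from either document.

```python
# Assumed sign convention for this sketch: positive = audio leads video,
# negative = audio lags video. All values are in milliseconds.
# Each entry is (maximum lead, maximum lag), taken from the figures quoted in the text.
WINDOWS_MS = {
    "ATSC IS-191 (DTV encoder input)": (15, 45),
    "ITU-R BT.1359-1 detectability": (45, 125),
    "ITU-R BT.1359-1 end-to-end tolerance": (90, 185),
}


def within_window(offset_ms, max_lead_ms, max_lag_ms):
    """Return True if the offset falls inside the lead/lag window.

    A value exactly on a limit counts as inside (an assumption of this sketch).
    """
    return -max_lag_ms <= offset_ms <= max_lead_ms


def check_offset(offset_ms):
    """Compare one measured offset against every window in WINDOWS_MS."""
    return {
        name: within_window(offset_ms, max_lead, max_lag)
        for name, (max_lead, max_lag) in WINDOWS_MS.items()
    }


if __name__ == "__main__":
    # Audio 30 msec ahead of video: outside the ATSC window, inside both ITU windows.
    for name, ok in check_offset(30).items():
        print(f"{name}: {'inside' if ok else 'outside'}")
```

An offset of 30 msec of audio lead, for example, falls outside the ATSC window but inside both ITU windows, which shows how much tighter the ATSC recommendation is.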

Regardless of which recommendation one follows, if A/V sync in each section of a signal path, whether contribution, production, or transmission, were at one end or the other of the tolerance scale, the cumulative end-to-end error could be quite large and noticeable to a viewer. For this reason, ATSC recommends that A/V sync be kept to a tight tolerance.
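The arithmetic behind that concern can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each of three sections is held to the same 15 msec lead and 45 msec lag window, and that per-section errors simply add. Real chains may have more stages, and errors in opposite directions can partly cancel. The function names are hypothetical.

```python
# Assumed sign convention: positive = audio leads video, negative = audio lags.
# All values are in milliseconds.
SECTIONS = ["contribution", "production", "transmission"]


def worst_case_ms(n_sections, max_lead_ms, max_lag_ms):
    """Worst-case cumulative lead and lag if every section sits at the same limit.

    Assumes errors add linearly, which is a simplification.
    """
    return n_sections * max_lead_ms, n_sections * max_lag_ms


def net_offset_ms(section_offsets_ms):
    """Net end-to-end offset for a list of signed per-section offsets."""
    return sum(section_offsets_ms)


if __name__ == "__main__":
    lead, lag = worst_case_ms(len(SECTIONS), 15, 45)
    print(f"worst-case lead: {lead} msec, worst-case lag: {lag} msec")
    # Offsets in opposite directions partly cancel.
    print(f"net offset: {net_offset_ms([15, -45, 10])} msec")
```

Under these assumptions, three sections at the limit add up to 45 msec of lead or 135 msec of lag. The first figure reaches the ITU detectability threshold quoted earlier, and the second exceeds it, even though every individual section stayed within its window.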

MEASURING LIP-SYNC ERRORS

Graham Jones, director of communications engineering, NAB, and chair of SMPTE's lip-sync study group, said that based on the group's earlier investigations, systems should strive for zero delay for each stage.

That's all well and good, but how does one actually measure lip-sync errors? There are a number of products available, each with its own methodology. They're designed to work either out-of-service, by sending and detecting a special test signal, or in-service, by using certain characteristics of program material.
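As a rough illustration of the out-of-service idea, the Python sketch below assumes a hypothetical test signal in which a single white flash frame coincides with the start of a tone burst. It finds the first frame whose average luminance crosses a threshold and the first audio sample whose amplitude crosses a threshold, then converts both to time. This is not the method of any particular product, and the function names, thresholds, and normalized 0-to-1 input ranges are all assumptions.

```python
def first_index_above(values, threshold):
    """Return the index of the first value above threshold, or None if there is none."""
    for i, v in enumerate(values):
        if v > threshold:
            return i
    return None


def measure_offset_ms(frame_luma, audio_samples, fps, sample_rate,
                      luma_threshold=0.5, audio_threshold=0.5):
    """Estimate A/V offset from a flash-and-beep test signal (hypothetical format).

    frame_luma:    average luminance per video frame, normalized to 0..1
    audio_samples: audio samples normalized to -1..1
    Returns milliseconds; positive means audio leads video (assumed convention).
    """
    flash_frame = first_index_above(frame_luma, luma_threshold)
    beep_sample = first_index_above([abs(s) for s in audio_samples], audio_threshold)
    if flash_frame is None or beep_sample is None:
        raise ValueError("test event not found in video or audio")
    video_ms = 1000.0 * flash_frame / fps
    audio_ms = 1000.0 * beep_sample / sample_rate
    # If the beep arrives before the flash, audio is ahead of video.
    return video_ms - audio_ms


if __name__ == "__main__":
    # Flash at frame 10 of 25 fps video (400 msec); beep at 360 msec of 48 kHz audio.
    luma = [0.0] * 10 + [1.0] + [0.0] * 14
    audio = [0.0] * 17280 + [1.0] * 100
    print(measure_offset_ms(luma, audio, fps=25, sample_rate=48000))
```

In this sketch the video event can only be located to the nearest frame, so the result is no finer than one frame period (40 msec at 25 fps). A practical tool would need something more refined to resolve errors as small as the tolerances discussed above.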

However, there is no standard for measuring lip-sync errors. That is why the SMPTE lip-sync study group has taken on the task of producing a standard, or more likely a recommended practice, that defines techniques for measuring baseband audio and video synchronization, according to Jones.

"The problem we saw was that various methods are currently used to determine errors, but with inconsistent results," Jones said. "Until you know what the errors are and how to monitor them, it's not possible to be sure you're correcting them."

The SMPTE lip-sync study group formed in early 2005 with the intention of reviewing all aspects of the lip-sync problem and making recommendations for detecting and correcting problems.

According to Jones, the group got a good start by gathering data from sufficient sources to get a pattern of the problem, but then put that work on hold to tackle the higher priority matter of measurements and error assessments.

As part of its new investigations, the SMPTE lip-sync study group plans to evaluate methods that are currently in use for both out-of-service and the more difficult in-service situations, and to determine the requirements for consistent measurements for both cases.

This is not as simple as it may seem at first.

MANY FACTORS

The environment in which assessments are made has to be defined so that the monitoring conditions themselves don't introduce their own errors. A few examples of those conditions are the type of monitor, seating distance from the monitor, type of program material, loudspeakers and their placement, and room acoustics.

Andrew Mason of the BBC, in an AES paper he presented last fall, noted that lip-sync errors depend on the distance to a display, the nature of a scene (a close-up of a talking head compared to a medium or long shot), image resolution (HD or SD), and whether the image is 3D or 2D, among other factors.

The SMPTE effort joins that of more than a half-dozen standards organizations, including IEC, JEITA, EBU, AES, ATSC, CEA, and SCTE, which are studying various aspects of this issue for both baseband and encoded signals, some more actively than others at the moment. Work includes subjective measurements of error perception, and defining tolerances and objective error measurement methodologies.

While the SMPTE work will focus on the baseband signal, Jones noted that there are large variables in how receivers decode a compressed bit stream, and that receivers may not always show the correct lip sync even if the signal itself is synchronized correctly.

The CEA recently completed work on a recommended practice, CEA-CEB20, for how a decoder should properly handle the audio and video, and what tools are available to ensure that both are presented correctly.

As the standards work progresses, system designers and operators can try the tools currently available to tackle at least large chunks of the problem. Paying attention to detail at each stage of signal flow and striving for zero delay at each stage will play a huge role in helping to alleviate the problem.

Mary C. Gruszka is a systems design engineer, project manager, consultant and writer based in the New York metro area. She can be reached via TV Technology.