Managing A/V sync remains challenging for broadcasters.
Although content distributors have recently gotten the message on audio operations — in the form of federally mandated requirements on commercial and program loudness — the issue of well-managed A/V sync remains unresolved. Despite awareness by program distributors, this problem has persisted, with digital technology both exacerbating it and providing solutions for it.
Causes and solutions known
The causes of poor A/V sync are by now well known, as are the limits of acceptable performance. An understanding of the perceptibility of sync errors goes back to various studies and recommendations, including those of the ITU, the film industry and ATSC. The ITU found that errors could be detected at +45ms and -125ms for timing of sound relative to vision, where a positive value means the sound arrives early. The asymmetry is a property of human perception, naturally acclimated to the slower transit speed of sound. The ITU also described an acceptability range of +90ms to -185ms. ATSC tightened the numbers for acceptability with a recommendation of +15ms to -45ms (as shown in Figure 1), and the film industry has come to adopt a tolerance for what it calls “lip flap” of ±22ms, i.e. roughly ±½ frame at 24fps.
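As a rough illustration, the limits quoted above can be collected into a small lookup that reports which of them a measured offset satisfies. This is only a sketch: the names `LIMITS_MS` and `check_sync` are made up for the example, and the sign convention (positive means sound ahead of vision) is the one assumed from the figures above.

```python
# Limits quoted in the text, in milliseconds, as (lower, upper) bounds.
# Assumed sign convention: positive = sound ahead of vision,
# negative = sound behind vision.
LIMITS_MS = {
    "ITU detectability": (-125.0, 45.0),
    "ITU acceptability": (-185.0, 90.0),
    "ATSC recommendation": (-45.0, 15.0),
    "Film lip flap": (-22.0, 22.0),
}


def check_sync(offset_ms):
    """Return, for each named limit, whether offset_ms falls inside it."""
    return {
        name: lower <= offset_ms <= upper
        for name, (lower, upper) in LIMITS_MS.items()
    }


if __name__ == "__main__":
    # Sound 30ms ahead of vision: inside both ITU ranges,
    # outside the ATSC and film tolerances.
    for name, ok in check_sync(30.0).items():
        print(f"{name}: {'within' if ok else 'outside'}")
```

An offset of -40ms (sound late), by contrast, sits inside the ATSC recommendation but outside the film tolerance, which shows how much tighter the symmetric ±22ms window is on the late side.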
Managing A/V sync consists of four basic elements: specification and design of equipment and systems, synchronization measurement, correction, and operational diligence. Specification requires a mind-set to plan and manage the issue in new and existing infrastructures. Equipment is available that can measure A/V sync, either out-of-service (using flash-and-beep or similar technologies), or in-service, using watermarking or fingerprinting.
Watermarking is a technology that inserts a digital signature into the video and audio programs. Although it is a goal that such a process must not perceptibly affect the program content, programmers in general have not accepted that at face value, and the technology has thus met with some resistance as a universal solution. Fingerprinting appears to be a more acceptable solution: the video and audio are sampled, typically frame by frame, and a unique, non-invasive identification signature is produced for each frame. (See Figure.) Audio-video fingerprint pairs, generated at a point of known A/V sync, can thus be relayed in their own out-of-band stream and used to check A/V sync at other parts of the program chain.
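To make the idea of a per-frame fingerprint pair concrete, here is a deliberately simplified sketch. The signatures used (mean luma for video, RMS level for audio) are hypothetical stand-ins chosen for brevity, not the method of any vendor or of the SMPTE work; practical systems use signatures designed to survive processing such as up- and downconversion. The names `FingerprintPair` and `fingerprint_frame` are likewise invented for the example. The point it illustrates is that the content is only read, never modified.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class FingerprintPair:
    """One audio-video fingerprint pair for a single video frame."""
    frame_index: int
    video: float  # hypothetical signature: mean luma of the frame
    audio: float  # hypothetical signature: RMS of audio in the frame period


def fingerprint_frame(frame_index, luma_pixels, audio_samples):
    """Derive a fingerprint pair from one frame without altering the content.

    luma_pixels: iterable of luma values for the frame.
    audio_samples: audio samples covering the same frame period.
    """
    luma = list(luma_pixels)
    samples = list(audio_samples)
    video_sig = sum(luma) / len(luma)
    audio_sig = sqrt(sum(s * s for s in samples) / len(samples))
    return FingerprintPair(frame_index, video_sig, audio_sig)


if __name__ == "__main__":
    pair = fingerprint_frame(0, [16, 235], [1.0, -1.0])
    print(pair)
```

A sequence of such pairs, generated where sync is known to be correct, is what would travel in the separate out-of-band stream for comparison downstream.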
Once the signal has been fingerprinted, improper A/V sync can be detected and corrected at other points by re-measuring the audio-video pair, generating a new fingerprint and comparing it with the reference. Correction involves delaying the audio or video by a commensurate amount. This is usually a static (manual) correction, i.e., a one-time or infrequent adjustment made when equipment or equipment behavior has changed. So that the system can operate continuously, the comparison, which is a correlation between the new and reference fingerprints, needs to run over a moving window; studies suggest a 10-second buffer with a lock time of about 30 seconds.
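The comparison step can be sketched as follows, under several assumptions that are not from the text: each fingerprint is reduced to a single number per video frame, the match is scored by mean absolute difference rather than a true correlation, and one window of frames is examined at a time. The function names are hypothetical. The video and audio lags against the reference are found separately, and their difference is the A/V sync error.

```python
def best_lag(reference, measured, max_lag):
    """Return the lag in frames at which `measured` best matches `reference`.

    A positive result means `measured` is delayed relative to `reference`.
    The score is the mean absolute difference over the overlapping frames.
    """
    best, best_score = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        total, count = 0.0, 0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(measured):
                total += abs(ref_value - measured[j])
                count += 1
        if count == 0:
            continue
        score = total / count
        if score < best_score:
            best, best_score = lag, score
    return best


def av_sync_error_ms(ref_video, ref_audio, meas_video, meas_audio,
                     fps, max_lag=10):
    """Estimate the sync error of sound relative to vision over one window.

    Positive = sound ahead of vision, negative = sound behind
    (assumed sign convention).
    """
    video_lag = best_lag(ref_video, meas_video, max_lag)
    audio_lag = best_lag(ref_audio, meas_audio, max_lag)
    return (video_lag - audio_lag) * 1000.0 / fps
```

For a 10-second window at 30 frames per second (300 fingerprint pairs), a downstream signal whose video is delayed by 2 frames and whose audio is delayed by 5 frames would yield -100ms, meaning the sound is late by three frames. In continuous operation the same calculation would be repeated as the window slides forward, and the result would inform a manual delay adjustment rather than drive one automatically.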
Although in principle a dynamic (automatic) correction might be desirable and could be designed into a system or component, conservative engineering practice advocates staying away from such a solution, which could be susceptible to instability. This is especially true for systems with wandering A/V sync: it is better to find the cause of the problem and rectify it, than to slap a bandage on an ongoing unidentified issue.
Current fingerprinting technology can provide an accuracy of better than 1ms, with a high degree of content-identification confidence. A growing number of audio and video processors are now incorporating fingerprint generators and detectors, making system planning and control much more attractive. Current equipment can provide manual audio delay, line equalization and automatic frequency response control using OTA training. GPS synchronization can also provide another point of reference for geographically distributed systems. Many of these modules are already part of the signal path, and with a separate fingerprint stream relayed over an SNMP/Ethernet signal-management network, content can be monitored and corrected as a part of normal system operations. The VANC and MPEG streams can carry sync information, too, but such use may be precluded by incompatible equipment, or even by content agreements.
Importance of standardization
AES, SMPTE and IEEE have all formed study groups and committees to investigate various standards for lip-sync error-detection and evaluation systems. Of these, the SMPTE 22TV Lip Sync Ad Hoc Group (AHG) has been working to develop an open standard for in-service audio-video timing error measurement. The group is considering a number of requirements for such a system, including a specification for fingerprinting that has low complexity and operational robustness. Overall, the AHG aims to specify a system that enables automatic detection and measurement of A/V sync errors, works through complex distribution systems, is resilient to signal processing including up- and downconversion, and enables detection of errors at multiple points in the chain and over multiple distribution paths. Systems and technologies have been presented to the group by Miranda and Zeitera, demonstrating the feasibility of a universal method for generating fingerprints and handling them in the plant.
The SMPTE work is aimed at ensuring that such a system is interoperable across multiple vendors of fingerprint generators and detectors. The group also is examining the requirements for the data paths and protocols needed to relay fingerprint data, including content delivery and transmission. At this point, a candidate technology has been identified, and initial drafting of the core documents is underway. A prototype of the system has been realized in both the hardware and software domains, and is undergoing end-user field testing to characterize its behavior and identify opportunities for optimization.
As a result of recent SMPTE committee restructuring, this work is now proceeding in the 24TB-01 Ad-Hoc Group - Lip Sync. The chairmanship has transitioned from Graham Jones (who has retired) to Paul Briscoe. The committee welcomes additional participation, as work is now reaching a mature stage; interested parties should contact the chair at firstname.lastname@example.org.
What else can be done?
Because the causes of poor A/V sync extend over the entire signal chain, solutions need to be implemented in a wide-reaching campaign. All parts of the contribution side need scrutiny, including trucks and other sources. It should become standard practice to conduct an out-of-service flash-and-beep test of all signal-handling components. Content distributors should acquire reference decoders, and sample CE devices on an ongoing basis through appropriate test arrangements.
Encoder and signal-processing equipment manufacturers have a competitive interest in testing their product technology both in development and in the field, in and out of service. If there is a design problem in the equipment, new products can be developed, and users must consider replacement or upgrades if possible. With a standard still under development, there is a strong case for FPGA-based designs, which offer a faster design-to-market cycle once a known specification is finally released.
On the consumer side, many TV receivers and set-top boxes still do not manage A/V sync properly. Lab experiments show that decoders can have differing amounts of A/V sync error on startup, and can drift, too. Unfortunately, equipment replacement may be the only solution there, and that happens only slowly over time. And with content increasingly being mastered in 5.1 channels, consumers will be expecting a better viewing and listening experience.
The interest of content distributors in solving this problem is hard to gauge; lacking a clear financial incentive, and reluctant to deploy a technology that may not interoperate with other systems, they may be adopting a wait-and-see attitude for the SMPTE standardization process. Fingerprinting technology shows great promise for A/V sync management, but the work of industry committees must have a quick and profitable path to market, as uncertainty can delay implementation.
Aldo Cugnini is a consultant in the digital television industry.
Send questions and comments to: email@example.com