DTV Latency

Throughput delay, commonly referred to as latency, is an inescapable consequence of using digital audio and video technologies. The very act of sampling and digitizing audio or video introduces a time delay – and things only get worse from that point through the DTV production/postproduction/broadcast/receive chain.

In a totally digital system, the delays imposed on audio and video are typically not equal; those imposed on the video are usually longer than those imposed on the audio.

Setting video recording aside, the process of sampling and digitizing audio in a live broadcast situation causes a brief delay. If a CCD camera is used to capture video, an even greater delay is introduced by reading the data out of the CCD sensor. From that point on, the processing steps performed in the digital domain are really mathematical operations, each of which takes a finite amount of time and therefore requires the signal being processed to be held in a buffer while the operation executes.
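
To make the buffering point concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (the block sizes, the stage counts, the frames held) is an assumption chosen purely for illustration, not a measurement of any particular piece of equipment:

    # Rough latency contributed by buffered digital processing stages.
    # All figures below are illustrative assumptions, not measurements.
    AUDIO_SAMPLE_RATE = 48_000          # audio samples per second
    VIDEO_FRAME_RATE = 30_000 / 1_001   # NTSC-rate frames per second

    def audio_buffer_delay_ms(samples_per_block):
        """Delay of one audio processing block, in milliseconds."""
        return 1_000 * samples_per_block / AUDIO_SAMPLE_RATE

    def video_buffer_delay_ms(frames_buffered):
        """Delay of a video stage that must hold whole frames."""
        return 1_000 * frames_buffered / VIDEO_FRAME_RATE

    # A hypothetical chain: an A/D block and two DSP stages for audio;
    # CCD readout plus two frame-based stages for video.
    audio_ms = sum(audio_buffer_delay_ms(n) for n in (64, 256, 256))
    video_ms = sum(video_buffer_delay_ms(f) for f in (1, 1, 2))

    print(f"audio path: {audio_ms:.1f} ms, video path: {video_ms:.1f} ms")
    print(f"video lags audio by {video_ms - audio_ms:.1f} ms before compensation")

Even with these modest assumed figures, the video path accumulates delay much faster than the audio path, because video stages tend to buffer whole frames rather than small blocks of samples.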

DELAY OF GAME

As NTSC television has become increasingly digitized over the past two decades, the results of these time delays, or latencies, have become more and more apparent. For a number of years, while an increasing amount of digital processing was being applied to video, audio remained in the analog domain. Thus, for example, when video was frame-synchronized or digital video effects were applied, video delays were introduced.

If compensatory audio delays were not deliberately introduced, the analog audio traversed the system more quickly than did the digitally processed video. The result has become quite apparent as lip sync error: in all such cases, the video signal falls behind the audio signal.
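
The compensating delay is simple arithmetic. A minimal sketch, assuming (purely for illustration) two frames of video delay from a frame synchronizer and a digital effects unit, 48 kHz audio and the NTSC frame rate:

    # Compensating audio delay for frame-based video processing.
    # The stage count is an illustrative assumption.
    NTSC_FRAME_RATE = 30_000 / 1_001   # frames per second
    AUDIO_SAMPLE_RATE = 48_000         # audio samples per second

    video_frames_of_delay = 2          # e.g., frame sync + DVE, one frame each
    video_delay_s = video_frames_of_delay / NTSC_FRAME_RATE

    # The audio must be held back by the same amount to preserve lip sync.
    audio_delay_samples = round(video_delay_s * AUDIO_SAMPLE_RATE)

    print(f"video delay: {video_delay_s * 1_000:.1f} ms")
    print(f"audio delay needed: {audio_delay_samples} samples "
          f"({audio_delay_samples / AUDIO_SAMPLE_RATE * 1_000:.1f} ms)")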

Because light travels immensely faster than sound, we have never in nature encountered a situation where the sound of an event is heard before the event is seen. In fact, as the distance between observer and event increases, the sound increasingly lags behind the visual event; we never hear the thunder before we see the lightning.

We can, therefore, accept audio lagging behind video more easily than we can accept video lagging behind audio – although we can accept lagging audio only to a certain degree.

When audio lags behind video beyond a certain point, it seems unnatural because when we are watching television, the events we are observing usually do not appear to be all that distant. It is unfortunate for our senses that latencies imposed on digitally processed video are usually greater than those imposed on digitally processed audio.
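
To put some numbers on that intuition, here is a small sketch converting distance into acoustic lag, using the approximate speed of sound in air (about 343 meters per second); the distances are arbitrary examples:

    # How far behind the image the sound of a real-world event arrives.
    SPEED_OF_SOUND_M_PER_S = 343.0   # approximate, in air at room temperature

    for distance_m in (3, 34, 340, 1_700):
        delay_ms = 1_000 * distance_m / SPEED_OF_SOUND_M_PER_S
        print(f"{distance_m:>5} m away -> sound lags by about {delay_ms:,.0f} ms")

Across a living-room viewing distance the natural lag is only a few milliseconds, so audio that trails the picture by a large fraction of a second has no everyday counterpart, and audio that arrives ahead of the picture has none at all.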

COMPRESSION DELAYS

Compressing and decompressing digital audio and video signals introduces delays as well, and the latencies caused by compression and decompression are often much longer than those caused by digitization and digital processing in the uncompressed domain.

Even if the many and various delays introduced in the production, recording, postproduction, playback and plant distribution processes are compensated, we still have a problem when we compress the audio and video signals for transport over a network distribution or broadcast channel.

A DTV encoder applies a relatively high compression ratio to both audio and video signals and thereby adds considerable latency. This is true both for network distribution and for local station broadcast, although the compression ratio used for most network DTV distribution is about half that used in terrestrial broadcast.

The overall delay introduced by encoding and subsequent decoding in a distribution and/or broadcast scenario is typically several seconds.
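
One way to picture this is as a latency budget summed across the chain. The stage values below are illustrative assumptions in the "several seconds" range described above, not the specifications of any real encoder or decoder:

    # Illustrative end-to-end latency budget for a compressed DTV chain.
    # Every figure here is an assumption made for the sake of the example.
    stage_delays_s = {
        "network encoder":       1.5,
        "satellite hop":         0.25,
        "network decoder":       1.0,
        "station plant/routing": 0.1,
        "broadcast encoder":     2.0,
        "receiver decoder":      1.0,
    }

    for stage, delay in stage_delays_s.items():
        print(f"{stage:<24}{delay:>6.2f} s")
    print(f"{'total':<24}{sum(stage_delays_s.values()):>6.2f} s")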

In the case of local station broadcast – presuming that (a) audio and video latencies have been corrected to a tight tolerance at the emission point; (b) the audio and video signals are not subjected to further differential latency between the encoder input and the decoder output; and (c) any differential delays between video and audio that occur in the display devices beyond the decoder output stage are compensated – absolute latency does not greatly matter in many respects. That is, it does not impair the viewing/listening experience, although the relationship between program start and end times and the PSIP timetables may cause problems – such as late tune-in or upcut recordings.
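
A hypothetical illustration of one of those problems: suppose the program actually appears at the receiver a few seconds away from the time the PSIP timetable advertises (the four-second offset below is an arbitrary assumption). Anything keyed strictly to the advertised time is then early or late by that amount:

    from datetime import datetime, timedelta

    # Advertised (PSIP) start time versus the moment the program actually
    # appears at the receiver. The offset is an arbitrary assumption; its
    # sign depends on how the station times its playout against the
    # encode/decode delay of the emission chain.
    advertised_start = datetime(2003, 1, 1, 20, 0, 0)   # date is arbitrary
    offset = timedelta(seconds=-4)      # negative: program appears early

    actual_start = advertised_start + offset

    if actual_start < advertised_start:
        missed = advertised_start - actual_start
        print(f"a recording keyed to the PSIP start is upcut by {missed}")
    else:
        wait = actual_start - advertised_start
        print(f"a viewer tuning in at the PSIP start waits {wait}, and a "
              f"recording keyed to the PSIP end is clipped by {wait}")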

BEHIND SCHEDULE

At the network distribution stage, the encoder/decoder delays do matter. For example, if – in an analog NTSC distribution system – a network program is started at exactly 8:00:00 p.m., it will begin in the home of a viewer of an NTSC affiliate station slightly more than a 1/4-second later. The only substantial delay in such a system is the satellite hop between network and affiliate.
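
The quarter-second figure is simple geometry: a geostationary satellite sits roughly 36,000 km above the equator, so the signal travels at least that far up and the same distance back down at the speed of light. A quick sketch (real slant paths to earth stations away from the sub-satellite point are somewhat longer, which pushes the delay above 250 ms):

    # Minimum propagation delay for a single geostationary satellite hop.
    GEO_ALTITUDE_KM = 35_786          # geostationary altitude above the equator
    SPEED_OF_LIGHT_KM_S = 299_792.458

    one_hop_km = 2 * GEO_ALTITUDE_KM  # uplink + downlink, best case
    delay_s = one_hop_km / SPEED_OF_LIGHT_KM_S
    print(f"minimum one-hop delay: {delay_s * 1_000:.0f} ms")   # about 239 ms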

We must note in passing that the scenario just described is slipping into the annals of history, as even NTSC gets increasingly digitized.

If a DTV network program is started at the network at 8:00:00, it will arrive at the affiliate’s master control some number of seconds later, and some additional number of seconds will pass before the viewer at home sees the program start. This can be a problem.

There are ongoing discussions among industry standards groups about how to address this DTV delay problem. One proposed solution is to "preroll" a DTV program by several seconds, so that it is already buffered at the affiliate station in time to be started at 8:00:00 (or earlier, to compensate for the broadcast encoder/decoder delay).

If it is deemed desirable that the program begins at 8:00:00 in the viewer’s living room, this will require some work on the part of both the network and the local station. In DTV, we must be concerned not just with lip sync, but also with absolute throughput delays in the end-to-end system.
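
A minimal sketch of the preroll arithmetic, assuming a total downstream delay like the one summed up in the earlier budget sketch (the 6.5-second figure is a placeholder, not a specification): the network must roll the program early by the sum of everything between its playout and the viewer's display.

    from datetime import datetime, timedelta

    # Assumed total delay between the network's playout and the viewer's
    # display (encoders, satellite hop, affiliate buffering, decoders).
    total_downstream_delay = timedelta(seconds=6.5)   # illustrative figure

    target_air_time = datetime(2003, 1, 1, 20, 0, 0)  # 8:00:00 p.m. in the home

    # To hit the target, the network must "preroll" the feed this early.
    feed_start = target_air_time - total_downstream_delay
    print(f"network must roll the program at {feed_start.time()}")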

(Note: In a recent article on dropframe and nondropframe timecode, I indicated that NTSC dropframe timecode runs faster than nondrop timecode. This is, of course, backwards. If nondrop timecode is used to count NTSC frames, the indicated timings will be long, not short. Thanks to an observant reader for pointing this out.)
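
For readers who want the arithmetic behind that correction, here is a quick sketch. It uses only the standard NTSC figures: the frame rate is 30/1.001 (about 29.97) frames per second, non-drop timecode labels frames as if there were exactly 30 per second, and drop-frame timecode omits 108 frame labels per hour to stay in step with the clock:

    # Drift between NTSC non-drop timecode and clock time.
    NTSC_FRAME_RATE = 30_000 / 1_001   # about 29.97 frames per real second

    frames_per_tc_hour = 30 * 60 * 60  # 108,000 frame labels in 1:00:00:00 non-drop
    real_seconds = frames_per_tc_hour / NTSC_FRAME_RATE
    print(f"1:00:00:00 of non-drop timecode = {real_seconds:.1f} real seconds")
    print(f"i.e., the program runs {real_seconds - 3_600:.1f} s long per hour")

    # Drop-frame timecode skips 108 frame labels per hour (two per minute,
    # except every tenth minute), which brings it back into step with the clock.
    drop_frame_seconds = (frames_per_tc_hour - 108) / NTSC_FRAME_RATE
    print(f"1:00:00;00 of drop-frame timecode = {drop_frame_seconds:.3f} real seconds")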

Randy Hoffner