Technology Corner: Randy Hoffner
DTV Latency
Throughput delay, commonly referred to as latency, is an inescapable
consequence of using digital audio and video technologies. The very
act of sampling and digitizing audio or video introduces a time
delay, and things only get worse from that point through the
DTV production/postproduction/broadcast/receive chain.
In a totally digital system, the relative delays between audio
and video are typically not equal. The delays that are imposed on
the video are usually longer than those imposed on audio.
If video recording is not considered, the process of sampling
and digitizing audio in a live broadcast situation causes a brief
delay. If a CCD camera is used to capture video, an even greater
delay is introduced by reading the data out of the CCD sensor. From
that point on, the processing steps done in the digital domain are
really mathematical operations, each of which takes a finite amount
of time and therefore requires the signal being processed to be
held in a buffer while the operation is executed.
DELAY OF GAME
As NTSC television has become increasingly digitized over the
past two decades, we have increasingly seen the results of these
time delays or latencies. For a number of years, while an increasing
amount of digital processing was being applied to video, audio remained
in the analog domain. Thus, for example, when video was frame-synchronized
or digital video effects were done, video delays were introduced.
If compensatory audio delays were not introduced deliberately,
the analog audio traversed the system more quickly than did the
digitally processed video. The result has become quite apparent
as lip sync errors. In all such cases, the video signal falls behind
the audio signal.
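
The compensation described above amounts to simple arithmetic: delay the audio by the difference between the video and audio path latencies. The sketch below is only an illustration. The function name and the example chain (a frame synchronizer and a digital video effects unit, each assumed to add one NTSC frame of delay) are hypothetical and not taken from any particular plant.

```python
# One NTSC frame lasts 1001/30 ms, about 33.37 ms.
NTSC_FRAME_MS = 1001.0 / 30.0

def compensating_audio_delay_ms(video_latency_ms, audio_latency_ms):
    """Return the delay to add to the audio path so it lines up with video.

    If the audio is already later than the video, no audio delay helps,
    so the result is clamped at zero.
    """
    skew = video_latency_ms - audio_latency_ms
    return max(skew, 0.0)

# Assumed example: two video devices, one frame of delay each,
# with analog audio treated as having negligible delay.
video_latency = 2 * NTSC_FRAME_MS
audio_latency = 0.0
delay = compensating_audio_delay_ms(video_latency, audio_latency)
print(f"Delay audio by {delay:.1f} ms")
```

In practice the per-device latencies would have to be measured or taken from the equipment documentation; the one-frame figures here are placeholders.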
Because light travels immensely faster than sound, we have never
in nature encountered a situation where the sound of an event is
heard before the event is seen. In fact, as the distance between
observer and event increases, the sound increasingly lags behind
the visual event; we never hear the thunder before we see the lightning.
We can, therefore, accept audio lagging behind video more easily
than we can accept video lagging audio, although we can accept
lagging audio only to a certain degree.
When audio lags behind video beyond a certain point, it seems
unnatural because when we are watching television, the events we
are observing usually do not appear to be all that distant. It is
unfortunate for our senses that latencies imposed on digitally processed
video are usually greater than those imposed on digitally processed
audio.
COMPRESSION DELAYS
When digital audio and video signals are compressed and decompressed,
these processes introduce delays as well, and the latencies caused
by compression and decompression are often much longer than those
caused by digitization and digital processing in the uncompressed
domain.
Even if the many and various delays introduced in the production,
recording, postproduction, playback and plant distribution processes
are compensated, we still have a problem when we compress the audio
and video signals for transport over a network distribution or broadcast
channel.
A DTV encoder applies a relatively high compression ratio to both
audio and video signals and thereby adds considerable latencies.
This is true either for network distribution or for local station
broadcast, although the compression ratio used for most network
DTV distribution is about half that used in terrestrial broadcast.
The overall delay introduced by encoding and subsequent decoding
in a distribution and/or broadcast scenario is typically several
seconds.
In the case of local station broadcast, presuming that (a)
audio and video latencies have been corrected to a tight tolerance
at the emission point; (b) the audio and video signals are not subjected
to further differential latency between the encoder input and the
decoder output; and (c) any differential delays between video and
audio that occur in the display devices beyond the decoder output
stage are compensated, absolute latency does not greatly matter
in many respects. That is, it does not impair the viewing/listening
experience, although the relationship between program start and
end times and the PSIP timetables may cause problems such
as late tune-in or upcut recordings.
BEHIND SCHEDULE
At the network distribution stage, the encoder/decoder delays
do matter. For example, if, in an analog NTSC distribution
system, a network program is started at exactly 8:00:00 p.m.,
it will begin in the home of a viewer of an NTSC affiliate station
slightly more than a quarter-second later. The only substantial delay
in such a system is the satellite hop between network and affiliate.
We must note in passing that the scenario just described is slipping
into the annals of history, as even NTSC gets increasingly digitized.
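
The quarter-second figure for the satellite hop can be checked with a rough propagation calculation. The sketch below assumes a geostationary satellite, which the text does not state explicitly, and uses made-up slant ranges for the second case. It ignores any processing delay in the earth stations or the transponder.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786.0          # geostationary altitude above the equator

def hop_delay_s(uplink_km, downlink_km):
    """Propagation delay for one satellite hop (up plus down), in seconds."""
    return (uplink_km + downlink_km) / SPEED_OF_LIGHT_KM_S

# Shortest possible hop: both earth stations directly beneath the satellite.
minimum = hop_delay_s(GEO_ALTITUDE_KM, GEO_ALTITUDE_KM)

# Assumed slant ranges for stations away from the subsatellite point.
typical = hop_delay_s(38_000.0, 38_000.0)

print(f"minimum hop delay: {minimum * 1000:.0f} ms")
print(f"assumed typical hop delay: {typical * 1000:.0f} ms")
```

Because real earth stations are not directly under the satellite, the path is somewhat longer than twice the altitude, which is consistent with a delay slightly above a quarter-second.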
If a DTV network program is started at the network at 8:00:00,
it will arrive at the affiliate's master control some number
of seconds later, and some additional number of seconds will pass
before the viewer at home sees the program start. This can be a
problem.
There are ongoing discussions among industry standards groups
about how to address this DTV delay problem. One proposed solution
is to "preroll" a DTV program by several seconds, so that
it is already in a buffer at the affiliate station in time for the
affiliate to start it at 8:00:00 (or earlier, to compensate for the
broadcast encoder/decoder delay).
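
As a rough illustration of the preroll idea, the timing can be worked backward from the moment the program should appear in the home. The function name and the delay values below are hypothetical; actual encoder/decoder latencies would have to be measured for a given distribution and broadcast chain.

```python
from datetime import datetime, timedelta

def preroll_schedule(viewer_start, distribution_delay_s, broadcast_delay_s):
    """Work backward from the desired in-home start time.

    Returns (latest_network_start, affiliate_start):
    - affiliate_start is when the affiliate must release the program from
      its buffer so that it reaches the viewer at viewer_start.
    - latest_network_start is the latest moment the network can start the
      feed and still have the program at the affiliate by affiliate_start.
    """
    affiliate_start = viewer_start - timedelta(seconds=broadcast_delay_s)
    latest_network_start = affiliate_start - timedelta(seconds=distribution_delay_s)
    return latest_network_start, affiliate_start

# Assumed values: 2.0 s through network distribution, 1.5 s through the
# local broadcast encoder and the home decoder. The date is arbitrary.
viewer_start = datetime(2000, 1, 1, 20, 0, 0)
network_start, affiliate_start = preroll_schedule(viewer_start, 2.0, 1.5)
print("network starts by:", network_start.time())
print("affiliate starts at:", affiliate_start.time())
```

Starting the network feed earlier than the computed time simply leaves the program waiting longer in the affiliate's buffer.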
If it is deemed desirable that the program begin at 8:00:00 in
the viewer's living room, this will require some work on the
part of both the network and the local station. In DTV, we must
be concerned not just with lip sync, but also with absolute throughput
delays in the end-to-end system.
(Note: In a recent article on dropframe and nondropframe timecode,
I indicated that NTSC dropframe timecode runs faster than nondrop
timecode. This is, of course, backwards. If nondrop timecode is
used to count NTSC frames, the indicated timings will be long, not
short. Thanks to an observant reader for pointing this out.)