Splicing MPEG

Compressed digital video's lossy nature changes the process.

Although the motive for compressing any file is the same (i.e., to make it smaller and easier to transmit and store), compressing digital video is quite different from compressing other files. Video compression can be lossy, which means that it does not allow the retrieval of the original signal. And we are OK with that.

Frankly, it is a good thing we are not too picky, because DTV and all forms of modern media would simply not be possible if baseband were the only solution. With active picture data in HDTV occupying 1.24Gb/s, we simply could never entertain implementation and distribution.
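
The 1.24Gb/s figure can be checked with simple arithmetic. The sketch below assumes a 1920 x 1080 active raster, 4:2:2 sampling at 10 bits per sample (20 bits per pixel on average) and 29.97 frames per second. Those parameters are my assumptions for illustration; other HDTV formats give somewhat different numbers.

```python
def active_picture_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed active-picture data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

# Assumed parameters (not from a specific standard citation):
# 1920 x 1080 active pixels, 4:2:2 at 10 bits (20 bits/pixel average),
# and 30000/1001 frames per second, i.e. 29.97.
rate = active_picture_bit_rate(1920, 1080, 20, 30000 / 1001)
print(f"{rate / 1e9:.2f} Gb/s")  # prints "1.24 Gb/s"
```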

The nature of MPEG

MPEG is highly statistical in nature. It throws away content that you are “unlikely” to miss. It does this nondeterministically, so that only the statistical data rate is constant.

In fact, taken over a long period of time, the number of frames transmitted per second will be 29.97. But taken over an arbitrarily short period of time, the rate of frames in the transmission pipeline is only close to 29.97fps.

I point this out because this simple fact of the physics of television has delivered complications we never would have thought about 30 years ago when bit rate reduction was being studied in labs and universities worldwide. Take, for example, the simplicity of cutting between two NTSC signals. You simply have to find the 10th line after vertical interval and execute a switch between the two signals. Any vertical interval suffices, so long as the cadence of fields and color framing is preserved, and of course the signals are synchronous and time-aligned.

Analog and digital work the same way in uncompressed video. Unfortunately, this is not the case with MPEG.

Splicing MPEG

Because of the distinctly different nature of MPEG compressed frames, you can't simply identify the vertical interval and switch to an incoming stream. I-, P- and B-frames have widely different characteristics, including the number of bytes and, as a result, the transmission time. (See Figure 1.)
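
To make the link between frame size and transmission time concrete, here is a minimal sketch. The frame sizes and the channel rate are hypothetical round numbers chosen for illustration, not measurements from any real encoder.

```python
# Hypothetical coded frame sizes in bytes; real sizes vary with content.
FRAME_BYTES = {"I": 150_000, "P": 60_000, "B": 25_000}
CHANNEL_BPS = 10_000_000  # assumed constant channel rate, bits per second
FRAME_PERIOD_MS = 1000 * 1001 / 30000  # display time of one frame at 29.97fps

def transmission_time_ms(frame_type, channel_bps=CHANNEL_BPS):
    """Time needed to send one coded frame over the channel, in milliseconds."""
    return FRAME_BYTES[frame_type] * 8 * 1000 / channel_bps

for frame_type in "IPB":
    print(f"{frame_type}-frame: {transmission_time_ms(frame_type):.1f} ms to send, "
          f"{FRAME_PERIOD_MS:.1f} ms to display")
```

With these assumed numbers, an I-frame takes several frame periods to arrive while a B-frame takes less than one, which is why the frame rate in the pipeline is only 29.97fps on average.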

You can't cut one stream on a B-frame and enter the next on a P-frame, because the references — forward and backward in the case of the B-frame — are no longer sensible. The MPEG cadence is therefore broken, and a decoder receives an invalid bit stream, which it cannot decode.

However, it may well work to cut from a P-frame to an incoming I-frame, because I-frames are internally consistent and don't need external references to be decoded. The decoder could simply ignore the forward reference in the P-frames when it receives the incoming I-frame. (See Figures 2 and 3.)
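
The rule of thumb above can be sketched in a few lines. This is a deliberately simplified model, not the SMPTE procedure: it treats each stream as a string of frame types in presentation order, assumes you may leave the outgoing stream only after an I- or P-frame and enter the incoming stream only at an I-frame, and ignores the buffer and timing constraints a real splicer must honor. The function names are hypothetical.

```python
def exit_points(frames):
    """Indices of frames after which the outgoing stream may be left."""
    return [i for i, frame_type in enumerate(frames) if frame_type in "IP"]

def entry_points(frames):
    """Indices of frames at which the incoming stream may be entered."""
    return [i for i, frame_type in enumerate(frames) if frame_type == "I"]

def splice(outgoing, incoming, not_before=0):
    """Join two streams at the first legal exit at or after not_before
    and the first legal entry of the incoming stream."""
    exits = [i for i in exit_points(outgoing) if i >= not_before]
    entries = entry_points(incoming)
    if not exits or not entries:
        raise ValueError("no legal splice point under the simplified rule")
    return outgoing[:exits[0] + 1] + incoming[entries[0]:]

# The request is to cut at frame 4 (a B-frame), so the splice is deferred
# to the next P-frame, and the leading B-frames of the incoming stream
# are dropped so that it starts on an I-frame.
print(splice("IBBPBBPBBPBB", "BBIBBPBBP", not_before=4))  # IBBPBBPIBBPBBP
```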

Thus, the simple edit in NTSC must become a splice in MPEG. The splice must be constrained to keep the MPEG stream syntax intact. The simple cut in NTSC can be defined in a few words. But the parameters developed by SMPTE to allow splicing fill 12 pages of standards language in SMPTE 312M-1999.

The good news is that it is possible and, in fact, done every minute of every day. Commercials are spliced into MPEG programs at every cable headend. Special purpose splicing systems deliver HDTV to FOX affiliates all the time.

The splice point indicators are a routine part of ANSI/SCTE 35 2004 messages carried in many compressed programs. SCTE 35 facilitates program switching and commercial insertion by carrying commands related to the splice in a separate packet ID. This allows critical metadata about program intentions to pass through splicers — even when video and audio are manipulated.
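
Because the splice commands ride on their own packet ID, a device can pick them out without touching video or audio. The sketch below assumes an MPEG-2 transport stream of 188-byte packets, where the 13-bit PID sits in the second and third bytes of each packet. It does not parse the SCTE 35 message itself, and the PID value 0x1F4 is purely hypothetical; in practice the PID is signaled in the program's tables.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def packet_pid(packet):
    """Return the 13-bit packet ID from one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]

def packets_on_pid(stream, wanted_pid):
    """Yield every whole packet in stream that carries wanted_pid."""
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet_pid(packet) == wanted_pid:
            yield packet

def make_packet(pid):
    """Build a dummy packet for the demonstration (payload is all zeros)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))

SPLICE_PID = 0x1F4  # hypothetical PID carrying the splice messages
stream = make_packet(0x100) + make_packet(SPLICE_PID) + make_packet(0x100)
print(len(list(packets_on_pid(stream, SPLICE_PID))))  # prints 1
```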

Complicated processing

MPEG is statistical, so how would it be processed in more complex ways? Many people assume that it is possible to complete most, or all, of the functions related to manipulation of programs during transmission without decoding the compressed MPEG back to baseband.

MPEG is highly statistical in nature over long periods of time. Within each frame, it is highly organized. Slices, macroblocks and individual samples are constrained by the complex language of MPEG to a manageable range of values. It is therefore a predictable structure at the frame level.

Take, for example, inserting a logo into an MPEG stream. Making such an insert possible requires creating new macroblocks that will precisely replace existing ones. These macroblocks must also show the keyed-in logo when they're decoded. One option is to partially decode the background, add the key in and re-encode the area affected.

With an understanding of the mathematical representations of foreground and background, it is possible to analyze the content and create new macroblocks without fully decoding it. The entire signal must be subjected to some additional latency to allow the calculations to proceed and the full set of macroblocks to mesh back together again. Other transformations, including squeezeback and dissolves between MPEG programs, can be handled in a similar way.

Now that processing power is cheap and ubiquitous, other effects are also possible. It is worth noting that compressed audio shares many characteristics with picture content.

Transrating

Another important, frequently performed post process is transrating, or changing the bit rate of a stream after it is coded. This allows high-bit-rate distribution signals to be reduced for final emission, either as part of a DTV multiplex or as cable or satellite signals.

MPEG-4 Part 10, also referred to as H.264 or AVC, makes all of this more complex because of the many coding options available. Variable size macroblocks, intracoded portions of a frame and loads of other tools that maximize picture quality add complexity but do not fundamentally change the playing field.

Fully compressed transmission systems that do not use baseband video are now on the market. Playback of local content and splicing with live streams is routine, and keys (bugs, ratings info, etc.) and other local manipulations are no longer a problem. Even statistical multiplexing of separately coded signals is possible, by transrating all signals under the watchful eye of management software. I am sure that by the end of this decade, conventional baseband manipulation for emission will be a thing of the past.
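
As a rough illustration of what that management software has to decide, the sketch below divides a fixed multiplex rate among programs in proportion to a complexity score, with a guaranteed floor for each. The scoring, the floor and the numbers are all hypothetical; commercial statistical multiplexers use their own, far more elaborate methods.

```python
def allocate_bit_rates(complexities, total_bps, floor_bps):
    """Split total_bps among programs: each gets floor_bps, and the
    remainder is shared in proportion to its complexity score."""
    count = len(complexities)
    if floor_bps * count > total_bps:
        raise ValueError("floors exceed the capacity of the multiplex")
    spare = total_bps - floor_bps * count
    total_complexity = sum(complexities.values())
    if total_complexity == 0:
        # No program is asking for more: share the remainder equally.
        return {name: floor_bps + spare / count for name in complexities}
    return {name: floor_bps + spare * score / total_complexity
            for name, score in complexities.items()}

# Hypothetical scores: the sports feed is three times as hard to code.
rates = allocate_bit_rates({"sports": 3, "news": 1},
                           total_bps=19_000_000, floor_bps=2_000_000)
for name, bps in rates.items():
    print(f"{name}: {bps / 1e6:.2f} Mb/s")
```

Each program would then be transrated to its allotted rate, and the allocation recomputed as the content changes.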

John Luff is a broadcast technology consultant.

Send questions and comments to: john.luff@penton.com