The ubiquity of compressed video brings some interesting challenges to broadcasters transmitting video programs. One of these is an increasing tendency to handle bit streams in compressed form, minimizing the need for quality-compromising decompression and recompression stages. This has brought about the notion of bit stream splicing, where two compressed bit streams are concatenated (or spliced) either at playout or during some production phase. As we'll see here, the requirements for doing this seamlessly are not obvious.
To allow seamless splicing, an MPEG bit stream must meet certain constraints. In addition to the proper synchronization of video frames and frame types (covered in previous columns), the video decoder buffer (modeled by the video buffering verifier, or VBV) must be managed carefully at the splice points so that it neither overflows nor underflows. While compressed video will typically have a constant bit rate, this is actually a long-term average metric. The number of bits per frame (or even per GOP) is anything but constant. This is easily seen in Figure 1, which shows the fullness of the decoding video buffer as a function of time. The parameters of this closed-GOP sequence produce a transmission sequence of frames: IPBBPBB, where the relative sizes of the frames in this example are 3, 2, 1, 1, 2, 1 and 1, respectively. The parameter “vbv_delay” specifies how long the first byte of coded video data remains in the video buffer, and can be specified in the bit stream. In this example, vbv_delay is 2.5 frames (the period to the left of “0”), and the video rate is 1.5 arbitrary units per frame.
The diagonal slope of the sections describes the rate at which the buffer fills, essentially the channel bit rate. The vertical sections describe the action of the video decoder removing frames from the buffer for subsequent decoding. Because intra-frames need more bits than predictive or bidirectional coded frames, the I-frames will take up more space in the buffer and show a larger vertical section, as at Frame 0 in Figure 1. When a finite bit stream is stored in a file, it will have an end to the stream, and this is modeled by the section after “Frame” 4.8 in our example.
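The sawtooth in Figure 1 can be reproduced with a few lines of Python. This is a minimal sketch, not a full VBV model: it assumes bits arrive at a constant rate starting vbv_delay frame periods before the first frame is decoded, that each frame is removed instantaneously at integer frame times, and that delivery simply stops at the end of the file. The function names are illustrative and do not come from any MPEG tool.

```python
# Sketch of the decoder-buffer model behind Figure 1 (values from the example).
FRAME_SIZES = [3, 2, 1, 1, 2, 1, 1]   # I P B B P B B, arbitrary units
RATE = 1.5                             # units delivered per frame period
VBV_DELAY = 2.5                        # frame periods before the first removal


def buffer_trace(sizes, rate, vbv_delay):
    """Return (time, level_before_removal, level_after_removal) per frame.

    Bits arrive at a constant rate starting at t = -vbv_delay and stop once
    the whole (finite) stream has been delivered. Frame n is removed
    instantaneously at t = n.
    """
    total = sum(sizes)
    removed = 0.0
    trace = []
    for n, size in enumerate(sizes):
        delivered = min(total, rate * (n + vbv_delay))
        before = delivered - removed
        removed += size
        trace.append((n, before, before - size))
    return trace


def end_of_delivery(sizes, rate, vbv_delay):
    """Time at which the last bit of the stream enters the buffer."""
    return sum(sizes) / rate - vbv_delay


if __name__ == "__main__":
    for t, before, after in buffer_trace(FRAME_SIZES, RATE, VBV_DELAY):
        print(f"t={t}: {before:.2f} -> {after:.2f}")
    end = end_of_delivery(FRAME_SIZES, RATE, VBV_DELAY)
    print(f"last bit enters at t={end:.2f}")
```

With the example values, the buffer holds 3.75 units just before the I-frame is removed at t = 0, the level never goes negative, and the last bit arrives at about t = 4.83, the point after which the curve in Figure 1 stops rising.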
The buffer has a finite size (dictated by practical hardware and delay time considerations). So the buffer must be managed in order to prevent overflow, where there is not enough memory to store the data, and underflow, where the data runs out as it's needed by the decoder and display. It is the duty of the encoder to make sure this never happens by predicting the action of the decoder buffer. Hence, the VBV is a prediction that is carried out by the encoder, which can modify the number of bits from frame to frame to ensure buffer stability.
The wrong way
The result of blindly appending a new sequence is apparent in Figure 2. In fact, if the original sequence is appended to itself, there is an undesired effect — the buffer underflows at Frame 8. In the region around the splice point, the buffer continues to fill at the video rate (as the size of each frame is unchanged from the previous example). However, the first frame of the new sequence must be removed at time t = 7. This requirement causes the buffer to fill for a time less than that originally specified for the first frame of the new sequence. In this example, the buffer fills for roughly 2.2 frames (in red), whereas the original sequence called for a vbv_delay of 2.5 frames.
The consequence is that, upon the removal of the frame at time t = 8, the buffer underflows, i.e., not enough data has entered the buffer to ensure it is ready to be removed at the next picture access time. Since the new stream was encoded with the expectation of a specific VBV trajectory, this condition must not be violated in the decoder. In addition, the vbv_delay, the bit rate and the size of the new frames can only be modified if the resultant stream maintains VBV compliance. Thus, the simplest solution is to present the new sequence to the buffer at a time in advance of the decode time, equal to the amount specified in the vbv_delay for the first frame of the new sequence. In other words, the last bit of the outgoing stream should remain in the buffer for a time equal to the vbv_delay parameter of the first frame of the new stream, minus the display time of the last frame of the outgoing stream.
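The underflow can be checked numerically. The sketch below uses the same simplified buffer model as the figures (constant-rate delivery, instantaneous frame removal at integer frame times) and appends the example sequence to itself with no adjustment. All names are illustrative assumptions.

```python
# Blind splice: the example sequence appended to itself, nothing recoded.
SIZES = [3, 2, 1, 1, 2, 1, 1]   # I P B B P B B, arbitrary units
RATE = 1.5                       # units per frame period
VBV_DELAY = 2.5                  # frame periods


def first_underflow(sizes, rate, vbv_delay, eps=1e-9):
    """Return the first decode time at which a frame has not fully arrived.

    Delivery starts at t = -vbv_delay and runs continuously at `rate`
    until every bit has been sent. Returns None if no underflow occurs.
    """
    total = sum(sizes)
    needed = 0.0
    for n, size in enumerate(sizes):
        needed += size                                  # bits required by time n
        delivered = min(total, rate * (n + vbv_delay))  # bits that have arrived
        if delivered < needed - eps:
            return n
    return None


old_end = sum(SIZES) / RATE - VBV_DELAY       # last bit of the old stream
new_first_decode = len(SIZES)                 # new I-frame is decoded at t = 7
actual_fill = new_first_decode - old_end      # time the new stream gets to fill
required_entry = new_first_decode - VBV_DELAY # when the new stream should enter

if __name__ == "__main__":
    print("single sequence underflow:", first_underflow(SIZES, RATE, VBV_DELAY))
    print("blind splice underflow at t =",
          first_underflow(SIZES + SIZES, RATE, VBV_DELAY))
    print(f"new stream fills for {actual_fill:.2f} frames, needs {VBV_DELAY}")
    print(f"new stream should enter the buffer at t = {required_entry}")
```

Under these assumptions the single sequence is clean, the blind splice first fails at t = 8, and the new stream gets only about 2.17 frame periods of fill time instead of 2.5. The required entry time works out to t = 4.5.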
The right way
In order to present the new bit stream to the buffer at this earlier point in time, some bits must be removed from the old bit stream. This is shown in Figure 3. The size of the frame at time t = 4 was reduced from 2 to 1.5. (Note that, for ease of illustration, we used a P-frame. In reality, a good splicer would give priority to removing bits from the B-frames first, so that any induced artifacts would not propagate to other frames.) This causes the old stream to end sooner and allows the new stream to enter the buffer at the appropriate time, t = 4.5. The specific solution depends on the conditions at the splice point. If the outgoing stream terminates before the new stream should start, then null padding can be used to extend the life of the old stream in the buffer. If the outgoing stream would otherwise terminate after the new stream should start, then the last few frames of the old stream can be recoded with fewer bits.
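How many bits to remove (or pad) follows from simple arithmetic: the old stream may occupy the channel only from its own start until the moment the new stream must begin entering the buffer. The sketch below computes that budget for the example and then applies the correction to the P-frame at t = 4, as in the illustration. It is a simplified model with hypothetical function names. A real splicer would spread the change over several frames, preferably B-frames.

```python
OLD = [3, 2, 1, 1, 2, 1, 1]   # outgoing stream, arbitrary units
NEW = [3, 2, 1, 1, 2, 1, 1]   # incoming stream
RATE = 1.5                     # units per frame period
OLD_VBV_DELAY = 2.5            # frame periods
NEW_VBV_DELAY = 2.5


def splice_adjustment(old_sizes, rate, old_vbv_delay, new_vbv_delay):
    """Units to remove from the old stream (positive) or pad (negative).

    Assumes one frame is decoded per frame period, the old stream's first
    frame at t = 0 and the new stream's first frame right after the old
    stream's last one.
    """
    new_entry = len(old_sizes) - new_vbv_delay     # when new bits must start
    budget = rate * (new_entry + old_vbv_delay)    # channel capacity for old
    return sum(old_sizes) - budget


def underflows(sizes, rate, vbv_delay, eps=1e-9):
    """True if any frame has not fully arrived by its decode time."""
    total, needed = sum(sizes), 0.0
    for n, size in enumerate(sizes):
        needed += size
        if min(total, rate * (n + vbv_delay)) < needed - eps:
            return True
    return False


excess = splice_adjustment(OLD, RATE, OLD_VBV_DELAY, NEW_VBV_DELAY)
trimmed = list(OLD)
trimmed[4] -= excess   # illustration only: trim the P-frame decoded at t = 4

if __name__ == "__main__":
    print("units to remove from old stream:", excess)
    print("blind splice underflows:", underflows(OLD + NEW, RATE, OLD_VBV_DELAY))
    print("trimmed splice underflows:",
          underflows(trimmed + NEW, RATE, OLD_VBV_DELAY))
```

For the example, the excess is 0.5 units, which is exactly the reduction from 2 to 1.5 described above. A negative result would indicate the amount of null padding needed instead.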
In general, when splicing into or out of streams that have been independently encoded, the expected vbv_delay parameter must be matched in order to avoid buffer overflow or underflow. If this condition is not met, an error will be created in the buffer level, which is the difference between the actual buffer level and the buffer level anticipated by the encoder. Contrary to what some may believe, such an error will persist indefinitely in the video decoder unless some device in the stream path repairs it. This is because the splice error shifts the time at which bits enter the buffer, and that timing shift does not correct itself, no matter how long the stream runs.
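A short sketch makes the persistence visible. It compares the buffer level the encoder anticipated with the level the decoder actually sees when the new stream enters a third of a frame period late, as in the blind splice of the earlier example. To keep a long stream stable, the GOP used here is an assumption: its sizes are chosen to average exactly 1.5 units per frame, which the article's seven-frame example does not do. The function name is likewise illustrative.

```python
# Assumed GOP averaging exactly RATE units per frame (10.5 units / 7 frames).
GOP = [3, 2, 1, 1, 1.5, 1, 1]
RATE = 1.5                      # units per frame period
VBV_DELAY = 2.5                 # fill time the encoder assumed
LATE_BY = 1.0 / 3.0             # splice timing error, in frame periods


def levels_after_removal(sizes, rate, fill_time):
    """Buffer level just after each frame is removed.

    Delivery starts `fill_time` frame periods before the first decode and
    is assumed to continue at `rate` for the whole (long) stream.
    """
    removed = 0.0
    levels = []
    for k, size in enumerate(sizes):
        removed += size
        levels.append(rate * (k + fill_time) - removed)
    return levels


stream = GOP * 50   # 350 frames after the splice
anticipated = levels_after_removal(stream, RATE, VBV_DELAY)
actual = levels_after_removal(stream, RATE, VBV_DELAY - LATE_BY)
errors = [a - b for a, b in zip(anticipated, actual)]

if __name__ == "__main__":
    print(f"error after frame 0:   {errors[0]:.3f}")
    print(f"error after frame 349: {errors[-1]:.3f}")
    print("anticipated minimum level:", min(anticipated))
    print("actual minimum level:", min(actual))
```

In this model the error equals the rate times the timing shift (0.5 units) at every frame, from the first to the last. The stream the encoder predicted as safe now dips below zero once per GOP.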
In addition, each subsequent splice can introduce its own error, so the long-term result will depend on the accumulation of these errors. Modifying the bit streams correctly to ensure proper buffer management therefore requires a splicer that can recode the streams on the fly, adding or subtracting bits when needed. A good processor will do this gradually in anticipation of the splice so that no visible change in picture quality is perceived.
When setting up encoders, there may be a temptation to limit the maximum and minimum buffer levels to a small range in the middle of the VBV so that any splicing errors will have a lower probability of hitting the top or bottom of the buffer. This is not a good idea, however, because it limits the actual bit allocation per frame, and thus has a large negative impact on picture quality.
Bit stream splicers will find increasing utility in situations where multiple precompressed bit streams are delivered to a transmission point. Knowledge of these splicing issues can yield a better understanding of how your video may be affected by these processes.
Aldo Cugnini is a consultant in the digital television industry.