While MPEG has existed for more than a decade and continues to evolve, there are many options within the standards that seem new to many observers. In this and future articles, we will look at some of these interesting features.
By now, we all know that video quality is affected by bit rate, but what factors affect quality in practical (for example, fixed-bandwidth) systems? Because MPEG compression is based on a Group of Pictures (GOP) construct, we can change both the number of pictures in a GOP and the sequence of I-, P- and B-pictures (intra, predictive and bidirectionally predicted).
In general, longer GOPs mean fewer bits are needed for the same quality because the I-frame repetition rate is lower, but that comes at a cost. Because all pictures within a GOP are ultimately referenced back to the I-frame, both transmission errors and channel changes can mean corrupted (or no) pictures until the next I-frame is decoded. For a GOP length of 15 pictures at 30fps, this means a possible delay of up to half a second for clean pictures to reappear.
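The trade-off can be put into rough numbers. The following is a minimal sketch, assuming a simplified GOP with exactly one I-frame and an average size for all other pictures; the function name and the picture sizes are hypothetical and chosen only for illustration.

```python
def gop_tradeoff(gop_length, frame_rate, i_bits, other_bits):
    """Return (average bit rate in bit/s, worst-case wait for an I-frame in s).

    Assumes one I-frame of i_bits per GOP and (gop_length - 1) other
    pictures averaging other_bits each. These sizes are illustrative,
    not measured values.
    """
    if gop_length < 1 or frame_rate <= 0:
        raise ValueError("gop_length must be >= 1 and frame_rate > 0")
    bits_per_gop = i_bits + (gop_length - 1) * other_bits
    average_bit_rate = bits_per_gop / gop_length * frame_rate
    worst_case_wait = gop_length / frame_rate
    return average_bit_rate, worst_case_wait


# Hypothetical picture sizes: 400 kbit I-frames, 100 kbit for everything else.
for n in (15, 30):
    rate, wait = gop_tradeoff(n, 30, 400_000, 100_000)
    print(f"GOP {n}: {rate / 1e6:.2f} Mb/s average, up to {wait:.2f} s to the next I-frame")
```

With these assumed sizes, doubling the GOP length lowers the average bit rate but doubles the longest wait for a clean picture.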
But a longer GOP can also mean that coding errors due to aggressive compression will tend to persist longer too. One way to get around the visibility of these artifacts is for the encoder to automatically insert I-frames on significant scene changes. The challenge here is to control the sensitivity of the scene-change detection algorithm, lest the encoder force too many I-frames on busily changing content.
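One way to picture the sensitivity problem is a detector with two controls: a difference threshold and a minimum spacing between forced I-frames. This is only a sketch under stated assumptions; real encoders use more elaborate measures, and the function names, the mean-absolute-difference metric and the default values here are hypothetical.

```python
from typing import List, Optional, Sequence


def mean_abs_diff(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Mean absolute difference between two frames given as flat luma lists."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def scene_change_iframes(frames, threshold=30.0, min_gap=5) -> List[int]:
    """Return indices of frames where an I-frame would be forced.

    threshold sets the sensitivity; min_gap (in frames) stops rapidly
    changing content from triggering an I-frame on every picture.
    Both defaults are illustrative assumptions.
    """
    forced: List[int] = []
    last_forced: Optional[int] = None
    for i in range(1, len(frames)):
        changed = mean_abs_diff(frames[i - 1], frames[i]) > threshold
        far_enough = last_forced is None or i - last_forced >= min_gap
        if changed and far_enough:
            forced.append(i)
            last_forced = i
    return forced


# Toy content: a cut at frame 4, then flicker at frames 6 and 7.
dark, bright = [16] * 64, [200] * 64
frames = [dark] * 4 + [bright] * 2 + [dark] + [bright] * 3
print(scene_change_iframes(frames))             # flicker is suppressed
print(scene_change_iframes(frames, min_gap=1))  # every change forces an I-frame
```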
Scene fade detection is another way to improve encoding quality; if the encoder can differentiate between a fade-to-black sequence and a motion sequence, more bits can be allocated to static content change rather than spuriously generating unneeded motion vectors. Dual-pass encoding is another feature of many encoders, whereby the bit allocation for a frame (or group of frames) is established by encoding the video twice — once with a rough estimate of quantization levels, and then a second time by using picture statistics to refine the quantization process. This usually means an extra frame of delay for live encoders. An offline software encoder, on the other hand, can perform this dual-pass encoding over large spans of time, even over an entire program.
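The second pass of a dual-pass scheme can be sketched as a simple redistribution of a bit budget. The example below assumes that the first pass reports, for each picture, the number of bits produced at a fixed trial quantization level, and that the second pass shares the budget in proportion to those figures. Actual encoders refine this considerably; the function name and the numbers are hypothetical.

```python
def allocate_bits(first_pass_bits, total_budget):
    """Share total_budget among pictures in proportion to first-pass size.

    first_pass_bits: bits each picture needed at a trial quantization level,
    used here as a stand-in for picture complexity (an assumption).
    """
    total = sum(first_pass_bits)
    if total <= 0:
        raise ValueError("first-pass statistics must be positive")
    return [total_budget * bits / total for bits in first_pass_bits]


# Hypothetical first-pass results for four pictures, and a 400 kbit budget.
first_pass = [400_000, 100_000, 100_000, 200_000]
for n, target in enumerate(allocate_bits(first_pass, 400_000)):
    print(f"picture {n}: target {target:.0f} bits")
```

A live encoder can only apply this over a short window of pictures, whereas an offline encoder can gather the statistics over an entire program before allocating.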
Normally, real-time transmission of MPEG-encoded pictures uses what is called an open GOP structure. This means that B-frames within one GOP can reference future I-frames from another GOP. However, in certain applications, a closed GOP can be used, where all B-frames from one GOP reference only the (previous) I- and P-frames from within the same GOP. In other words, B-pictures will have been encoded using only backward prediction (or intra coding in MPEG-4). This is usually the case when offline editing or real-time splicing of bit streams must be facilitated, because all frames of the GOP can be decoded without reference to another GOP.
However, B-pictures will often be larger when using closed GOPs, as there are fewer reference pictures from which to generate a prediction. In practice, when picture size is limited by a constrained bit rate, this means that continuous coding with closed GOPs can yield a lower picture quality. Bottom line: When encoding live video, open GOPs should be used, unless splicing capability is needed, and the encoder can seamlessly switch between open and closed GOPs without artifacts.
Bit rates are not fixed
While the choice of a bit rate may seem limited from an allocation standpoint, the actual transmitted bit rate of course varies from frame to frame. With constant bit rate (CBR) encoding, individual pictures will have variable sizes, but they are then buffered into a constant bit rate channel. One can think of this buffer like a reservoir with different filling and emptying characteristics.
Figure 1 shows the behavior of such a buffer, as realized in an MPEG decoder; the diagram illustrates the instantaneous number of bits in the decoding buffer, for a hypothetical seven-picture sequence. The diagonal slope of the curve shows the buffer filling with bits, and the vertical lines show the bits being removed from the buffer (and displayed for each picture or frame). The slope of the diagonals thus represents the coding bit rate and, ignoring the discontinuities of the picture removals, can be seen to be constant. This model, called the video buffering verifier (VBV), is present in the encoder so that it can anticipate the behavior of the decoder and avoid overflow or underflow of the buffer.
The startup time to the left of frame #0 is called the VBV delay of the system and often can be adjusted in the encoder. In practice, this startup delay can be several frames in length, and the buffer can be large enough to hold a half-second's worth of data or more. This length of time is needed to allow the encoder to average the bit rate over enough pictures and to provide enough headroom so that the encoder can maintain quality while picture content fluctuates.
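The buffer behavior described above can be modeled in a few lines. This is a simplified sketch, not the normative VBV definition: it assumes the buffer fills at a constant rate, that each picture is removed instantaneously once per frame period, and that decoding starts after the VBV delay. The names, the exception class and the numbers are hypothetical.

```python
class VBVViolation(Exception):
    """Raised when the modeled decoder buffer would overflow or underflow."""


def simulate_vbv(picture_bits, bit_rate, frame_rate, buffer_size, vbv_delay):
    """Return the buffer fullness (bits) just after each picture is removed."""
    fullness = bit_rate * vbv_delay          # bits gathered during startup
    bits_per_period = bit_rate / frame_rate  # bits arriving per frame period
    trace = []
    for n, bits in enumerate(picture_bits):
        if fullness > buffer_size:
            raise VBVViolation(f"overflow before picture {n}")
        if bits > fullness:
            raise VBVViolation(f"underflow at picture {n}")
        fullness -= bits             # vertical line: picture removed
        trace.append(fullness)
        fullness += bits_per_period  # diagonal line: buffer refilling
    return trace


# Hypothetical seven-picture sequence at 4 Mb/s and 25 frames per second.
pictures = [600_000, 80_000, 80_000, 200_000, 80_000, 80_000, 200_000]
for n, level in enumerate(simulate_vbv(pictures, 4_000_000, 25, 2_000_000, 0.25)):
    print(f"after picture {n}: {level:.0f} bits in buffer")
```

In this model, an encoder that sees an overflow coming would have to spend more bits (or add stuffing), and one that sees an underflow coming would have to quantize more coarsely.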
Coding delays can be adjusted
The long VBV delay, however, has other consequences, such as picture decoding delay, which adds to the GOP considerations we saw earlier. When tuning to a new channel, the video buffer starts filling with about half a second of delay, and picture decoding starts at an arbitrary picture, creating a total of between half a second and one second of delay before pictures are presented to the display. MPEG offers a low-delay mode to ameliorate this, producing a total encoding and decoding delay of less than 150ms.
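The half-second to one-second range follows from adding the two delays. As a rough sketch, assuming the tune-in point falls at a random picture within the GOP and ignoring all other decoder latencies (the function name is hypothetical):

```python
def channel_change_delay(vbv_delay, gop_length, frame_rate):
    """Return (best case, worst case) seconds before pictures are displayed.

    Best case: tuning in just as an I-frame arrives, so only the VBV delay
    applies. Worst case: a full GOP must pass before the next I-frame.
    """
    if frame_rate <= 0 or gop_length < 1:
        raise ValueError("gop_length must be >= 1 and frame_rate > 0")
    gop_duration = gop_length / frame_rate
    return vbv_delay, vbv_delay + gop_duration


best, worst = channel_change_delay(vbv_delay=0.5, gop_length=15, frame_rate=30)
print(f"channel change: between {best:.1f} s and {worst:.1f} s")
```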
This mode, when used, indicates that the sequence does not contain any B-pictures, that the frame reordering delay (due to out-of-display-order B-frames) is not present, and that the VBV buffer is allowed to underflow when encoding pictures. (Underflow will cause the decoder to repeat pictures.) Such a mode, while not practical for high-quality video, is useful for closed-production applications such as IFB, which enables a news studio to communicate live with a field reporter over a back-channel IP connection.
For a good encoder, the challenge is to maintain constant picture quality with a fixed channel rate, without overflowing or underflowing the buffer. Under this constraint, the picture quality with CBR typically varies inversely according to the complexity of the video. Encoders will attempt to assign different quantization scales to different parts of a picture; because complex pictures require more bits, trying to maintain a constant bit rate will result in complex pictures having more artifacts than simpler pictures.
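The inverse relationship can be illustrated with a common simplifying assumption: for a given picture, the number of bits produced multiplied by the quantization scale is roughly constant, and that product serves as a measure of complexity. The sketch below applies that model with an equal bit allocation per picture; the function name, the complexity values and the 1 to 31 clamping range are assumptions for illustration only.

```python
def cbr_quantizer_scales(complexities, bits_per_picture, q_min=1.0, q_max=31.0):
    """Estimate the quantization scale each picture needs to fit its allocation.

    Model (an assumption): bits * quantization scale ~= complexity, so
    scale ~= complexity / bits. A larger scale means coarser quantization
    and therefore more visible artifacts.
    """
    if bits_per_picture <= 0:
        raise ValueError("bits_per_picture must be positive")
    return [min(q_max, max(q_min, c / bits_per_picture)) for c in complexities]


# Hypothetical complexities for a simple, a moderate and a busy picture.
scales = cbr_quantizer_scales([800_000, 1_600_000, 4_000_000], 200_000)
for name, q in zip(("simple", "moderate", "busy"), scales):
    print(f"{name} picture: quantization scale about {q:.0f}")
```

With the bit allocation held constant, the busy picture ends up with the coarsest quantization, which is where the extra artifacts come from.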
Bit rates can be varied on the fly
An alternative way of coding, when there is sufficient instantaneous bandwidth, is to use variable bit rate (VBR) coding. In Figure 2, we see the behavior of a decoder buffer when VBR coding is used. In this mode, the encoder assigns bits according to a peak target bit rate and simply stops sending bits when a defined buffer level is reached.
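A simplified model of this VBR buffer behavior differs from the CBR case in one respect: delivery pauses whenever the buffer reaches its defined level, so the buffer cannot overflow. The sketch below assumes instantaneous picture removal once per frame period and a buffer that is full when decoding starts; names and numbers are hypothetical.

```python
def simulate_vbr_buffer(picture_bits, peak_bit_rate, frame_rate, buffer_size):
    """Return (fullness after each picture removal, bits of unused capacity)."""
    fullness = float(buffer_size)  # assumption: decoding starts with a full buffer
    bits_per_period = peak_bit_rate / frame_rate
    trace = []
    idle_bits = 0.0
    for n, bits in enumerate(picture_bits):
        if bits > fullness:
            raise ValueError(f"underflow at picture {n}")
        fullness -= bits
        trace.append(fullness)
        # Fill at the peak rate, but stop once the buffer level is reached.
        sent = min(bits_per_period, buffer_size - fullness)
        idle_bits += bits_per_period - sent
        fullness += sent
    return trace, idle_bits


trace, idle = simulate_vbr_buffer([600_000, 100_000, 100_000, 500_000],
                                  peak_bit_rate=8_000_000, frame_rate=25,
                                  buffer_size=1_000_000)
print("fullness after each picture:", trace)
print("channel capacity left unused (bits):", idle)
```

The unused capacity is what makes the resulting bit rate variable: small pictures leave the channel idle part of the time, while large pictures draw on the peak rate.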
The simplest VBR stream is one where the picture quality is held constant (for example, by applying the same quantization scale over an entire picture), with no other constraints. In this case, the bit rate generally varies directly according to the complexity of the video. A sophisticated VBR encoder, especially an offline one, can take into account capacity over a large period of time or the available bandwidth in a channel. (In practice, encoder designers will always use a larger buffer than shown here, both for practical hardware reasons and to account for operational anomalies.) DVD encoding suites, for instance, typically use two-pass VBR encoding. In this application, the bit rate need not be constant; only the total number of bits on the disc is constrained.
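For the disc case, the constraint reduces to a bit budget. As a back-of-the-envelope sketch, the average video bit rate available to a two-pass VBR encoder can be estimated from the disc capacity and the program length. The function name is hypothetical, the audio rate is an illustrative assumption, and multiplexing overhead, menus and extras are ignored.

```python
def average_video_bit_rate(capacity_bytes, duration_seconds, audio_bit_rate=0.0):
    """Average video bit rate (bit/s) that fills the disc exactly.

    The encoder may exceed this on complex scenes as long as it spends
    correspondingly less elsewhere in the program.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    video_rate = capacity_bytes * 8 / duration_seconds - audio_bit_rate
    if video_rate <= 0:
        raise ValueError("no capacity left for video")
    return video_rate


# A single-layer DVD holds about 4.7 billion bytes; assume a two-hour
# program and a hypothetical 448 kb/s audio track.
rate = average_video_bit_rate(4_700_000_000, 2 * 60 * 60, audio_bit_rate=448_000)
print(f"average video bit rate: {rate / 1e6:.2f} Mb/s")
```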
In future articles, we'll look at how bit allocation works across a multiplex of services, and how MPEG supports some rather appealing opportunities for controlling multicast program delivery.
Aldo Cugnini is a consultant in the digital television industry.
Send questions and comments to: email@example.com