Implementing MPEG-4

In recent articles, we've discussed variable bit rate (VBR) encoding and multiplexing. This month, we'll look at MPEG-4, and how it can work with both of these processes to afford even more options for program delivery.

Scalable coding

Scalable coding is a way to offer different levels of performance in a compression system — all simultaneously and compatibly. One can think of adding color to the older black-and-white NTSC standard as a scalable scheme. Older receivers would not process the added signal, but newer ones could take advantage of the improvement.

MPEG-4 Part 10 (AVC/H.264) offers various levels of scalability, so that simpler receivers only decode the elements needed for that level of performance. Scalability can provide a more efficient bandwidth utilization when it is desirable to transmit multiple programs of different quality, without having to separately encode each program as a complete bit stream (the simulcast scenario).

With scalability, the program is coded into different layers; a simple decoder will only decode one service (the base layer) and form pictures at that associated quality level, and a more sophisticated decoder will decode multiple services (base and enhancement layers) and use the additional information to produce a higher-quality program.

A scalable encoder is shown in Figure 1. Each layer is coded using the standard MPEG-4 AVC tools, and additional SVC tools are available to increase the coding efficiency. These enhanced tools include inter-layer prediction, where the base layer is used to form a prediction for the enhancement layer.

Scalability provides a way to support different native resolutions, or different temporal rates, or even different encoding parameters. In fact, MPEG-2 provides these scalability options, but existing ATSC implementations are limited to Main Profile, so no scalability is possible with that codec over current transmissions. MPEG-4 AVC, however, can be used with ATSC M/H and includes scalable video coding (SVC), so scalability over these emerging mobile services is feasible.

When using spatial scalability, different levels of resolution are provided. This could be used to broadcast to different devices that inherently have different resolutions. One application could be the efficient accommodation of different sizes of displays. Consider, for instance, cell phone performance, where the display is typically less than 3in in diagonal size, and automobile backseat displays or PCs, where the display can be larger than 12in. Another example could be service level differentiation, where a higher-resolution (encrypted) version is accessible for a premium; a variant of this could be a free low-resolution program preview, again with an associated full-resolution version sent as an encrypted enhancement service. Yet another example could be the introduction of backwards-compatible services, so that devices with different performance capabilities could be deployed at different points in time, while maintaining compatibility with the legacy devices.

Note also that “higher performance” usually means greater processing power; decoding the base layer alone ordinarily requires less complexity than decoding multiple layers, and this translates directly into product cost, too. Temporal scalability is also supported by SVC, so multiple frame rates can likewise be offered, both at the encoder and in receiving devices.

The ATSC M/H specification (currently a Proposed Standard) requires that the video compression algorithm conform to the Baseline Profile of AVC, bounded by the upper limits specified for Baseline Level 1.3: Only I (intra) and P (predictive) slice types are allowed, and the spatial resolution is set to 416 × 240 pixels. When SVC is used, the video compression algorithm must conform to the Scalable Baseline Profile of SVC, bounded by the upper limits specified for Scalable Baseline Level 3.1: Only I, P, EI, EP and EB slices are allowed, and the spatial resolution of the enhancement layer is either 832 × 480 pixels or 624 × 360 pixels.
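To make the relationship between these resolutions concrete, the short Python sketch below works out the scaling factor between the 416 × 240 base layer and each permitted enhancement layer. The function and variable names are our own, chosen for illustration; only the pixel dimensions come from the text above.

```python
from fractions import Fraction

# Resolutions quoted above for ATSC M/H (width, height in pixels).
BASE = (416, 240)
ENHANCEMENTS = [(832, 480), (624, 360)]


def scale_factors(base, enhanced):
    """Return the exact (horizontal, vertical) scaling ratios."""
    return Fraction(enhanced[0], base[0]), Fraction(enhanced[1], base[1])


def pixel_ratio(base, enhanced):
    """Return how many times more pixels the enhanced picture has."""
    return Fraction(enhanced[0] * enhanced[1], base[0] * base[1])


for enh in ENHANCEMENTS:
    h, v = scale_factors(BASE, enh)
    print(f"{enh[0]}x{enh[1]}: scale {h} x {v}, "
          f"pixel count ratio {pixel_ratio(BASE, enh)}")
```

The two enhancement formats work out to exactly 2:1 and 3:2 scaling in each dimension, or 4 times and 2.25 times the pixel count of the base layer.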

SNR scalability

An additional option with MPEG-4/SVC is signal-to-noise ratio (SNR) scalability (also called quality scalability), which provides the ability to transmit scalable quality versions of a program, such as by coding some of the picture block transform coefficients in different layers. (The transform coefficients carry the spatial detail information.) In this way, the level of coding artifacts can vary according to the decoder complexity.
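The idea of splitting coefficient precision across layers can be illustrated with a toy two-layer quantizer. This is only a sketch of the principle, not the actual SVC bit stream syntax: the step sizes, function names and sample coefficients are all assumptions made for the example. The base layer carries coarsely quantized coefficients, and the enhancement layer carries a finer correction to the base-layer error.

```python
def encode_two_layers(coeffs, coarse_step=16, fine_step=4):
    """Split transform coefficients into a coarse base layer and a
    finer enhancement layer (hypothetical step sizes)."""
    base = [round(c / coarse_step) for c in coeffs]
    # The residual is whatever the coarse quantizer failed to represent.
    residual = [c - b * coarse_step for c, b in zip(coeffs, base)]
    enhancement = [round(r / fine_step) for r in residual]
    return base, enhancement


def decode(base, enhancement=None, coarse_step=16, fine_step=4):
    """Reconstruct coefficients from the base layer alone, or from
    both layers when the enhancement layer is available."""
    recon = [b * coarse_step for b in base]
    if enhancement is not None:
        recon = [r + e * fine_step for r, e in zip(recon, enhancement)]
    return recon


def max_error(original, reconstructed):
    return max(abs(o - r) for o, r in zip(original, reconstructed))


coeffs = [100, -37, 12, 5, -3, 0]          # made-up sample values
base, enh = encode_two_layers(coeffs)
print("base only:", decode(base), "max error", max_error(coeffs, decode(base)))
print("both     :", decode(base, enh), "max error",
      max_error(coeffs, decode(base, enh)))
```

A simple decoder calls decode(base) and lives with the larger quantization error; a more capable one calls decode(base, enh) and gets a closer reconstruction from the same base data.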

SNR scalability provides an interesting option to the encoder. Recall that a VBR stat mux (a statistical multiplexer using variable bit rate encoding) will assign an instantaneous bit rate to different multiplex channels dynamically, depending on the instantaneous demand of each program. If some of the multiplexed programs are encoded using SNR scalability, then a stat mux can use what is called connection admission control and dynamically drop packets from layers in the scalable streams. A scalable receiver then uses the enhancement stream as it is available. The encoding trade-off becomes that of more channels vs. a higher probability of packet loss.
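One very simple way to picture the packet-dropping step is sketched below. The policy shown (admit base-layer packets first, then enhancement packets while capacity remains) is an assumption made for illustration, as are the Packet fields and the bit counts; a real stat mux would use a more elaborate admission algorithm.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    program: str
    layer: int   # 0 = base layer, 1 and up = enhancement layers
    bits: int


def admit(packets, capacity_bits):
    """Admit packets lowest layer first until the multiplex capacity
    for this interval is used up; everything else is dropped."""
    admitted, dropped = [], []
    used = 0
    # sorted() is stable, so packets keep their order within a layer.
    for p in sorted(packets, key=lambda p: p.layer):
        if used + p.bits <= capacity_bits:
            admitted.append(p)
            used += p.bits
        else:
            dropped.append(p)
    return admitted, dropped


# Hypothetical demand from two programs in one multiplex interval.
demand = [
    Packet("A", 0, 400), Packet("A", 1, 300),
    Packet("B", 0, 500), Packet("B", 1, 300),
]
admitted, dropped = admit(demand, capacity_bits=1300)
print("admitted:", [(p.program, p.layer) for p in admitted])
print("dropped :", [(p.program, p.layer) for p in dropped])
```

In this example both base layers and one enhancement layer fit, and program B falls back to base-layer quality for the interval. The sketch assumes the base layers always fit within the available capacity.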

To get an idea of the benefit of such a scheme, while single-layer MPEG-2 produces generally unacceptable video quality with a packet loss ratio of 10⁻³, losses at this rate with SNR scalability are generally invisible, even to experienced viewers. This is because video codec developers have focused on producing good error concealment algorithms and designing efficient two-layer coding algorithms.

While ATSC M/H constrains SVC to spatial and temporal scalability, connection admission control could theoretically be used with a combination of these scalability options, with the goal of increasing data throughput at the expense of an occasional fallback to the lower layer. Of course, such an option must be employed cautiously; no one paying for an enhanced service would tolerate routine drops in picture quality.

Hierarchical modulation

Another way to scale transmission performance is with hierarchical modulation, where the transmission can have multiple levels of embedded modulation, each with different levels of protection. (See Figure 2.) In this crude example, the modulation consists of 4QAM for the base layer (the four yellow symbols) and 4QAM for the enhancement layer (the four blue symbols in each quadrant).

Here, the bit rate of each layer is the same, but the symbols are closer together for the enhancement layer; hence, the receiving SNR is worse for that layer, as is the received error rate. (In practice, the enhancement layer would have a higher bit rate — with more symbols — and even lower SNR.) While this kind of hierarchical QAM modulation can extend coverage area and can provide graceful degradation, it comes at a cost — increased receiver complexity, as well as interlayer interference that increases the error rate for both layers.
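The difference in protection between the two layers can be seen by building the constellation numerically. In the sketch below, the spacings d1 and d2 are arbitrary values picked for illustration, not figures from any transmission standard. Each of the four base-layer positions is surrounded by four enhancement-layer symbols, giving 16 points in all, and the code compares the minimum distance a receiver must resolve for each layer.

```python
import itertools
import math

SIGNS = [(1, 1), (-1, 1), (-1, -1), (1, -1)]


def hierarchical_constellation(d1=2.0, d2=0.5):
    """Build a 4QAM-in-4QAM constellation.

    d1 places the quadrant centers (base layer); d2 offsets the four
    symbols around each center (enhancement layer). Both are assumed
    values. Keys are (base_index, enhancement_index).
    """
    points = {}
    for bi, (bx, by) in enumerate(SIGNS):
        for ei, (ex, ey) in enumerate(SIGNS):
            points[(bi, ei)] = complex(bx * d1 + ex * d2, by * d1 + ey * d2)
    return points


def min_distance(points, same_quadrant):
    """Smallest distance between two symbols that share a quadrant
    (enhancement decision) or lie in different quadrants (base decision)."""
    best = math.inf
    for (k1, p1), (k2, p2) in itertools.combinations(points.items(), 2):
        if (k1[0] == k2[0]) == same_quadrant:
            best = min(best, abs(p1 - p2))
    return best


points = hierarchical_constellation()
print("symbols:", len(points))
print("enhancement-layer spacing:", min_distance(points, same_quadrant=True))
print("base-layer spacing       :", min_distance(points, same_quadrant=False))
```

With these assumed spacings, the enhancement-layer symbols are three times closer together than symbols in neighboring quadrants, so the enhancement layer needs a correspondingly better received SNR for the same error rate. Note also that the base-layer spacing, 2 × (d1 − d2), is smaller than it would be without the embedded layer, which is one way to picture the interlayer interference mentioned above.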

While QAM is part of the transmission specification for DVB, ATSC uses VSB modulation, which has a different signal constellation. In one sense, ATSC M/H could be viewed as a “hierarchical” modulation scheme, because the main and mobile services are combined at the transmission layer and have unequal protection. As such, a receiver could be constructed that automatically switches between the two services as reception conditions vary. But recall that SVC cannot be used with the main (MPEG-2) service. Such switching, of course, would therefore require a simulcast transmission, with a concurrent decrease in efficiency. The addition of the new signal, however, does not compromise the received error rate of the main signal.

Considerations

The broadcaster considerations for these scalability options can involve a large number of variables. One can think of the “long tail” business strategy applying here, with some interesting scenarios involving trade-offs between the expected numbers of smaller vs. larger devices, the relative bandwidth needed for the base and enhancement layers, and the costs and benefits of the different options. Let's hope that indecision paralysis doesn't stall the new business opportunities that are poised for deployment.

Aldo Cugnini is a consultant in the digital television industry.

Send questions and comments to: aldo.cugnini@penton.com