With the growing interest in repurposing video to multiple media and diverse means of display, including mobile TV systems, it is useful to consider how MPEG-4 scalability affords a way to provide different picture quality levels to different devices.
Scalable coding is an efficient way to offer different levels of performance in a compression system — all simultaneously and compatibly. MPEG-4 Part 10 (AVC/H.264) offers various levels of scalability so that simpler receivers only need to decode those elements needed for a particular level of performance. Scalability provides a way to support different native resolutions, different temporal rates or even different encoding parameters. It also provides a method for more efficient bandwidth use when it is desirable to transmit multiple programs of different quality, without having to encode each program separately as a complete bit stream (the simulcast scenario).
A layered approach
With scalability, the program is coded into different layers, and all layers are transmitted in a bit stream. The simplest decoder will only decode one service (the base layer) and form pictures at that associated quality level while ignoring the higher layers. A more sophisticated decoder will decode multiple services (base and enhancement layers) and use the additional information to produce a higher-quality program. Note that different devices have different display resolutions, and the computational horsepower — and associated power consumption — will vary as well. Scalability thus allows broadcasters to support devices that have very low power consumption, when a lower display resolution is used, while enabling higher performance (e.g., with larger displays) on devices that are capable of higher power consumption. In a scalable codec, each layer is coded using the standard MPEG-4 AVC tools, and additional SVC tools can be used to increase the coding efficiency. (See Figure 1.)
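The layer selection described above can be sketched in a few lines of Python. This is only an illustration, not an actual SVC decoder: the `Layer` record, the `layers_to_decode` helper, the layer numbering and the pairing of resolutions with layers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    layer_id: int  # hypothetical numbering: 0 = base, 1 and up = enhancement
    width: int
    height: int


# Hypothetical two-layer stream; the resolutions are illustrative only.
STREAM = [
    Layer(0, 416, 240),  # base layer
    Layer(1, 832, 480),  # enhancement layer
]


def layers_to_decode(stream, max_layer_id):
    """Return the layers a receiver decodes; higher layers are ignored."""
    return [layer for layer in stream if layer.layer_id <= max_layer_id]


simple = layers_to_decode(STREAM, max_layer_id=0)    # base layer only
advanced = layers_to_decode(STREAM, max_layer_id=1)  # base plus enhancement

print([layer.layer_id for layer in simple])    # [0]
print([layer.layer_id for layer in advanced])  # [0, 1]
```

Both receivers are handed the same stream; the simple one just stops at the base layer, which is the point of the layered design.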
MPEG-4 AVC is used with ATSC Mobile and DVB-H, and both transmission systems support scalable video coding (SVC). AVC and SVC streams can be transcoded into one another, with no loss of coding quality provided the conversion observes certain constraints. Also, when converting from SVC to AVC, a “rewriting” process can be used that is significantly less complex than a full transcoding of the SVC bit stream. This can be useful, for example, when repurposing content for other broadcast, storage or distribution means.
Scalability can be more efficient than simulcast because the scalable signal can need a lower total bit rate than separate low- and high-quality streams. For the enhancement service to be worth carrying, however, it must provide a noticeable improvement, which means that there must be a significant quality difference between the two services.
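The comparison can be written as simple arithmetic. In the sketch below, the function name and all three bit rates are invented purely for illustration; they are not measurements and do not come from any standard.

```python
def total_bit_rates(base_kbps, high_kbps, enhancement_kbps):
    """Return (simulcast_total, scalable_total) in kbit/s.

    Simulcast carries the low- and high-quality streams in full.
    Scalable coding carries the base layer plus an enhancement layer.
    """
    simulcast = base_kbps + high_kbps
    scalable = base_kbps + enhancement_kbps
    return simulcast, scalable


# Hypothetical figures, chosen only to show the comparison.
simulcast, scalable = total_bit_rates(base_kbps=300, high_kbps=900,
                                      enhancement_kbps=700)
print(simulcast, scalable)  # 1200 1000
```

The scalable stream wins only when the enhancement layer costs less than a standalone high-quality stream would.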
A key issue in repurposing and scalability is the conversion of video from one resolution to another. In television production, video source images are generally captured in one of the 1080i, 720p, 576i or 480i formats, i.e., at 1920 × 1080, 1280 × 720, 720 × 576 or 720 × 480. To repurpose this video to different display devices, the resolution must be rescaled; when targeting small displays on mobile or handheld devices, various rescalings are possible. For ATSC M/H, a nominal resolution of 416 × 240 (WQVGA) is specified. For DVB-H, a typical video service of MPEG-4/AVC level 1.2 uses a resolution of 352 × 288 (CIF).
As a working example, let's look at the various resolutions afforded by the ATSC M/H standard. When the source is 1920 × 1080 interlaced video and the target is an image of 832 × 480 (WVGA), 624 × 360 (3/4 WVGA, sometimes called nHD) or 416 × 240, the source image must be cropped to 1872 × 1080 (a horizontal resolution loss of 2.5 percent) and deinterlaced. The three target resolutions result in source-to-target sampling conversions of 9:4, 3:1 and 9:2, respectively. There is no distortion of the image (i.e., the aspect ratio of 15.6:9 is maintained) because square pixels are used in each case. Similarly, a 1280 × 720 progressive image must be cropped to 1248 × 720 (again a horizontal resolution loss of 2.5 percent), and then downsampled to a target of 832 × 480, 624 × 360 or 416 × 240. These three sampling conversions amount to 3:2, 2:1 and 3:1, respectively, and again, there is no distortion of the image.
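The crop widths and sampling ratios for the HD sources can be checked with exact rational arithmetic. The helper names below are hypothetical; the sketch assumes square pixels and a crop applied only to the width, as described above.

```python
from fractions import Fraction

TARGETS = [(832, 480), (624, 360), (416, 240)]
SOURCES = [(1920, 1080), (1280, 720)]


def crop_width(src_height, target):
    """Width that gives the source the target's aspect ratio at full height."""
    target_w, target_h = target
    return int(Fraction(target_w, target_h) * src_height)


def conversions(source):
    """Return (cropped width, fractional loss, source:target ratio) per target."""
    src_w, src_h = source
    results = []
    for target_w, target_h in TARGETS:
        cropped = crop_width(src_h, (target_w, target_h))
        loss = Fraction(src_w - cropped, src_w)
        ratio = Fraction(cropped, target_w)
        # Same ratio in both dimensions, so the picture is not distorted.
        assert ratio == Fraction(src_h, target_h)
        results.append((cropped, loss, ratio))
    return results


for src in SOURCES:
    for (tw, th), (cropped, loss, ratio) in zip(TARGETS, conversions(src)):
        print(f"{src[0]}x{src[1]}: crop to {cropped}x{src[1]} "
              f"({float(loss):.1%} lost), then {tw}x{th} at "
              f"{ratio.numerator}:{ratio.denominator}")
```

Using `Fraction` rather than floating point keeps each ratio in lowest terms, so the 9:4, 3:1 and 9:2 figures appear directly.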
At SD resolutions, however, rescaling is somewhat more complicated. A 704 × 480 image must be deinterlaced (if necessary) and upsampled to 832 × 480, or downsampled to 624 × 360 or 416 × 240. Because the 704 × 480 format can encode either 4:3 or 16:9 video, and the pixels are not square, the horizontal and vertical resamplings are not necessarily the same, so there can be distortion of the image. The inherent complication is the choice of rendering 4:3 video as 16:9 with dark bars on the sides (pillarboxed) or stretched horizontally, either linearly or nonlinearly. With this source video, the three sampling conversions amount to 1:1, 4:3 and 2:1, respectively, in the vertical dimension. In the horizontal dimension, however, the resampling amounts to 11:13, 44:39 and 22:13, respectively (source to target, as with the vertical figures). Because of hardware bandwidth limitations, it's not practical to do these “awkward” resamplings as exact interpolation-decimation operations, which would require upconversion to extremely high pixel rates, even with multiple integer scalings. In practice, the resamplings are done using filter-assisted approximations (splines), which will result in a certain amount of aliasing that is tolerable at the resolutions in use.
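To see why the exact approach is impractical, recall that a rational resampling is conventionally done by interpolating by an integer factor L and then decimating by an integer factor M. The sketch below (the function name is hypothetical) reduces each 704 × 480 conversion to lowest terms and reports the intermediate sample count that a single-stage exact conversion would need.

```python
from fractions import Fraction

SRC_W, SRC_H = 704, 480
TARGETS = [(832, 480), (624, 360), (416, 240)]


def exact_resampling(src, dst):
    """Return (L, M, intermediate): interpolate by L, then decimate by M.

    intermediate is the number of samples per line (or per column)
    after the interpolation step of a single-stage exact conversion.
    """
    factor = Fraction(dst, src)  # target/source in lowest terms
    up, down = factor.numerator, factor.denominator
    return up, down, src * up


for target_w, target_h in TARGETS:
    h_up, h_down, h_mid = exact_resampling(SRC_W, target_w)
    v_up, v_down, v_mid = exact_resampling(SRC_H, target_h)
    print(f"704x480 -> {target_w}x{target_h}: "
          f"horizontal up {h_up} / down {h_down} ({h_mid} samples per line), "
          f"vertical up {v_up} / down {v_down} ({v_mid} lines)")
```

For the 624-wide target, the horizontal stage would have to run at 39 times the source sample rate (27,456 intermediate samples per line), which illustrates the kind of rate that pushes practical designs toward approximate filters.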
MPEG-4 AVC offers many other tools for efficient coding of video. While MPEG-2 has achieved widespread use, the higher efficiency of MPEG-4 AVC is resulting in a growing number of applications, perhaps eventually replacing the earlier standard altogether. The quest for more bits goes on.
Aldo Cugnini is a consultant in the digital television industry.