There is general agreement that the future of digital television transmission, over any platform, will use MPEG-4 Part 10 as its codec. Also known as the Advanced Video Codec (AVC), and to the telecom industry as the ITU-T standard H.264, it was first published as a joint standard in 2003. This article looks at the complexities of encoding MPEG-4 and the resulting requirement for carefully designed hardware.
The first important point to bear in mind is that, like MPEG-2 — still commonly used for standard definition transmissions — MPEG-4 is an asymmetric codec. It uses processor-intensive algorithms to create the compressed stream but is relatively simple to decode, allowing it to be implemented in an inexpensive chip embedded in millions of set-top boxes and television receivers.
One fundamental economic consideration: Consumer goods manufacturers would be extremely reluctant to tolerate any increase in their manufacturing costs, so in the move from MPEG-2 to MPEG-4, the complexity and cost of the decoding process had to remain broadly comparable.
The team working on the AVC project was tasked with achieving comparable video quality at half the bit rate (or lower) of MPEG-2. Given that there was no option to increase decoder complexity significantly, this put even greater pressure on the encoding algorithms.
The solution was to build on the fundamental building blocks of MPEG-2 — discrete cosine transforms within macroblocks on individual frames and the use of reference frames for additional temporal compression — but add considerably more depth of processing in each area. This took advantage of the vast increases in processing power available since the original MPEG-2 standard was established a decade or more earlier. But it remained highly challenging, which is why many of the elements are regarded as options and few, if any, commercial encoders attempt to use the full toolkit.
There is not enough space here to list all the additional techniques AVC uses to improve coding efficiency, and many require a detailed understanding of the underlying mathematics in order to be fully appreciated. However, it is worth looking at some of them in outline.
Perhaps the most important step change is the move to multiple reference pictures. (See Figure 1 on page 10.) Whereas MPEG-2 uses one or at most two reference pictures in inter-frame coding, AVC permits the use of up to 16 frames (32 fields in interlaced television). In some cases — rapid back-and-forth cuts between two camera angles, or scenes with a relatively large expanse of background (a golf course, for example) — this can produce dramatic savings in the bit rate.
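The principle can be sketched in a few lines. This is a minimal NumPy illustration, not AVC's actual search: the encoder measures how well each stored reference predicts the current block (here with a sum of absolute differences at the co-located position only, whereas a real encoder also searches motion-vector offsets) and keeps the cheapest.

```python
import numpy as np

def best_reference(block, refs, y, x):
    """Return the index and SAD cost of the stored reference frame
    whose co-located block best predicts `block`.  A real encoder
    would also search motion-vector offsets around (y, x)."""
    h, w = block.shape
    costs = [int(np.abs(block - r[y:y + h, x:x + w]).sum()) for r in refs]
    best = int(np.argmin(costs))
    return best, costs[best]

# Toy version of the back-and-forth-cut case: the scene cuts away and
# back, so the older of the two stored references predicts the current
# block exactly, while the most recent one does not.
rng = np.random.default_rng(0)
angle_a = rng.integers(0, 256, (16, 16))
angle_b = rng.integers(0, 256, (16, 16))
refs = [angle_a, angle_b]            # AVC allows up to 16 such frames
current = angle_a.copy()             # cut back to the first camera angle
idx, cost = best_reference(current, refs, 0, 0)
print(idx, cost)                     # 0 0
```

With only one reference, as in MPEG-2, the cut back to the first angle would have to be coded almost from scratch; with two or more, it costs next to nothing.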
Motion vectors also play a big part in providing a good prediction of how a scene will develop across a group of pictures. Again, AVC allows multiple motion vectors, as well as a greater range of both horizontal and vertical values, to significantly improve the accuracy of the predictions and thus improve the compression. By weighting the predictions, a well-designed AVC encoder can also perform much better on transitions that are traditionally tricky for encoders, such as crossfades and fade to black.
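The crossfade case shows why weighting matters. The following sketch is an illustrative floating-point model, not the standard's bit-exact integer arithmetic: scaling the reference by a weight lets the encoder predict a fading picture exactly, where a plain copy of the reference would leave a large residual to code.

```python
import numpy as np

def weighted_prediction(ref_block, weight, offset):
    """Scale-and-offset prediction, the idea behind AVC's explicit
    weighted prediction (illustrative model only)."""
    return np.clip(ref_block * weight + offset, 0.0, 255.0)

# A fade to black halves the brightness from one frame to the next.
# Plain prediction from the bright reference leaves a large residual;
# a weighted prediction with weight 0.5 matches the faded frame exactly.
ref = np.full((8, 8), 200.0)
faded = ref * 0.5
plain_residual = float(np.abs(faded - ref).sum())
weighted_residual = float(np.abs(faded - weighted_prediction(ref, 0.5, 0.0)).sum())
print(plain_residual, weighted_residual)   # 6400.0 0.0
```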
Entropy coding is at the heart of the MPEG algorithms. Entropy is a measure of the degree of disorder, or unpredictability, in a system; in image compression, entropy coding is the lossless technique by which the statistically predictable elements of the stream are represented in the fewest possible bits. AVC adds several ways of managing entropy coding, including variable-length coding (as opposed to a structure based on 8-bit bytes), context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC).
CABAC is an auto-adaptive algorithm that can offer lossless compression of syntax elements in the video stream by knowing the probabilities of those syntax elements in a given context. This is widely used in AVC encoding, requiring considerable statistical analysis.
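The adaptive part can be illustrated with a toy context model. Real CABAC drives a table-based finite-state machine; this sketch uses simple smoothed counts instead, but shows the same effect: an ideal arithmetic coder spends -log2(p) bits per binary symbol, so as the model learns the statistics of its context, the average cost falls well below 1 bit per symbol.

```python
import math

class ContextModel:
    """Adaptive estimate of P(bin = 1) for one coding context, the
    idea at the heart of CABAC.  (Count-based sketch, not the
    standard's table-driven state machine.)"""
    def __init__(self):
        self.ones, self.total = 1, 2      # Laplace-smoothed counts

    def cost_bits(self, bit):
        p_one = self.ones / self.total
        return -math.log2(p_one if bit else 1.0 - p_one)

    def update(self, bit):
        self.ones += bit
        self.total += 1

# A heavily skewed binary source, 90% zeros: once the model has
# adapted, coding a zero costs far less than a whole bit.
bins = [0] * 90 + [1] * 10
ctx = ContextModel()
total_bits = 0.0
for b in bins:
    total_bits += ctx.cost_bits(b)
    ctx.update(b)
print(round(total_bits / len(bins), 2))   # well under 1.0 bit per bin
```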
The standard also includes advances in resilience, including a network abstraction layer. By decoupling information relevant to more than one slice from the media stream, AVC can eliminate header duplication. This makes for more compact and stable data, deriving key information such as picture size, coding modes employed and the macroblock map from self-contained packets in the network layer.
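The parameter-set idea can be sketched as follows; the field names are illustrative, not the actual H.264 syntax. Sequence-level information travels once in its own self-contained packet, and each slice merely carries the id of the set it uses instead of repeating those header fields.

```python
# One self-contained parameter-set packet, keyed by id (illustrative
# fields, not the real H.264 syntax).
sequence_parameter_sets = {0: {"width": 1920, "height": 1080}}

def decode_slice(packet):
    # Each slice looks up the shared sequence information by id
    # instead of carrying a duplicated header of its own.
    params = sequence_parameter_sets[packet["sps_id"]]
    return params["width"], params["height"], packet["payload"]

slices = [{"sps_id": 0, "payload": bytes([i])} for i in range(4)]
decoded = [decode_slice(s) for s in slices]
print(decoded[0][:2])    # (1920, 1080) -- sent once, used by every slice
```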
Other resilience measures include flexible macroblock ordering and data partitioning, which allows the separation of more and less important syntax elements into different packets of data. This in turn enables the application of unequal error protection, using the resilience overhead where it can be of greatest benefit.
The network abstraction layer is also the foundation for an important extension to the standard, codified in November 2007, which introduces scalable video coding. (See Figure 2 below.) This creates a subset bit stream from the overall transmission simply by dropping packets. In turn, this means operators can offer a reduced-size or reduced-frame-rate service (perhaps for mobile devices) alongside a high-quality service, within the same bit budget.
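The extraction step really is that simple in principle. In this sketch the packet structure is illustrative (SVC actually signals layer ids in the NAL unit header): every packet carries a layer id, and the lower-capability service is produced by discarding everything above a target layer.

```python
# Bit-stream extraction for scalable video coding (illustrative
# packet structure; SVC carries layer ids in the NAL unit header).
packets = [
    {"layer": 0, "size": 800},    # base layer: reduced size / frame rate
    {"layer": 1, "size": 600},    # first enhancement layer
    {"layer": 2, "size": 400},    # second enhancement layer
] * 3                             # say, three groups of pictures

def extract(stream, max_layer):
    """Keep only the packets at or below the target layer."""
    return [p for p in stream if p["layer"] <= max_layer]

mobile_bits = sum(p["size"] for p in extract(packets, 0))
full_bits = sum(p["size"] for p in extract(packets, 2))
print(mobile_bits, full_bits)     # 2400 5400
```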
Finally, the standard allows for a number of profiles, which give the operator another way in which the delivered quality of experience can be fine-tuned. While initial interest in AVC was around the main profile, network operators are turning to the high profile to get better quality on-screen. Other profiles allow AVC to be used as a high-quality contribution or distribution algorithm, incorporating 4:2:2 color sampling and 10-bit video streams.
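A profile can be thought of as a named constraint set: the tools and sampling formats an encoder may use and a conforming decoder must support. The entries below are indicative only, and deliberately simplified; the standard defines the full constraint lists.

```python
# Simplified view of profiles as named tool/format constraint sets.
# Entries are indicative only; consult the standard for the full lists.
profiles = {
    "main":     {"cabac": True, "chroma": "4:2:0", "bit_depth": 8},
    "high":     {"cabac": True, "chroma": "4:2:0", "bit_depth": 8,
                 "transform_8x8": True},
    "high_422": {"cabac": True, "chroma": "4:2:2", "bit_depth": 10},
}

def supports(profile, chroma, bit_depth):
    """True if the named profile permits this sampling and depth."""
    p = profiles[profile]
    return p["chroma"] == chroma and p["bit_depth"] >= bit_depth

print(supports("high_422", "4:2:2", 10))   # True: contribution-grade video
print(supports("main", "4:2:2", 10))       # False
```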
These are just some items from a long list of the added functionality in MPEG-4 AVC, and they are a real tribute to the visionary engineers who worked on the standard. Most important, they are all necessary if the target of equivalent or better quality in half the bit rate of MPEG-2 is to be achieved.
As mentioned earlier, MPEG compression algorithms are asymmetric, and here is another example of that asymmetry: achieving what is in effect a two-fold improvement in compression efficiency demands at least a 10-fold increase in processing power.
The traditional architecture of a transmission encoder uses standard DSP chips to provide the processing power. However, this is unlikely to deliver sufficient performance in an AVC encoder, not only because of the large number of complex processes involved, but also because of the volume of memory reads and writes required.
The use of multiple reference frames for temporal compression, and the use of multiple motion vectors, calls for many frames to be held in memory and clocked through as required, sometimes at different rates.
In simple terms, handling the computational requirements and memory management demands would necessitate multiple DSPs. After a certain point, the overhead of the communication between DSPs and the bottleneck of transferring large amounts of data — video frames — around becomes a limiting factor, and further performance improvements become increasingly difficult.
The solution is to develop a dedicated digital signal processing chip. Building a dedicated chip means that key processes can be handled onboard in application-specific hardware. Other processes can be undertaken by DSPs inside the chip. This gives the application designer the best of both worlds: the familiar programming environment and access to proven routines, with maximum speed from the direct access to other internal processes.
With high-speed I/O tailored to the requirements of high-bandwidth video streams, a single chip can support access to the full MPEG-4 AVC toolkit. And because it remains a software-driven device, it can be updated as improvements in algorithms appear, making the encoder or transcoder future-proof.
HDTV is being rolled out around the world, and its mass take-up will depend on consumers perceiving HD in the home as a visibly better and more satisfying experience than before. That perception depends largely on the quality delivered by the encoder, which makes it vital for everyone to get the best from the MPEG-4 codec.
Are Olafsen is the director of satellite DTH segment for Grass Valley.