The MPEG-2 and MPEG-4 standards are now at a relatively mature stage. At the same time, new implementations of MPEG-4 are still on the rise, especially using H.264/AVC. Both ATSC and DVB-T support this more efficient compression standard (with newer receiving devices, such as mobile displays), and newer codecs are emerging in a growing number of video applications. While MPEG-2 and AVC are now ubiquitous in broadcast, cable and satellite distribution, other codecs have found an equally widespread home for the distribution of video over the Internet. Because we are seeing more applications that cross the various media, it is useful to understand the makeup of these various codecs.
Most compression systems have similarities
All compression systems function by removing redundancy from the coded information, and the highest amounts of compression are almost always achieved by lossy coding, i.e., the decoded information, while a faithful rendition of the original, is not an identical set of data. Essentially, most video codecs today reduce the information content of video in three ways: spatially, temporally and logically.
Spatial video content (in the horizontal/vertical image dimensions) is compressed by means of mathematical transforms and quantization. The former remaps the video pixels into arrays that separate out detail information; the latter reduces the number of bits required for each transformed pixel. Temporal video content (in the time dimension) is compressed by means of residuals and motion estimation, and in some codecs, by quantization as well. Residuals reduce information by coding differences between frames of video, and motion estimation provides data reduction by accounting for the movement of pixel “blocks” (and groups of blocks, i.e., macroblocks) over time. Logical content (i.e., strings of codewords representing spatial and temporal content) is further compressed by using various forms of entropy coding and/or arithmetic coding, which remove information by efficiently coding the strings in terms of their statistical likelihood of occurrence.
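The spatial steps above — a block transform followed by quantization — can be sketched in a few lines of illustrative Python. The code below applies a naive 2-D DCT to an 8 × 8 block and then quantizes the coefficients, showing how a smooth block collapses to a handful of nonzero values; the block contents, quantizer step and block size are arbitrary choices for the sketch, not parameters of any particular codec.

```python
import math

N = 8  # block size (8x8, as in MPEG-2/AVC luma transforms; arbitrary here)

def dct2(block):
    """Naive 2-D DCT-II of an N x N block (O(N^4) — fine for illustration)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

def quantize(coeffs, step=16):
    """Uniform quantization: divide each coefficient by a step and round."""
    return [[round(c / step) for c in row] for row in coeffs]

# A smooth gradient block: the transform packs its energy into a few
# low-frequency coefficients, and quantization zeroes out the rest.
block = [[8 * x + y for y in range(N)] for x in range(N)]
q = quantize(dct2(block))
nonzero = sum(1 for row in q for c in row if c != 0)
print(nonzero)  # far fewer than the original 64 values survive
```

The surviving coefficients would then be scanned into a string of codewords and entropy-coded, which is where the "logical" reduction described above takes place.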
Each MPEG standard is actually a collection of different tools and operating parameters, grouped into levels and profiles. The level typically defines the decoding horsepower needed for the bit stream, expressed in macroblocks per second (or per frame) and the overall video bit rate. Profiles group the different tools used during encoding. For example, MPEG-2 Main Profile @ Main Level is sufficient to encode SD digital TV broadcasts, while MPEG-2 Main Profile @ High Level is needed to encode HD video.
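In practice, level selection amounts to checking a stream's parameters against a constraint table. The sketch below simplifies the MPEG-2 limits to frame size and bit rate (the normative tables in ISO/IEC 13818-2 also bound sample rates and buffer sizes; the figures here are the commonly quoted maxima, used only for illustration):

```python
# Simplified MPEG-2 Main Profile level constraints (illustrative figures;
# consult ISO/IEC 13818-2 for the normative constraint tables).
LEVELS = {
    "Main": {"max_width": 720,  "max_height": 576,  "max_mbps": 15},
    "High": {"max_width": 1920, "max_height": 1152, "max_mbps": 80},
}

def level_for(width, height, bitrate_mbps):
    """Pick the lowest level whose limits accommodate the stream."""
    for name in ("Main", "High"):
        lim = LEVELS[name]
        if (width <= lim["max_width"]
                and height <= lim["max_height"]
                and bitrate_mbps <= lim["max_mbps"]):
            return name
    return None  # exceeds every level in this (abbreviated) table

print(level_for(720, 576, 6))     # SD broadcast fits Main Level
print(level_for(1920, 1080, 19))  # HD video needs High Level
```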
A huge amount of content on the Internet, however, does not use MPEG-2 or AVC coding. YouTube, for instance, almost exclusively uses Flash for video compression. Flash does not use one unique codec, but rather defines a format for FLV files. These files, in turn, encapsulate content usually encoded with either the On2 VP6 or Sorenson Spark video compression algorithms. VP6, now owned by Google (which also owns YouTube), uses several standard compression techniques: a DCT block transform for spatial redundancy, motion compensation, a loop filter and entropy coding. (The loop filter is used to lower the appearance of block-edge artifacts.) While all of these are present in AVC compression, the loop filtering used in VP6 operates in what can be called a “predictive” manner. Instead of filtering blocks over an entire reconstructed frame, the VP6 codec only filters the edges of blocks that have been constructed by means of motion vectors that cross a block boundary. VP6 also uses different types of reference frames, motion estimation and entropy coding, compared with MPEG.
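The predictive filtering decision can be sketched as follows, assuming 8-pixel blocks and whole-pixel motion vectors. The function names and the exact boundary test are hypothetical simplifications: a vector that is not a multiple of the block size pulls the prediction across a block boundary in the reference frame, so only those blocks get their edges filtered.

```python
BLOCK = 8  # block size in pixels (illustrative)

def crosses_boundary(mv_x, mv_y):
    """A whole-pixel motion vector that is not a multiple of the block
    size fetches a predictor straddling a block boundary in the
    reference frame (simplified stand-in for VP6's actual rule)."""
    return mv_x % BLOCK != 0 or mv_y % BLOCK != 0

def blocks_to_filter(motion_vectors):
    """Return indices of blocks whose edges should be loop-filtered."""
    return [i for i, (mx, my) in enumerate(motion_vectors)
            if crosses_boundary(mx, my)]

# Blocks 0 and 2 have vectors aligned to the block grid; block 1 does not,
# so only its edges are filtered.
print(blocks_to_filter([(0, 0), (3, -5), (8, 16)]))  # → [1]
```

By skipping grid-aligned blocks entirely, this approach filters far fewer edges than a full-frame deblocking pass, at the cost of leaving some artifacts untouched.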
According to various sources, Sorenson Spark is a tweaked, incomplete implementation of the earlier H.263 codec, while the related SVQ3 codec ("Sorenson Video 3") appears to be a tweaked version of H.264/AVC. While VP6 and Spark are essentially incompatible with non-Flash decoders, the most recent releases of Flash Player do support H.264/AVC video and HE-AAC audio.
VP6 and Spark (as well as AVC) are covered by various patents, with differing licensing terms for encoding, distribution and decoding. HTML5 video is another approach defined for Internet use; HTML5 itself is a markup standard rather than a codec, but its video element must be paired with one, and its backers aim to simplify (or remove) licensing fees. (HTML5 has recently come to light regarding various video players, with the announcement that Apple would support it, and not Flash video, in its products.) Supporters of HTML5 want a codec that does not require per-unit or per-distributor licensing, that is compatible with the "open source" development model, that is of sufficient quality, and that does not present a patent risk for large companies.
Nonetheless, while HTML5 developers formerly recommended support for playback of video compressed in the Theora format, there is currently no specific video codec defined for it. In May, the WebM Project was launched to push for the use of VP8, a descendant of VP6, as the codec for HTML5. The project features contributions from more than 40 supporters, including Mozilla, Opera, Google, and various software and hardware vendors. Perhaps not coincidentally, in August, the licensor of H.264, MPEG LA, announced that it will not charge royalties for H.264-encoded Internet video that is free to viewers.
New versions of codecs
Current codecs are also being improved by means of new and emerging extensions, with applications for storage and content management. A number of extensions to H.264/AVC support high-fidelity professional applications; scalability and multiview video have also been defined. MPEG collectively refers to the "High" profiles as the "fidelity range extensions" (FRExt), which include the High 10 profile (10 bits per sample) and the High 4:2:2 and High 4:4:4 profiles.
AVC has generally been viewed as providing a doubling of coding efficiency over MPEG-2, but the quest for more efficiency goes on. The ISO/IEC and ITU-T standardization committees have now embarked on the specification of a new video encoding standard that targets improved encoding efficiency for HD video sources. Again, the goal is to cut the bit rate in half relative to existing codecs, e.g., AVC. This new specification is being referred to as the High-Efficiency Video Coding (HEVC) standard, and the target applications are broadcast, digital cinema, low-delay interactive communication, mobile entertainment, storage and streaming. Depending on the proposed technology, a final standard could be developed by July 2012.
Standards for multiview video coding based on MPEG-2 and H.264/AVC currently exist, but support is generally limited to a single stereo view that requires glasses to view the 3-D content. MPEG is now planning to standardize a new format for 3-D that supplements stereo video with depth/disparity information and could be used more effectively with glasses-free displays.
Aldo Cugnini is a consultant in the digital television industry.
Send questions and comments to: firstname.lastname@example.org