Figure 1. HEVC divides video frames into a hierarchical quadtree coding structure that uses coding units, prediction units and transform units. CUs, TUs and PUs are grouped in a treelike structure, with the individual branches having different depths for different portions of a picture, all of which form a generic quadtree segmentation structure of large coding units.
NEW YORK—Internet video has become a practical medium for the delivery of video content to consumers. What has made this possible is the development of video compression, which lowers the enormous amount of bandwidth required to transport video to levels practical with most Internet connections. In this article, we’ll examine some of the technical and business issues associated with two video codec frontrunners: HEVC and VP9.
HEVC, for “High Efficiency Video Coding,” is also called MPEG-H Part 2 and ITU-T H.265. It represents a state-of-the-art video compression standard that provides about a 50 percent bitrate savings over H.264/MPEG-4 AVC, which in turn provided a similar efficiency over its MPEG-2 predecessor.
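To see what two successive 50 percent savings add up to, here is a small Python sketch. The 16 Mbps MPEG-2 starting bitrate is a hypothetical figure chosen for illustration, and the "about 50 percent" claims are treated as exact, which real content will not honor.

```python
def bitrate_after_savings(bitrate_mbps, savings):
    """Return the bitrate left after a fractional saving (0.5 means 50 percent)."""
    return bitrate_mbps * (1.0 - savings)

# Hypothetical starting point: an MPEG-2 stream at 16 Mbps (not a figure from the article).
mpeg2_mbps = 16.0
avc_mbps = bitrate_after_savings(mpeg2_mbps, 0.5)   # AVC: about half of MPEG-2
hevc_mbps = bitrate_after_savings(avc_mbps, 0.5)    # HEVC: about half of AVC

print(f"MPEG-2: {mpeg2_mbps} Mbps, AVC: {avc_mbps} Mbps, HEVC: {hevc_mbps} Mbps")
# prints "MPEG-2: 16.0 Mbps, AVC: 8.0 Mbps, HEVC: 4.0 Mbps"
```

Under those assumptions, HEVC would carry the same picture quality at roughly one quarter of the MPEG-2 bitrate.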
AVC solutions have already become widespread in many professional and consumer devices. HEVC, having been ratified by ISO and ITU in 2013, is similarly growing in the same applications, and would appear to be on the road to replacing the earlier codecs. But while MPEG and HEVC have been developed by standards committees representing a legion of strong industrial players, other forces have sought to displace their primacy, most notably, Google, with its VP9. Let’s look at the toolkit of each codec.
HEVC incorporates numerous improvements over AVC, including a new prediction block structure and updates to intra-prediction, inverse transforms, motion compensation, loop filtering and entropy coding. HEVC uses a new concept called coding units (CUs), which sub-partition a picture into square regions of variable size. The CU replaces the macroblock structure of previous video coding standards, which had been used to break pictures down into areas that could be coded using transform functions. Each CU contains one or more transform units (TUs, the basic unit for transform and quantization) and one or more prediction units (PUs, the elementary unit for intra- and inter-prediction).
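The quadtree idea behind coding units can be sketched in a few lines of Python. This is only an illustration: the 64x64 starting block, the 8x8 minimum and the `detail_in_top_left` rule are assumptions made for the example. A real encoder decides where to split by weighing bitrate against distortion, which is not modeled here.

```python
def split_quadtree(x, y, size, min_size, needs_split):
    """Recursively split a square block into four quadrants.

    Returns the leaf blocks as a list of (x, y, size) tuples.
    """
    if size > min_size and needs_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_quadtree(x + dx, y + dy, half, min_size, needs_split)
                )
        return leaves
    return [(x, y, size)]


def detail_in_top_left(x, y, size):
    # Hypothetical rule: pretend only the top-left 16x16 corner holds fine detail.
    return x < 16 and y < 16


leaves = split_quadtree(0, 0, 64, 8, detail_in_top_left)
for leaf in leaves:
    print(leaf)
```

With this rule, the busy corner ends up covered by four 8x8 blocks while the rest of the 64x64 area is left as three 16x16 and three 32x32 blocks, so the branches of the tree have different depths in different parts of the picture.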
While AVC improved on MPEG-2 by allowing multiple block sizes for transform coding and motion compensation, HEVC goes further: its coding tree blocks can be 64x64, 32x32 or 16x16 pixels, and the coding units within them can be hierarchically subdivided down to 8x8, with transform units as small as 4x4. Organizing the picture into rows of tree blocks also lets parallel processors decode several rows at once, with each row starting shortly after the one above it. This technique, called wavefront parallel processing, supports multi-threaded decoding.
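The following Python sketch shows why a wavefront helps. It assumes one thread per row of tree blocks, one time step per block, and that each block must wait for its left neighbor and for the block above and to the right. That two-block lag is an assumption of this sketch rather than something stated in this article, and `wavefront_schedule` is a hypothetical helper name.

```python
def wavefront_schedule(rows, cols):
    """Earliest time step at which each block can start, one thread per row.

    Assumes each block takes one time step and depends on its left neighbor
    and on the block above and to the right (clamped at the row end).
    """
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(start[r][c - 1])
            if r > 0:
                deps.append(start[r - 1][min(c + 1, cols - 1)])
            start[r][c] = max(deps) + 1 if deps else 0
    return start


schedule = wavefront_schedule(3, 4)
for row in schedule:
    print(row)
# prints [0, 1, 2, 3], then [2, 3, 4, 5], then [4, 5, 6, 7]

parallel_steps = schedule[-1][-1] + 1   # 8 time steps with three threads
serial_steps = 3 * 4                    # 12 time steps with one thread
```

Each row trails the one above by two blocks, so the work sweeps across the picture as a diagonal front instead of one block at a time.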
Because this new coding structure avoids the repetitive blocks of AVC, HEVC is better at reducing blocking artifacts, while at the same time providing a more efficient coding of picture details. For intra-prediction, HEVC specifies 33 directional modes along with planar and DC modes, which reconstruct directional structures or smooth regions in a way that hides artifacts better. An internal bit-depth increase allows encoding of video pictures by processing them with a color depth higher than 8 bits.
Motion compensation is provided with two new methods, and luma and chroma motion vectors are calculated to quarter- and eighth-pixel accuracy, respectively. A new deblocking filter is also provided, which operates only on edges that lie on an 8x8 block grid. After the deblocking filter, HEVC provides a new optional filter, sample adaptive offset (SAO), designed to minimize coding artifacts.
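As a rough illustration of sub-pixel accuracy, the Python sketch below splits one motion vector component into a whole-pixel offset and a fractional phase. It assumes 4:2:0 video, where the chroma planes have half the luma resolution, so the same integer read in eighths of a chroma pixel describes the same displacement. The helper name `split_motion_vector` and the sample value are made up for the example.

```python
def split_motion_vector(mv_units, units_per_pixel):
    """Split a motion vector component into whole pixels and a fractional phase."""
    whole = mv_units // units_per_pixel   # floor division also handles negatives
    phase = mv_units % units_per_pixel    # always in 0 .. units_per_pixel - 1
    return whole, phase


mv = 13  # hypothetical value: 13 quarter-pixels = 3.25 luma pixels

luma = split_motion_vector(mv, 4)     # (3, 1): 3 pixels plus 1/4
chroma = split_motion_vector(mv, 8)   # (1, 5): 1 pixel plus 5/8 = 1.625

print(luma, chroma)
# prints "(3, 1) (1, 5)"
```

The fractional phase is what selects the interpolation filter. The whole-pixel part simply picks the starting sample in the reference frame.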
VP9 IMPROVES ON VP8
With YouTube carrying so much video content, it stands to reason that the service’s parent, Google, has a vested interest in not just the technology behind video compression, but also in the market considerations attached to it. To that end, VP9 has been developed to provide a royalty-free alternative to HEVC.
Many of the tools used in VP9 (and its predecessor, VP8) are similar to those used in HEVC—but ostensibly avoid the intellectual property used in the latter. VP9 supports the image format used for many Web videos: 4:2:0 color sampling, 8-bit color depth, progressive scan, and image dimensions up to 16,383x16,383 pixels. It can go well past these specs, however, supporting 4:4:4 chroma and up to 12 bits per sample.
Figure 2. VP9 supports superblocks that can be recursively partitioned into rectangular blocks.
The Chromium, Chrome, Firefox and Opera browsers now all support playing VP9 video in the HTML5 video tag. Both VP8 and VP9 video are usually encapsulated in a format called WebM, a Matroska-based container also supported by Google, which can carry Vorbis or Opus audio.
VP8 uses a 4x4 block-based discrete cosine transform (DCT) for all luma and chroma residual pixels. The DC coefficients from 16x16 macroblocks can then undergo a 4x4 Walsh-Hadamard transform. Three reference frames are used for inter-prediction, limiting the buffer size requirement to three frame buffers, while storing a “golden reference frame” from an arbitrary point in the past.
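To give a feel for that second-stage transform, here is a textbook 4x4 Walsh-Hadamard transform in Python. It is a sketch of the general technique, not VP8's exact integer implementation, which uses its own scaling and rounding. The sample block of identical DC values is made up for illustration.

```python
# Rows of a 4x4 Hadamard matrix; every pair of rows is orthogonal.
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def wht4x4(block):
    """Forward 2-D Walsh-Hadamard transform: H * block * H^T (unnormalized)."""
    return matmul(matmul(H, block), transpose(H))


def inverse_wht4x4(coeffs):
    """Inverse: H^T * coeffs * H, divided by 16 because H * H^T = 4 * I."""
    raw = matmul(matmul(transpose(H), coeffs), H)
    return [[v // 16 for v in row] for row in raw]


# Hypothetical input: the 16 DC values of a flat macroblock are all the same.
dc_values = [[10] * 4 for _ in range(4)]
coeffs = wht4x4(dc_values)
print(coeffs[0])   # prints [160, 0, 0, 0]; the other three rows are all zeros
```

When neighboring 4x4 blocks have similar DC values, the transform packs almost everything into a single coefficient, which is why applying it to the DC terms saves bits.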
VP9 augments these tools by adding 32x32 and 64x64 superblocks, which can be recursively partitioned into rectangular blocks, with enhanced intra- and inter-prediction modes, allowing for more efficient coding of arbitrary block patterns within a superblock. VP9 also introduces larger 8x8 and 16x16 DCTs, as well as the asymmetric discrete sine transform (ADST), both of which provide more coding options.
Like HEVC, VP9 supports sub-pixel interpolation and adaptive in-loop deblocking filtering, where the type of filtering can be adjusted depending on other coding parameters, as well as data partitioning to allow parallel processing.
COMPARING HEVC WITH VP9
As you would expect, performance depends on whom you ask. Google says VP9 delivers a 50 percent gain in compression over VP8 and H.264 while maintaining the same video quality. HEVC supporters make the same claim for their codec relative to H.264, which would put VP9 close to HEVC in quality. But some academic studies show that HEVC can provide a bitrate savings of over 43 percent compared to VP9. Why the disparity? One likely reason is that using different tools within each codec can yield widely varying results, depending on the video material. The other is that, despite some labs having developed objective tools to rate image quality, the best metric is still the human visual system, which means that double-blind subjective testing must be done, and that will always have statistical anomalies.
But another important factor must be considered as well, and that’s complexity. While both HEVC and VP9 demand more computational power at the decoder, the required encoding horsepower has been shown to be higher (sometimes more than 10 times) for HEVC in the experiments where it outperformed VP9 on bitrate.
There’s a strong motivation for advancing an alternative to HEVC: VP9 is a free codec, unencumbered by license fees. Licenses for HEVC and AVC are administered by MPEG LA, a private firm that oversees “essential patents” owned by numerous companies participating in a patent pool.
Earlier this year, MPEG LA announced that a group of 25 companies had agreed on HEVC license terms. An AVC Patent Portfolio License already provides coverage for devices that decode and encode AVC video, AVC video sold to end users for a fee on a title or subscription basis, and free television video services. MPEG LA had previously announced that its AVC Patent Portfolio License will not charge royalties for Internet video that is free to end users (known as “Internet Broadcast AVC Video”) during the entire life of the license; presumably, this means the life of the patents.
Last year, Google and MPEG LA announced that they had entered into agreements granting Google a license to techniques that may be essential to VP8 and earlier-generation VPx video compression technologies under patents owned by 11 parties. The agreements also grant Google the right to sublicense those techniques to any user of VP8, whether the VP8 implementation is by Google or another entity. They further provide for sublicensing those VP8 techniques in one next-generation VPx video codec.
So, while there is no license fee required to use VP8, there are other terms imposed—a so-called FRAND-zero license—and users may need a license to fully benefit from the Google-MPEG-LA agreement. One result of the agreements is that MPEG LA decided to discontinue its effort to form a VP8 patent pool.
Apparently, VP9 is a further attempt to provide a shield against the MPEG patent owners, by using elements thought to evade existing granted patents. But HEVC has already made inroads into commercial hardware and software, following on the heels of the already widespread MPEG-4/AVC rollout, and this could make an uptake of VP9 difficult. And even the best intentions of the VP8/VP9 developers can be subverted: it’s always possible that a “submarine patent” could emerge, with its owner claiming infringement.
This has already happened, with smartphone maker Nokia suing HTC over its use of the VP8 codec. In this particular case, a court in Mannheim, Germany, ruled that VP8 does not infringe Nokia’s patent, but the possibility of another challenge always exists. While the specter of another contest could be enough to give some manufacturers pause, tilting support toward the “safer” HEVC, that codec could just as well be subject to some other submarine patent.
A final note: Google has announced that development on VP10 has begun, and that after the release of VP10, they plan to have an 18-month gap between releases of video standards.
Aldo Cugnini is president of AGC Systems, a video and audio technology consulting firm, and is a well-known expert in the digital media delivery industry.