Why is H.264 becoming so pervasive?

H.264 compression, otherwise known as MPEG-4 Part 10 Advanced Video Coding (AVC), is rapidly becoming a preferred standard for video compression throughout the broadcast industry.

As a fourth-generation codec, H.264 is a popular choice for driving video to mobile devices, and all of the major streaming media vendors deliver at least one product with H.264 compression. Its versatility lends itself to a broad range of applications, but setting up and configuring an H.264-based product for optimum performance is more complex than with any of its predecessors. It also requires much more computing power in both the encode and the decode processes, which limits its use in low-cost, portable devices to the most recent generation.

This article discusses reasons why H.264 is becoming a common compression standard in all phases of broadcasting and how its extremely rapid adoption in the streaming media industry has changed the face of that industry.

What is H.264?

H.264/MPEG-4 Part 10 AVC is an advanced compression technology resulting from a partnership known as the Joint Video Team (JVT). Made up of representatives from the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG), the JVT first published technically identical ITU-T H.264 and ISO/IEC MPEG-4 Part 10 specifications in 2003. The collaboration virtually guaranteed the codec would be adaptable and have broad acceptance in both the telecommunications and the broadcast video industries.

The MPEG-4 specification defines 27 different and often independent standards, called Parts, most having some application in the broadcast industry. Some have no design or application relationship to the Part 10 standard. Only Part 10 of the MPEG-4 specification is equivalent functionally to H.264. To help avoid confusion, the industry refers to the standard as H.264 and avoids the more general MPEG-4 name.

H.264 defines many subsets, called Profiles, each specifying a set of capabilities that targets a specific class of applications. Some have been superseded by newer profiles as computing power has grown and other technical advances have arrived. The specification is updated frequently as new profiles are defined.

Broadcasters are most interested in the Constrained Baseline Profile, a March 2009 advancement of the original Baseline Profile, used for Internet streaming and streaming to mobile devices. The Scalable High and Scalable High-Intra profiles, defined in 2007 as an advancement of the original High Profile, are used in video production and some high-bandwidth streaming applications. Recently, the H.264 standard was amended to include 3-D (stereoscopic) video, starting with Version 11 (March 2009) and revised again in November 2009. H.264 likely will have a prominent place in the future of 3-D television.

H.264 differs from previous third-generation codecs in several important ways. These differences can be summed up in two areas:

The H.264 codec offers significantly better picture quality at a lower bit rate. This is mostly the result of advances in how each frame of digitized video is compared with other frames (in some cases both previous and subsequent frames) to detect similarities and differences, and in how much prediction can be applied, so that the encoder avoids resending data that is unchanged from previous data.
Versatile configuration choices offer fine-grained control over the parameters that must be considered for optimal performance in any particular application. This is important when optimizing compression for bandwidth- and resolution-limited devices, as we'll discuss later in this article.
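
To make the idea of not resending unchanged data concrete, here is a minimal sketch in Python. It uses plain block-by-block frame differencing, which is far simpler than the motion-compensated prediction H.264 actually performs, and the function name, block size and test frames are assumptions made up for illustration.

```python
def changed_blocks(prev, curr, block=4, threshold=0):
    """Return (row, col) origins of blocks whose pixels differ between frames."""
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Sum of absolute differences (SAD) over this block.
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            )
            if sad > threshold:
                changed.append((top, left))
    return changed


# Two hypothetical 8 x 8 grayscale frames, identical except for one pixel.
prev_frame = [[16] * 8 for _ in range(8)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[5][6] = 235

to_send = changed_blocks(prev_frame, curr_frame)
total_blocks = (8 // 4) * (8 // 4)
print(f"blocks to resend: {to_send} ({len(to_send)} of {total_blocks})")
```

Only the one block containing the changed pixel would need to be sent; the other three can be reused from the previous frame.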

H.264's advantages

The H.264 codec offers many advantages, including the following:

Lower bit rate for the same video quality
This is important to cable and satellite distribution operators because it can significantly reduce new buildout costs when adding channels, particularly HD channels. H.264 can create a picture comparable to MPEG-2 at half of the MPEG-2 bandwidth, making it attractive to anyone trying to save transmission costs and essential when bandwidth is limited by the available technology.
Better picture at the same bit rate
This is vital to improving viewer acceptance in bandwidth-limited applications, including streaming to mobile devices. In many cases, H.264 is the first opportunity to video-enable a low-bit-rate device. This may become less important when we are all on a global 4G-class cell phone network, but that may be a while.
Fewer technologies in play to reach a bigger audience
If you are adding alternative distribution forms, chances are the video compression will be based on some variant of H.264. That offers additional potential for lower headend and other infrastructure costs because the same equipment can more easily tailor video streams for different devices.
Better compatibility with more devices over a longer period of time
Widespread adoption means every video device can, or soon will, decode and display some form of H.264-encoded content, at least until something better comes along. By then, whatever follows H.264 will need to be sufficiently revolutionary to cause massive displacement, which is not likely to happen any time soon, especially considering the increasing number of new H.264 products.
Lower deployment cost
Fewer competing standards should lead to less development cost, lower product cost and faster time to market by reducing the need to support several standards. Innovations in both the encoder and player can be developed and deployed faster.
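
The bandwidth argument in the first advantage comes down to simple arithmetic. The sketch below applies the "half the MPEG-2 bandwidth" rule of thumb to a fixed-capacity link; the link capacity and the MPEG-2 HD bit rate are assumed figures chosen only for illustration, not measurements.

```python
def channels_that_fit(capacity_mbps, per_channel_mbps):
    """Whole number of channels that fit in a fixed-capacity link."""
    return int(capacity_mbps // per_channel_mbps)


CAPACITY_MBPS = 38.0               # assumed usable capacity of one link
MPEG2_HD_MBPS = 15.0               # assumed MPEG-2 HD bit rate
H264_HD_MBPS = MPEG2_HD_MBPS / 2   # "half the bandwidth" rule of thumb

mpeg2_channels = channels_that_fit(CAPACITY_MBPS, MPEG2_HD_MBPS)
h264_channels = channels_that_fit(CAPACITY_MBPS, H264_HD_MBPS)
print(f"MPEG-2: {mpeg2_channels} HD channels, H.264: {h264_channels} HD channels")
```

With these assumed numbers, the same link carries five HD channels instead of two, which is where the buildout savings come from.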

Optimizing H.264 for broadcast applications

With flexibility comes complexity, and H.264 is no exception. Setting up an H.264 encoder to fit your application can be as simple as selecting the default template your media encoder offers and pressing the “stream” button. Knowing what to tweak beyond that is the challenge, because H.264 is not a one-size-fits-all technology: one popular H.264 codec library has more than 200 configurable settings.

Fortunately, most H.264 encoding products offer a set of templates. But to get the best possible video playback experience, you will eventually need to tweak the settings under an “advanced” button in the user interface.

Here are a few of the critical configuration options that will become familiar as you embrace H.264:

Constant bit rate (CBR) vs. variable bit rate (VBR)
With CBR, a specified bit rate is held more or less constant, no matter the scene complexity or other factors that would otherwise spike bandwidth upward. This is pretty much required for streaming to handheld mobile devices because they lack the bandwidth headroom and the additional CPU to receive and decode anything more complex. CBR is also helpful in live Internet streaming applications that use adaptive streaming: because the player automatically switches back and forth between different streams, CBR helps keep the streams synchronized so the player can switch seamlessly at the same point in the video.
CBR is not optimal for quality, however, because it does not allow the level of compression to change dynamically with the degree of motion in the video. Conversely, VBR targets a specified average bit rate but presumes additional bandwidth is available to handle spikes. Essentially, more bits are allocated to fast-moving scenes and fewer to static scenes, and more bits mean more bandwidth. This can be a problem for live Internet streaming, but VBR is a good choice for downloaded video.
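
The difference between the two modes can be sketched in a few lines of Python. This is a toy model, not how a real encoder's rate control works: it assumes a made-up per-second "complexity" score and simply divides the same total bit budget either evenly (CBR) or in proportion to complexity (VBR). All names and numbers are hypothetical.

```python
def cbr_allocation(complexities, target_kbps):
    """CBR: every second gets the same bit rate, whatever the content."""
    return [target_kbps for _ in complexities]


def vbr_allocation(complexities, target_kbps):
    """VBR: same average bit rate, but bits follow scene complexity."""
    total_kbits = target_kbps * len(complexities)
    total_complexity = sum(complexities)
    return [total_kbits * c / total_complexity for c in complexities]


# Hypothetical complexity scores: a static shot with a two-second burst of motion.
complexity = [1, 1, 4, 6, 1, 1, 1, 1]
TARGET_KBPS = 800

cbr = cbr_allocation(complexity, TARGET_KBPS)
vbr = vbr_allocation(complexity, TARGET_KBPS)
print("CBR peak:", max(cbr), "kb/s, average:", sum(cbr) / len(cbr), "kb/s")
print("VBR peak:", max(vbr), "kb/s, average:", sum(vbr) / len(vbr), "kb/s")
```

Both streams average 800 kb/s, but the VBR stream peaks at three times that during the motion burst, which is the headroom a live streaming link may not have.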
Macroblock size
Like other codecs, H.264 breaks down a captured video frame into individual rectangles called macroblocks. The motion estimation and compensation techniques that make up the bulk of the magic of compression act on each macroblock, ultimately predicting frames based on differences between the target macroblock and neighboring macroblocks. Older codecs used fixed-size blocks (usually 16 × 16 pixels). H.264 keeps the 16 × 16 macroblock but allows it to be divided into smaller blocks, down to 4 × 4 pixels, and many encoders let you select the sizes used for your application. Smaller blocks mean more blocks per frame, which offers better overall picture quality at a significant cost in the computing horsepower needed to sustain real-time (live) encoding. Encoding speed doesn't matter where there is no expectation to compress in real time (as when creating files for later playback). For live encoding applications, use the smallest block size that does not result in dropped frames or other impairments caused by the inability of the compressor to keep up. Increase the block size if needed to sustain high-motion content, at a cost of smoothness and faithful rendering of subtle color differences, or accept some degree of blockiness in the resulting video during high-motion periods. For streaming to handheld devices, downscale the video ahead of compression, and specify small blocks. Most commercial media encoders automatically downscale for you when you select the output frame size.
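
To see why smaller blocks cost so much more computing power, it helps to count them. The following sketch assumes a 1280 × 720 frame at 30 frames per second and square blocks; the frame size, frame rate and function name are illustrative assumptions, and real encoder workload depends on much more than the block count.

```python
import math


def blocks_per_frame(width, height, block_size):
    """Number of square blocks needed to cover a frame (edges rounded up)."""
    return math.ceil(width / block_size) * math.ceil(height / block_size)


WIDTH, HEIGHT, FPS = 1280, 720, 30  # assumed frame size and frame rate

for size in (16, 8, 4):
    per_frame = blocks_per_frame(WIDTH, HEIGHT, size)
    print(f"{size:>2} x {size:<2}: {per_frame:>6} blocks/frame, "
          f"{per_frame * FPS:>8} blocks/second")
```

Each halving of the block size quadruples the number of blocks a live encoder must process in the same amount of time.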
GOP structure
Group of Pictures (GOP) structure usually refers to how often the encoder is required to insert a full frame into the stream rather than continuing a series of predicted frames. Your choice can significantly impact encoder processing overhead. Most encoders have an automatic setting that inserts a new full frame when a scene change is detected. However, some content, such as news desk content, has relatively few scene changes, and the automatic setting may stretch the interval between full frames to several seconds. That may be OK, but remember that the player will not start rendering a picture until it receives its first full frame of video, so it may be several seconds before a user sees your program. Forcing a full frame at least every one-and-a-half to two seconds can be important in live streaming applications where the viewer may connect at any moment.
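
The relationship between the full-frame interval and the time a newly connected viewer waits for a picture can be worked out directly. This sketch assumes the viewer is equally likely to connect at any moment and ignores network and player buffering; the function names and frame rates are illustrative assumptions rather than settings from any particular encoder.

```python
import math


def keyframe_interval_frames(fps, max_seconds):
    """Largest whole number of frames between full frames within max_seconds."""
    return math.floor(fps * max_seconds)


def join_delay_seconds(fps, interval_frames):
    """Worst-case and average wait for the first full frame after connecting."""
    worst = interval_frames / fps
    average = worst / 2  # assumes the connect time is uniformly distributed
    return worst, average


# Forcing a full frame at least every two seconds at 30 frames per second.
forced = keyframe_interval_frames(30, 2)
print("forced interval:", forced, "frames ->", join_delay_seconds(30, forced))

# An automatic mode that happens to go 8 seconds between scene changes.
print("auto interval: 240 frames ->", join_delay_seconds(30, 240))
```

With the forced two-second interval, a viewer waits at most two seconds and one second on average; in the eight-second case, the worst-case wait grows to eight seconds.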

Mark Hershey is vice president of engineering at ViewCast.