Raising the standard

The MPEG-4 family of audio and video coding standards and related technology has been a huge enabler in driving forward the growth of audio and video delivery across multiple platforms.

While the MPEG-4 standard actually contains 23 different parts, the part most widely referred to now in terms of content delivery is MPEG-4 Part 10 (ISO/IEC 14496-10) — also known as MPEG-4 AVC and H.264. The H.264 reference gives a clue to the standard's lineage, coming from the ITU Telecommunication Standardization Sector (ITU-T). The broad base of support from the Moving Picture Experts Group and the ITU during the standard's development by the Joint Video Team has led to widespread acceptance and rapid adoption of the standard among vendors and operators following its initial introduction in 2003. In this article, MPEG-4 refers to MPEG-4 Part 10 because, despite potential confusion with the earlier MPEG-4 ASP standard, the newer standard is becoming the de facto meaning of the term.


One of the major benefits of MPEG-4 is that it is an open standard that has been adopted by organizations across the globe, including — notably for TV applications — the DVB and SMPTE. The high level of interoperability that this provides — along with the extent of its use — has led to rapid development that may not have occurred if a range of proprietary encoding solutions had been developed in its place. Many products and devices now support MPEG-4.

Just like its forebear MPEG-2, MPEG-4 includes multiple toolkits and a range of complexity to allow developers to improve video and audio quality while reducing the necessary transmission bandwidth. There is huge potential to continue to improve MPEG-4 in terms of the implementation of encoders in particular, beyond the level of sophistication available today. The evolution and improvement of the standard is mimicking the development of MPEG-2 10 years ago but at a far greater pace, driven by the wider understanding and larger number of organizations that have adopted the new standard.

The fact that MPEG-4 performance upgrades can (on some makes of encoders) consist purely of software is a great asset for developers and solution providers who can offer ongoing performance improvements through field upgrades. The capital expenditures (CAPEX) benefits of investing in hardware that can support such upgrades are clear. (See Figure 1.)


One of the main contributors to MPEG-4's success is the need to make the most efficient use of the available bandwidth for contribution, distribution or final transmission. For each delivery platform, however, the bandwidth scenario is slightly different.

In the case of satellite operators, many providers are trying to differentiate themselves from their competitors by providing high-quality HD services. Despite having a comparatively large amount of available bandwidth, which can be increased by adding transponders, the satellite business model demands that an operator deliver more HD channels than any other broadcaster, providing competitive advantage and increased average revenue per user (ARPU).

Using an MPEG-2 solution running at between 15Mb/s and 20Mb/s allows a maximum of only two HD channels on a typical transponder, which does not make commercial sense. By using MPEG-4, however, the satellite operator can provide about six channels per transponder, with the likelihood of increasing to eight or 10 channels per transponder as MPEG-4 technology evolves in the near future, particularly when combined with the DVB-S2 transmission standard.
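The channel-count arithmetic above can be sketched as a simple capacity calculation. The transponder payload and per-channel rates below are illustrative assumptions, not vendor specifications:

```python
# Hypothetical illustration of the channels-per-transponder arithmetic.
# All figures are example assumptions, not vendor specifications.

def services_per_pipe(capacity_mbps: float, per_service_mbps: float) -> int:
    """Return how many whole services fit in a fixed-capacity pipe."""
    return int(capacity_mbps // per_service_mbps)

# Assume a DVB-S transponder carrying roughly 38Mb/s of payload.
TRANSPONDER_MBPS = 38.0

mpeg2_hd = services_per_pipe(TRANSPONDER_MBPS, 18.0)  # MPEG-2 HD at ~18Mb/s
mpeg4_hd = services_per_pipe(TRANSPONDER_MBPS, 6.0)   # MPEG-4 HD at ~6Mb/s

print(mpeg2_hd)  # 2
print(mpeg4_hd)  # 6
```

The same arithmetic explains the DVB-T case: dividing a 20Mb/s to 24Mb/s multiplex by a per-service rate shows why MPEG-2 HD consumes a whole channel while MPEG-4 can carry three or more services.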

IPTV operators also have a strong interest in making the best use of their available bandwidth, but for different reasons. Most IPTV operators use DSL or ADSL 2+ technology, which introduces bandwidth limitations when the exchange (DSLAM) is far away from the user's home. In order to provide acceptable reach for delivering SD and especially HD services, the operator needs to keep the bandwidth requirements as low as possible while maintaining an acceptable quality of service. By using MPEG-4, the operator can reduce the bit rate substantially while maintaining the same level of quality and significantly increasing the number of users that can receive the service without the need for additional network CAPEX. This can be the difference between a working business model and a failing one.

For DVB-T, most transmissions are in the range of 20Mb/s to 24Mb/s. Therefore, achieving HD using MPEG-2 at around 15Mb/s to 20Mb/s would involve allocating one HD service per channel, which is a highly inefficient use of the available spectrum. To offer multiple HD services, a reduction in bit rate is required, and implementation of MPEG-4 allows at least three HD services per channel, with the possibility of four in the near future.

There is also a driver for cable operators who, while theoretically having more bandwidth than IPTV and DVB-T operators, are increasingly using that bandwidth for complementary services such as data and VOD to provide competitive advantage and increased ARPU. Consequently, the available bandwidth for high-value services such as HD has been reduced, necessitating an improvement in the encoding technology being used.

MPEG-4 technology is not only applicable to final transmission, and many content owners (e.g. programmers) have adopted the standard for their contribution and primary distribution due to the benefits of lower bandwidth requirements, or higher quality for a similar bandwidth to MPEG-2. This attribute provides some much-needed flexibility for network operators who can now make a choice between minimizing bandwidth and maximizing quality to best service their particular business model.

While making the best use of the available bandwidth is still a key issue for many operators, some have introduced a self-imposed quality threshold of about 10Mb/s to maximize customer experience, especially for HD services. At the other end of the scale, operators with severely limited bandwidth such as IPTV over xDSL are delivering streams at below 5Mb/s.

In addition to the reduction in bit rate compared with MPEG-2, MPEG-4 provides significant improvements in terms of video quality. The ringing and blocking artifacts sometimes visible with MPEG-2 encoding are considerably reduced or smoothed in a way that is far more pleasing to the human eye. With skilled use of the available toolsets, it is possible to remove the majority of the artifacts completely. The bit rate does, however, have a huge bearing on the video quality. Some of the more sophisticated techniques that can improve MPEG-4 video quality include: dynamic GOP, hierarchical GOP, support for high profile and single-slice architecture.

Dynamic GOP

This technique dynamically decides the GOP size and structure, or how the types of frames used for encoding are divided. Typically, a GOP starts with an I-frame and is then followed by B-frames and P-frames before the GOP starts over with the next I-frame. The I-frame does not reference any other frame and is not based on prediction, while the B-frames and P-frames reference other frames and use motion estimation across frames. When dynamic GOP is used, the encoder selects between three hierarchical GOP structures — for example, one with three B-frames, one with seven B-frames and a third with 15 B-frames.

The selection is content-dependent, meaning the encoder considers motion among other characteristics, and the GOP length must be a multiple of 16. When dynamic GOP is not used, either a hierarchical GOP structure (with three B-frames) or an MPEG-2-like GOP structure with two B-frames is selected. In the case of the hierarchical GOP structure, the GOP length must be a multiple of four. In the case of the MPEG-2-like GOP structure, the GOP length must be a multiple of three.
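The GOP-length constraints above can be captured in a short sketch. The mode names and the helper are illustrative assumptions paraphrased from the text; real encoders expose these settings differently:

```python
# Sketch of the GOP-length rules described in the text (assumed names;
# not a real encoder API). Each structure constrains the GOP length to
# a multiple of a fixed step.

def valid_gop_lengths(mode: str, max_len: int = 64) -> list:
    """List the GOP lengths permitted for a given structure mode."""
    multiple = {
        "dynamic": 16,       # dynamic GOP: length a multiple of 16
        "hierarchical": 4,   # fixed hierarchical GOP (three B-frames)
        "mpeg2_like": 3,     # MPEG-2-like GOP (two B-frames)
    }[mode]
    return [n for n in range(multiple, max_len + 1) if n % multiple == 0]

print(valid_gop_lengths("dynamic"))          # [16, 32, 48, 64]
print(valid_gop_lengths("mpeg2_like", 12))   # [3, 6, 9, 12]
```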

Hierarchical GOP

In MPEG-2, a typical GOP consists of one I-frame and several P- and B-frames, where each B-frame uses the nearest past I- or P-frame and the future P-frame as references for prediction. B-frames are not used as references. (See Figure 2.)

H.264 removes this major restriction, giving the encoder the flexibility to choose whether to use B-frames as references for prediction. It is now possible to use only one I-frame and multiple B-frames in a GOP, where some B-frames act as reference frames — a structure known as a hierarchical GOP.

Figure 3 and Figure 4 illustrate a hierarchical GOP with seven B-frames. Figure 3 illustrates the GOP in the typical display order and decoding order as we normally illustrate an MPEG-2 GOP. Figure 4 more clearly shows the hierarchy of the GOP, but it's actually the same GOP as in Figure 3.

The B-frame of level one (B1) uses only the surrounding I/P reference frames for prediction. A B-frame Bi of level i > 1 can use the surrounding I/P references as well as the B-frames Bj of level j < i located in the same group of frames for prediction.
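The level structure described above can be derived by recursive bisection: the middle B-frame between the two anchors sits at level one, and each half-interval is split again one level down. This is a minimal sketch under that dyadic assumption, not an encoder implementation:

```python
# Sketch of assigning hierarchy levels to the seven B-frames of a
# dyadic hierarchical GOP. Level-1 B-frames reference only the I/P
# anchors; level-i frames may also reference lower-level B-frames.

def b_frame_levels(num_b: int = 7) -> dict:
    """Map B-frame display positions (1..num_b) to hierarchy levels
    by recursive bisection of the interval between the I/P anchors."""
    levels = {}

    def split(lo: int, hi: int, level: int) -> None:
        if lo > hi:
            return
        mid = (lo + hi) // 2
        levels[mid] = level            # coded first within this interval
        split(lo, mid - 1, level + 1)  # left half, one level deeper
        split(mid + 1, hi, level + 1)  # right half, one level deeper

    split(1, num_b, 1)
    return levels

levels = b_frame_levels()
print(sorted(levels.items()))
# [(1, 3), (2, 2), (3, 3), (4, 1), (5, 3), (6, 2), (7, 3)]
```

For seven B-frames this places B4 at level one, B2 and B6 at level two, and the remaining B-frames at level three, matching the hierarchy illustrated in Figure 4.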

Support for high profile

High profile is an extension of the original MPEG-4 Part 10 standard that allows solution providers and developers to employ more sophisticated tools to improve encoding. One example is the use of dynamically different sizes of blocks within the frame. This is beneficial for use with HD as the use of bigger block sizes allows the preservation of more detail on an HD picture. The key to a good encoder is to dynamically select the right block sizes and the right encoding tools for the specific content being encoded.

Single-slice architecture

Single-slice encoding architecture yields significantly better video quality than that achieved by multiple-slice architectures for several reasons.

First, rate control is inherently more effective because it has easy access to the statistics of the whole picture.

Second, not having to reset statistics at the beginning of each slice within a picture maintains high context-based adaptive binary arithmetic coding (CABAC) efficiency. In addition, loop filtering, which cannot be applied across multiple slice boundaries, is applied to the full picture, significantly reducing blocking artifacts.

Third, the encoder does not have to perform motion compensation across slice boundaries. In a multiple-slice architecture, such cross-slice motion compensation is needed to avoid the blocking artifacts that would be generated when two parts of an object (one in each slice) are motion-compensated differently, and it carries additional complexity and memory transfer overhead.

While MPEG-2 and MPEG-4 are similar in some regards and share some common management and handling procedures, techniques for the two standards are not totally interchangeable. One area where there is similarity is the ability to use statistical multiplexing to dynamically share bit rate between channels. Conversely, splicing solutions designed for MPEG-2 are not directly useable for MPEG-4. The two splicing techniques that are apparent today for MPEG-4 are either decode/recode (this can be packaged as a single product) or preconditioning the incoming stream.

In an environment where there is the requirement to undertake a splice, for example for ad insertion into an existing MPEG-4 stream, the decode/recode option has drawbacks, including cost and potential quality reduction. Therefore, the best solution is to precondition the incoming feed at the point at which the splice is required.

The splicing device then sets up various parameters, such as buffers and frame types suitable to switch in another channel. This ensures that the splice point occurs at the end of a GOP to coincide with an I-frame and negates the possibility of an unclean splice through trying to link to a motion-estimated frame or broken reference. The local server can then play out the ad, ensuring that parameters such as buffers and frame types are set up in such a way as to minimize visual disturbance for the viewer.
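The preconditioning step above amounts to constraining splice points to GOP boundaries. The sketch below illustrates that idea on a frame-type sequence; the representation and helper name are illustrative assumptions, not a real splicer API:

```python
# Hedged sketch of locating clean splice points in a frame-type
# sequence: a splice is allowed only at an I-frame that begins a GOP,
# so no B/P-frame references are broken across the cut.

def splice_points(frame_types: list) -> list:
    """Return indices of frames where a clean splice can occur."""
    return [i for i, t in enumerate(frame_types) if t == "I"]

# Example stream: two GOPs, each starting with an I-frame.
stream = ["I", "B", "B", "P", "B", "B", "P",
          "I", "B", "B", "P", "B", "B", "P"]
print(splice_points(stream))  # [0, 7]
```

Splicing anywhere else would cut into a chain of motion-estimated frames, which is exactly the broken-reference case the text warns against.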

It is important to recognize that despite its superior performance, MPEG-4 will have to coexist with legacy MPEG-2 deployments. In reality, this means using MPEG-2 and MPEG-4 in both constant and variable bit rate streams while still managing the bandwidth to maximize the number of services available. When working with constant bit rate services or variable bit rate services from a single multiplex, it's easy to calculate the maximum amount of services in a stream and manage the network to maximize use of the available bandwidth. (See Figure 5.)

However, mixing variable bit rate streams from different multiplexers can cause serious problems, as the maximum aggregate bit rate cannot be guaranteed. This leads to overflow and packets being dropped, causing blocking and other artifacts. (See Figure 6.) This situation could occur when an operator wants to take a subset of all the services available on the market to supply specific content to its client base. There is a possibility of having to mix variable bit rate streams from different muxes with these inherent problems, but for HDTV, this type of visual degradation is wholly unacceptable.

A proven solution for MPEG-2 involves using high-quality transraters. However, as we've already established, this is not commercially practical for HD MPEG-4 at the present time. Nevertheless, the problem can be resolved by providing a flexible solution using technology available today. An untransrated variable bit rate HD MPEG-4 service can be combined with fully transrated MPEG-2 SD services, producing high-quality video on both MPEG-2 SD and MPEG-4 HD while meeting the bandwidth criteria. (See Figure 7.)
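The mixed approach above can be checked with a worst-case budget calculation: the untransrated VBR HD service is safe only if its peak rate plus the fixed allocations of the transrated SD services fits the multiplex. All figures and names below are assumptions for illustration:

```python
# Illustrative budget check for combining an untransrated VBR MPEG-4
# HD service with fully transrated MPEG-2 SD services, as described in
# the text. Capacity and rates are example assumptions.

def mux_fits(capacity_mbps: float,
             hd_peak_mbps: float,
             sd_rates_mbps: list) -> bool:
    """True if the worst-case aggregate rate stays within capacity."""
    return hd_peak_mbps + sum(sd_rates_mbps) <= capacity_mbps

# Example: a 38Mb/s mux carrying one HD service peaking at 12Mb/s
# plus six transrated SD services pinned at 4Mb/s each.
print(mux_fits(38.0, 12.0, [4.0] * 6))  # True
```

Because the transrater pins the SD services to known rates, only the HD peak remains variable, which makes the worst case computable and avoids the overflow problem of mixing unconstrained VBR streams.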


The importance of the MPEG-4 family of audio and video coding standards cannot be overstated. Not only is it extremely successful in its own right as a cutting-edge technology, but it has also proved to be a powerful enabler for the new generation of content delivery platforms. The success of IPTV and new HD services would not have been possible without the high-quality, low bit rate attributes of MPEG-4 encoding and the enhanced quality of experience it can help provide for the viewer. MPEG-4 technology is still being developed and will continue to evolve for many years to come. Arguably, the best is yet to come.

Tim Sheppard is senior business development manager broadcast at Scientific Atlanta, a Cisco company.