HEVC Audio: Based on the Past, Headed for the Future

ALEXANDRIA, VA.—When we discuss video compression such as MPEG-2 and H.264, most of us tend to think of the video aspects and don’t think much about the audio. Sure, audio is important, but it’s just… there.

Now that there is a strong push to move beyond the MPEG-4/H.264 compression used for Blu-ray discs and many camcorders, we should spend a little time looking at the audio features of the next generation of video encoding. The most likely next-generation codec for widespread use is HEVC (high efficiency video coding), which is also known by its ITU designation of H.265. Keep in mind that this is a video codec, not an audio codec. The audio encoding that will accompany HEVC is being worked on by different teams than those working on HEVC/H.265.

Google is developing a competing compression standard called VP9, which will be built into many Web browsers. VP9 is available with no royalty payments, and Google’s vision is that it will outperform HEVC/H.265 in both encoding efficiency and image quality. Still, H.265 looks to be where professional and broadcast video are going in the next couple of years, despite the fact that royalty payments are associated with the standard.

There is yet another next-generation video codec on the near horizon, called Daala, being developed by the Xiph.Org Foundation and Mozilla Corp. The founder of Xiph.Org has stated that the performance of Daala should be a generation beyond HEVC and VP9, but an initial release is not expected in 2015. Interestingly, the Xiph.Org Foundation is the creator of FLAC (free lossless audio codec), which is well-regarded for its audio performance.

TWICE AS EFFICIENT

From a video standpoint, H.265 is roughly twice as efficient as H.264, which was itself about twice as efficient as MPEG-2. In other words, a video stream encoded at 20 Mbps with MPEG-2 would require about 10 Mbps with H.264 and about 5 Mbps with H.265. That’s a little oversimplified, but it’s a useful rule of thumb.
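The halving rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a model of real encoder behavior; the function name and the list of generations are assumptions made for the example.

```python
# Rule of thumb from the text: each codec generation needs roughly
# half the bit-rate of the one before it. Real results vary by content.
GENERATIONS = ["MPEG-2", "H.264", "H.265"]  # oldest to newest


def estimated_bitrates(mpeg2_mbps):
    """Return {codec: Mbps}, halving the rate for each newer generation."""
    return {codec: mpeg2_mbps / (2 ** i) for i, codec in enumerate(GENERATIONS)}


rates = estimated_bitrates(20.0)
for codec, mbps in rates.items():
    print(f"{codec}: about {mbps:g} Mbps")
```

Starting from a 20 Mbps MPEG-2 stream, the sketch reproduces the 10 Mbps and 5 Mbps figures quoted above.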

The MPEG standards introduced most of us to the term MP3 for audio coding. Unveiled with MPEG-1 compression in the early 1990s, MP3 stands for MPEG Audio Layer III. It has become a popular audio compression standard, although many others are also in use. Like its parent video compression standard, MP3 is a “lossy” form of compression, meaning that it discards some of the audio information in order to achieve its compression, and those changes can’t be undone once the audio is compressed.

MP3 has a wide range of settings that affect the final audio quality, including sampling rate and bit-rate. Mainstream MP3 can be sampled at 32, 44.1 and 48 kHz, and can be encoded at bit-rates ranging from 32 to 320 kbps. At 128 kbps and 44.1 kHz sampling, an MP3 file takes up about 9.1 percent of the space of an uncompressed CD recording. Encoding an MP3 file at a bit-rate of 320 kbps creates a bitstream that’s about 23 percent the size of an uncompressed CD recording.
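The percentages above follow from comparing the MP3 bit-rate with the bit-rate of uncompressed CD audio. The sketch below assumes the standard CD parameters (44.1 kHz sampling, 16 bits per sample, two channels), which the text does not spell out; the function names are made up for the illustration.

```python
# Assumed CD audio parameters (standard values, not stated in the text).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bitrate_kbps():
    """Bit-rate of uncompressed CD audio in kbps (1 kbps = 1,000 bits/s)."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000


def percent_of_cd(mp3_kbps):
    """Size of an MP3 stream as a percentage of the uncompressed CD stream."""
    return 100 * mp3_kbps / cd_bitrate_kbps()


for rate in (128, 320):
    print(f"{rate} kbps MP3 is about {percent_of_cd(rate):.1f} percent of CD size")
```

Uncompressed CD audio works out to 1,411.2 kbps, so 128 kbps comes to roughly 9.1 percent and 320 kbps to just under 23 percent.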

Advanced audio coding (AAC) was developed after MP3 and takes advantage of what was learned from that initial popular format. AAC generally provides better sound quality than MP3 at similar bit-rates. AAC also has an offshoot known as high efficiency advanced audio coding (HE-AAC), which is used for mobile television standards such as DVB-H and ATSC-M/H. Like MP3, AAC is a lossy compression format, and its range of settings is similar to MP3’s.

Dolby Digital and AC-3 are two names for the same audio coding format. Developed by Dolby Laboratories, AC-3 is sometimes referred to as “audio codec 3” or “advanced codec 3.” All forms of AC-3 support surround sound, with the initial version carrying 5.1 channels and the later Dolby Digital Plus, also known as Enhanced AC-3 (E-AC-3), handling 7.1 channels. E-AC-3 can carry up to 13.1 channels. The greater coding efficiency of E-AC-3 means that it can provide reasonable 5.1-channel audio in a 256 kbps stream.

NEXT-GEN AUDIO FORMATS

The primary audio coding formats associated with HEVC/H.265 are MPEG-H and AC-4, and other codecs may join them in the coming months. MPEG-H can be thought of as “AAC on steroids,” and last year the ATSC announced that MPEG-H 3D Audio was one of the three standards proposed for the audio system of ATSC 3.0. In its simplest form, MPEG-H will support eight channels of audio. It has many other features, including the ability to provide loudness metadata.

Dolby AC-4 is likewise a considerably more advanced codec that evolved from AC-3. Compared to AC-3, AC-4 improves compression efficiency for broadcast by about 50 percent. AC-4 has already been standardized by the European Telecommunications Standards Institute and adopted by the Digital Video Broadcasting Project, the international industry consortium behind the DVB standards. The standard features native support for dialogue enhancement, intelligent loudness and advanced dynamic range control, as well as more efficient support for multiple languages and descriptive services.

The interaction of these audio codecs with HEVC is still being worked out and will become part of a final ATSC standard here in the United States. At the recent ATSC Boot Camp in Washington, Jim Starzynski of NBC gave a presentation on the status of MPEG-H and what we can expect in the future.

Like video codecs, audio codecs are becoming more efficient at compressing audio into smaller streams. This will enable future broadcast codecs to provide more channels of audio and provide broadcasters with more choice regarding quantity vs. quality trade-offs.