Part 2: Next-Gen Audio Coding

SAN FRANCISCO—Last month's article discussed the theory behind high-end audio coding. The second half of this series will define broadcast quality and tandem coding losses, and look at their effect on perceived quality. (To read part one of this article series on understanding the theory behind high-end audio coding, go to Next-gen audio coding - Part 1.)

Being familiar with standardized test methodologies and knowing how to interpret their results will significantly aid in understanding broadcast quality. Unknown to many, there is an ITU recommendation that defines the requirements for audio coding systems in digital broadcasting. ITU-R BS.1548-1 (User Requirements for Audio Coding Systems for Digital Broadcasting, Annex 2) states that an audio codec, at the bit rate chosen, must achieve mean values consistently higher than 4.0 on the BS.1116-1 five-grade scale at the reference listening position. (See Table 1.)

Remember, a score of 4.0 on the BS.1116-1 scale is also equivalent to a diffgrade score of -1.0. Hence, looking at the results of the two audio coding systems discussed in the previous article (at the data rates tested), only the first system met the ITU-R criteria for broadcast quality. (See Figures 1 and 2.)
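The threshold can be written down as a short check. The Python sketch below is illustrative only. The function names and the listener grades are hypothetical, it assumes the diffgrade is the coded item's grade minus a hidden-reference grade of 5.0, and it reads "consistently higher than 4.0" as "the mean for every test item exceeds 4.0," which is one possible interpretation rather than the wording of the recommendation.

```python
from statistics import mean

BROADCAST_QUALITY_GRADE = 4.0  # threshold on the BS.1116-1 five-grade scale
REFERENCE_GRADE = 5.0          # assumption: hidden reference graded "imperceptible"


def to_diffgrade(grade, reference_grade=REFERENCE_GRADE):
    """Diffgrade = grade of the coded item minus grade of the reference."""
    return grade - reference_grade


def meets_broadcast_quality(grades_by_item):
    """True if the mean grade of every test item is above the threshold.

    grades_by_item maps a test item name to the list of listener grades.
    """
    return all(mean(grades) > BROADCAST_QUALITY_GRADE
               for grades in grades_by_item.values())


# Hypothetical listener grades, not taken from any real test.
system_1 = {"castanets": [4.5, 4.0, 4.5], "speech": [4.5, 5.0, 4.5]}
system_2 = {"castanets": [4.0, 3.5, 4.0], "speech": [4.5, 4.5, 5.0]}

print(to_diffgrade(4.0))
print(meets_broadcast_quality(system_1))
print(meets_broadcast_quality(system_2))
```

In this made-up data the second system fails because one item averages below 4.0, even though its other item scores well.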

For a familiar example of what broadcast quality sounds like, consider that most Region 1 SD Hollywood DVD movies provide a decent benchmark for a codec being operated at a data rate that yields broadcast quality. However, high-definition DVDs typically use audio data rates at least two times higher than the rate of standard-definition DVDs, and some even use a lossless audio codec. Therefore, with many broadcasters and next-generation service providers under increasing pressure to lower audio bit rates, the perceived quality of some next-generation broadcast systems and services may soon differ noticeably from that of disc-based media (such as blue-laser DVD).

Here are a few further items to look for with a properly administered and documented listening test:

  • A graphical presentation of the test results
  • General information about the audio coding system used to process the test material
  • A specification for selecting test subjects as well as test materials
  • Physical specifications of the listening environment, equipment, room dimensions, acoustic properties of the listening environment and transducer types/placement
  • Detail regarding the analysis of the processed data
  • A detailed basis for all conclusions
  • Details of the test design, as well as the training process (instruction to test subjects)

Be wary if the test administrator, test facility, research facility or codec manufacturer cannot provide a complete set of supporting documentation regarding the details of its test and the basis for its results.

Here is a high-level overview of what to listen for when evaluating a new coding system at a number of data rates.

The first thing to listen for is pre-echo, which is a type of impairment that affects (dampens) the sharpness and clarity of signals that are transient in nature. (As a side note, castanets are typically used to determine a codec's ability to handle transient signals across a number of data rates.)

Another common type of coding artifact is related to changes in timbre at higher frequencies and can sound similar to birds chirping (sometimes called birdies). This is most often caused by running a codec at too low a data rate for spectrally demanding content.

Also listen for a gritty or grainy sound quality, for loss of bandwidth (typically in the high-frequency region, since many coding systems limit the coded audio bandwidth at aggressive, low data rates) and for image shifts with stereo or multichannel material. Many modern audio coding systems have a mechanism for synthesizing high-frequency energy in the decoder from information that was generated in the encoder and carried in the bit stream.

Tandem coding losses, which affect perceived quality, occur when the coding errors in each system (used in tandem) combine to generate larger errors — that is, new errors created in addition to the old ones. These types of errors occur for several reasons, including:

  • Quantization levels in one audio coding system do not map to the same levels in another.
  • Different filter banks are used in the systems.
  • There are time delays between the systems.
  • There are changes in signal amplitude between the systems.
  • Perceptual models are used.
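The first cause in the list, mismatched quantization levels, can be demonstrated with a toy model. The Python sketch below is not a perceptual codec: it stands in for each coding system with a plain uniform quantizer, and the step sizes, test tone and function names are all assumptions chosen for illustration. It compares the error of a single generation with the error after two generations on the same grid and on different grids.

```python
import math


def quantize(samples, step):
    """Uniform quantizer: snap each sample to the nearest multiple of step."""
    return [round(s / step) * step for s in samples]


def rms_error(original, processed):
    """Root-mean-square difference between two equal-length signals."""
    total = sum((o - p) ** 2 for o, p in zip(original, processed))
    return math.sqrt(total / len(original))


# Hypothetical test signal: 0.1 s of a 440 Hz sine sampled at 48 kHz.
signal = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]

STEP_A = 0.10  # toy stand-in for "codec A"
STEP_B = 0.07  # toy stand-in for "codec B", whose levels do not line up with A's

after_a = quantize(signal, STEP_A)

single_a = rms_error(signal, after_a)
single_b = rms_error(signal, quantize(signal, STEP_B))
same_grid = rms_error(signal, quantize(after_a, STEP_A))
tandem = rms_error(signal, quantize(after_a, STEP_B))

print(f"A only:        {single_a:.4f}")
print(f"B only:        {single_b:.4f}")
print(f"A then A:      {same_grid:.4f}")
print(f"A then B:      {tandem:.4f}")
```

In this toy model, passing the signal through the same grid twice adds no further error, while the A-then-B cascade ends up with a larger error than either quantizer alone. Real codecs are far more complex, and even repeated generations of the same codec usually lose quality, so treat this only as an intuition for why non-matching levels create new errors.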

To demonstrate the effect tandem coding losses have on perceived quality, consider Figures 3 and 4, which are both based on analysis performed in the lab with critical material. The results approximate the magnitude of that effect. Figure 3 compares data rate against audio quality (relative coding error) for two next-generation audio codecs available today. The x-axis indicates the data rate as a percentage of the data rate required for codec A to reach broadcast quality (as per ITU-R BS.1548-1). Codec B is more efficient: it reaches broadcast quality at about 80 percent of the data rate codec A requires. If you were to operate codec B at a data rate 50 percent below the data rate required for broadcast quality with codec A, the quality would drop significantly, to between “poor” and “fair.”

Many new codecs are designed for emission applications requiring the highest quality at the lowest data rate for only a single generation of encoding and decoding (that is, from the emission point to a viewer's home). These codecs are not designed for applications where different coding systems are operating in tandem (cascade) with each other (which is becoming commonplace throughout today's broadcast chain).

Figure 4 shows the effect of cascading the two different codecs versus single-generation performance. By placing the more efficient codec B in tandem with codec A (where A is being operated at a rate that is considered to be broadcast quality), the decrease in quality is significant (as shown by the curve with open circles). This is true even when both codecs are independently operating at data rates that yield broadcast quality.

As a real-world application example, consider an IPTV operator that is required to decode a broadcaster's signal that was originally encoded at a data rate that yields broadcast quality (codec A). The operator then needs to re-encode it into a more efficient format (codec B) for carriage to subscribers. Also assume that the IPTV operator chooses a bit rate for codec B that yields broadcast quality when codec B is used in a standalone application (i.e., the source audio has never been through an audio coding system).

This data rate is 20 percent lower than the bit rate that codec A needs to achieve broadcast quality. Note where each codec intersects the broadcast-quality threshold in Figure 4; codec B intersects broadcast quality at just below 80 percent of the data rate required for codec A.
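Because the figures express data rate as a percentage of codec A's broadcast-quality rate, it can help to translate those percentages into absolute numbers for a given service. The Python sketch below does that arithmetic. The 384 kbps figure for codec A is purely hypothetical (the article quotes no absolute rates), and the function name is an assumption.

```python
def relative_rate_kbps(codec_a_rate_kbps, percent_of_a):
    """Bit rate that corresponds to a percentage of codec A's rate."""
    return codec_a_rate_kbps * percent_of_a / 100


CODEC_A_KBPS = 384  # hypothetical broadcast-quality rate for codec A

# Codec B at its own standalone broadcast-quality point (about 80 percent of A).
codec_b_broadcast = relative_rate_kbps(CODEC_A_KBPS, 80)

# Codec B at the more aggressive 50 percent operating point.
codec_b_aggressive = relative_rate_kbps(CODEC_A_KBPS, 50)

print(f"Codec B, 80% of A: {codec_b_broadcast:.1f} kbps")
print(f"Codec B, 50% of A: {codec_b_aggressive:.1f} kbps")
print(f"Saved at 80%:      {CODEC_A_KBPS - codec_b_broadcast:.1f} kbps")
```

The savings computed this way are standalone numbers; they say nothing about the quality that results once codec B sits downstream of codec A.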

However, the result of these two codecs operating in tandem is described by the open-circle curve in Figure 4. When codec B is operated as just described (at a bit rate 20 percent lower than A's), the net quality of both systems in tandem drops to between “good” and “fair.” Hence, true broadcast quality is no longer achievable in an application like this. Furthermore, many next-generation systems are looking to take the more efficient codecs' bit rates down to 50 percent of the data rate of codec A. Figure 4 also shows the tandem net quality when the bit rate of the more efficient codec (codec B) is dropped to around 50 percent below the rate of the other codec (codec A). In this combination, quality drops further, to between “poor” and “fair.”

Before deciding on a target bit rate for a next-generation audio coder, consider that the realized efficiency gains of any new audio codec may be reduced in practice and will vary widely based on the application. This stems from the fact that in most, if not all, cases the new codec will be used in some portion of the distribution path to at least a portion of (or even all of) the viewers and will be in tandem with one or several different audio codecs. The efficiency gains and quality that some of these next-generation audio coding systems promise on paper will differ from what happens in real-world applications.

Before implementing any new coding system, audit the signal paths to quantify the number of codecs in tandem (don't forget that some broadcast servers use an audio codec too). Ask the experts lots of questions about steps to minimize tandem coding loss, and ask for suggestions on bit rates to minimize quality loss. The key: There is no substitute for the human ear. Listen carefully, and think about the level of quality your service requires. Also ask whether the new system causes any compatibility issues for viewers.

Jeffrey C. Riedmiller is senior broadcast product manager for Dolby Laboratories.

Table 1: BS.1116-1 five-grade impairment scale

Impairment                       Grade
Imperceptible                    5.0
Perceptible, but not annoying    4.0
Slightly annoying                3.0
Annoying                         2.0
Very annoying                    1.0

  • ITU-R BS.1116-1, Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems
  • ITU-R BS.1534-1, Methods for the Subjective Assessment of Intermediate Quality Level of Coding Systems (MUSHRA)
  • ITU-R BS.1548-1, User Requirements for Audio Coding Systems for Digital Broadcasting
  • ITU-R BS.1284-1, General Methods for the Subjective Assessment of Sound Quality
  • Grant, Davidson and Fielder, “Subjective Evaluation of an Audio Distribution Coding System,” Audio Engineering Society Convention Paper 5443, September 2001
  • “Perceptual Audio Coders: What to Listen For,” CD-ROM, Audio Engineering Society
  • Tech 3253 - Sound Quality Assessment Material, European Broadcasting Union