Advanced Audio Coding Moves Forward
July 7, 2004
The ATSC has published two new Candidate Standards documenting enhancements to the AC-3 digital audio compression standard and specifying its use in digital television. This is a major step toward using advanced audio coding for a wide variety of applications.
Enhanced AC-3 will provide the industry with expanded audio capabilities that can be used for broadcast, cable, satellite and DVD applications. It is another example of the continuing efforts to evolve ATSC standards to respond to marketplace requirements.
ATSC first standardized the AC-3 digital audio system in November 1994. AC-3 (known in the marketplace as "Dolby Digital") is now widely used in digital television systems around the world. The enhancements to AC-3 (E-AC-3), which will be marketed as "Dolby Digital Plus," are in two new documents:
CS/T3-613, which adds technical specifications to the ATSC Digital Audio Compression Standard (A/52) that can be used with a variety of media. The document details features that could be relevant to ATSC television systems, and also specifies features likely to be used in other (non-ATSC broadcast) applications. These features are being documented in A/52 because that standard is the fundamental source document for AC-3 and is relied upon by other (nonbroadcast) industries. Including additional features in the Enhanced AC-3 specification will enable its use in other applications, indirectly benefiting the ATSC digital television system.
CS/T3-614, which describes additions to the ATSC DTV Standard (A/53) that specify use of E-AC-3 in the Enhanced VSB (E-VSB) robust transmission mode currently under development in ATSC. The E-VSB mode would allow broadcasters to trade off throughput for robustness. With an E-VSB transmission, part of the approximately 19.4 Mbps payload is allocated to the robust mode and the rest to the normal 8-VSB mode. The robust-mode symbol stream includes additional forward error correction bits to improve reception under weak-signal and strong-multipath (ghost) conditions.
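The throughput-for-robustness tradeoff comes down to simple arithmetic. The sketch below assumes that the robust stream carries extra forward error correction at a code rate such as 1/2 or 1/4, so each robust payload bit consumes 2 or 4 bits of channel capacity. These rates are illustrative assumptions, not values taken from CS/T3-614, and the function name `evsb_split` is hypothetical.

```python
from fractions import Fraction

# Approximate 8-VSB payload capacity cited in the text, in Mbps.
TOTAL_MBPS = Fraction(194, 10)

def evsb_split(robust_share: Fraction, code_rate: Fraction, total: Fraction = TOTAL_MBPS):
    """Return (robust_payload_mbps, normal_payload_mbps) for a hypothetical E-VSB split.

    robust_share: fraction of channel capacity handed to the robust mode.
    code_rate: assumed extra FEC rate of the robust mode (e.g. 1/2 or 1/4).
    """
    if not (0 <= robust_share <= 1 and 0 < code_rate <= 1):
        raise ValueError("share must be in [0, 1] and code rate in (0, 1]")
    robust_capacity = total * robust_share
    # Extra FEC bits eat into the capacity given to the robust mode.
    robust_payload = robust_capacity * code_rate
    # Whatever capacity is left stays in the normal 8-VSB mode.
    normal_payload = total - robust_capacity
    return robust_payload, normal_payload
```

Under these assumptions, giving a quarter of the channel to a half-rate robust mode yields about 2.4 Mbps of robust payload and leaves 14.55 Mbps for normal 8-VSB. That trades roughly 2.4 Mbps of total throughput for the added protection. Exact fractions keep the bookkeeping free of rounding error.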
Dolby Laboratories submitted Enhanced AC-3 to the ATSC for consideration in response to a Request for Information published in December 2002. E-AC-3 offers new coding tools that fundamentally improve performance, as well as new features that allow operation over a wider range of bit rates and channel counts. Of great importance to the industry, E-AC-3 can be converted into AC-3 for playback on consumers' existing A/V decoders.
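A decoder that accepts both legacy AC-3 and E-AC-3 must first tell the two frame types apart. The sketch below assumes the bit layout of the later published A/52 revisions. Under that layout, both frame types begin with the sync word 0x0B77, and the 5-bit bitstream identification field `bsid` occupies the top five bits of the sixth byte. Values up to 10 are decoded as AC-3, and values 11 through 16 are decoded as E-AC-3, with E-AC-3 encoders using 16. The function name `classify_frame` is hypothetical. The bit positions should be checked against the published standard before anyone relies on them.

```python
SYNC_WORD = 0x0B77  # frame sync word shared by AC-3 and E-AC-3

def classify_frame(frame: bytes) -> str:
    """Return 'AC-3' or 'E-AC-3' for a frame that begins at a sync word (assumed layout)."""
    if len(frame) < 6:
        raise ValueError("need at least 6 bytes to reach bsid")
    if int.from_bytes(frame[0:2], "big") != SYNC_WORD:
        raise ValueError("frame does not start with the sync word")
    # Assumed: bsid sits at bit offset 40 (top 5 bits of byte 5) in both syntaxes.
    bsid = frame[5] >> 3
    if bsid <= 10:
        return "AC-3"
    if bsid <= 16:
        return "E-AC-3"
    raise ValueError(f"unsupported bsid {bsid}")
```

Placing `bsid` at the same offset in both syntaxes is what lets a single parser route each frame to the appropriate decoding path before reading anything else.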
Enhanced AC-3 builds upon the current version of AC-3 specified in ATSC Standard A/52A. All decoders for the enhanced version will also decode all legacy A/52 AC-3 bitstreams. In addition, although the new enhanced audio format is not directly compatible with current A/52 decoders, it can be converted with modest complexity into a compliant A/52 bitstream. This enables backward compatibility with legacy decoders that have S/PDIF bitstream inputs. This capability is critical to supporting the 20 million 5.1-channel Dolby Digital decoders now in the U.S. market. (There is already a large installed base of home theater systems incorporating multichannel sound: more than 30 percent of U.S. households, according to a CEA survey in January 2003.) This compatibility was, in fact, one of the key deciding factors for ATSC contributors in selecting this system.
Important technical capabilities of Enhanced AC-3 that relate directly to ATSC broadcast applications include:
Expanded data rate flexibility: E-AC-3 allows the number of blocks per sync frame and the number of compressed data bits per frame to be adjusted to achieve significantly more data rate flexibility than standard AC-3, including a greater maximum theoretical data rate and finer data rate granularity.
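The arithmetic behind this flexibility can be sketched as follows. The sketch assumes the frame syntax of the later published A/52 revision. Under that syntax, an E-AC-3 sync frame carries 1, 2, 3 or 6 audio blocks of 256 samples each, and its size is signaled in 16-bit words, from 1 to 2,048 words. Standard AC-3, by contrast, always uses 6 blocks and a fixed table of frame sizes topping out at 640 kbps. The function name is hypothetical.

```python
SAMPLES_PER_BLOCK = 256  # new samples per channel carried by each audio block

def eac3_data_rate_bps(blocks_per_frame: int, frame_words: int,
                       sample_rate_hz: int = 48_000) -> float:
    """Data rate implied by a sync frame's block count and size in 16-bit words."""
    if blocks_per_frame not in (1, 2, 3, 6):
        raise ValueError("assumed: E-AC-3 frames carry 1, 2, 3 or 6 audio blocks")
    if not 1 <= frame_words <= 2048:
        raise ValueError("assumed: frame size is 1 to 2,048 16-bit words")
    frame_bits = frame_words * 16
    # Bits per frame divided by frame duration (blocks * 256 samples / sample rate).
    return frame_bits * sample_rate_hz / (blocks_per_frame * SAMPLES_PER_BLOCK)
```

At 48 kHz, 6 blocks and 1,280 words give 640 kbps, the AC-3 maximum. One block and 2,048 words give 6.144 Mbps. At 6 blocks, changing the frame size by one word moves the rate in 500 bit/s steps. The first figure shows the greater maximum rate and the last shows the finer granularity.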
Spectral extension: Enhanced AC-3 decoders support a new coding technique called spectral extension. Like channel coupling, spectral extension codes the highest frequency content of the signal more efficiently. Spectral extension recreates a signal's high-frequency spectrum from side data transmitted in the bitstream that characterizes the original signal, as well as from actual signal content from the lower-frequency portion of the signal. Because it may be desirable in some circumstances to use channel coupling for a midrange portion of the frequency spectrum and spectral extension for the higher-range portion of the frequency spectrum, spectral extension is fully compatible with channel coupling. Both tools can be enabled at the same time, for different portions of the frequency spectrum.
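To make the idea concrete, here is a deliberately simplified sketch. It is not the A/52 spectral extension algorithm. The encoder keeps the transform coefficients below a hypothetical `spx_start` index and sends only one RMS level per high-frequency band as side data. The decoder copies patches of low-band coefficients upward and rescales each patch to the transmitted level. The real tool also blends in noise, quantizes the side data, and uses its own band structure, all of which are omitted here.

```python
import math

def band_rms(values):
    """Root-mean-square level of a non-empty list of transform coefficients."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def spx_encode(coeffs, spx_start, band_width):
    """Keep coefficients below spx_start; describe each higher band only by its RMS level."""
    if (len(coeffs) - spx_start) % band_width:
        raise ValueError("extension region must be a whole number of bands")
    low = list(coeffs[:spx_start])
    side_data = [band_rms(coeffs[b:b + band_width])
                 for b in range(spx_start, len(coeffs), band_width)]
    return low, side_data

def spx_decode(low, side_data, band_width):
    """Rebuild high bands by copying low-band patches and rescaling them to the sent RMS."""
    out = list(low)
    src = 0
    for target in side_data:
        # Translate a patch of low-frequency content up into the missing band.
        patch = [low[(src + i) % len(low)] for i in range(band_width)]
        src = (src + band_width) % len(low)
        level = band_rms(patch)
        gain = target / level if level > 0 else 0.0
        out.extend(v * gain for v in patch)
    return out
```

The reconstructed high bands match the original band levels but not the original fine structure. That is the tradeoff that lets the encoder spend almost no bits above `spx_start`.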
Transient pre-noise processing: This optional decoder tool improves audible performance by substituting audio segments just before transients to shorten pre-noise distortions. The technique, called time-scaling synthesis, uses synthesized PCM audio segments to eliminate transient pre-noise, thereby improving the perceived quality of transient material coded at low bit rates. So that the decoder can perform transient pre-noise processing efficiently and without adding decoding latency, the encoder performs the transient location detection and time-scaling synthesis analysis and transmits the results to the decoder. The encoder performs this analysis for each full-bandwidth audio channel and transmits "helper" information once per frame, only when necessary (for example, when transients are present that will benefit from the technique).
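The encoder-side analysis can be sketched as a simple energy-ratio detector. It runs once per frame and produces helper data only when it finds a transient. The detector, its segment length, and its threshold are illustrative assumptions. The actual E-AC-3 transient location detection and time-scaling synthesis analysis are specified in the standard and are not reproduced here.

```python
def segment_energies(samples, seg_len):
    """Energy of each complete, non-overlapping segment of a PCM frame."""
    return [sum(x * x for x in samples[i:i + seg_len])
            for i in range(0, len(samples) - seg_len + 1, seg_len)]

def transient_helper_info(frame, seg_len=64, ratio=8.0, floor=1e-9):
    """Return the sample offset of the first transient in the frame, or None.

    A transient is flagged when a segment's energy exceeds `ratio` times the
    energy of the segment before it (with a small floor for silent segments).
    None means no helper information needs to be sent for this frame.
    """
    energies = segment_energies(frame, seg_len)
    for i in range(1, len(energies)):
        if energies[i] > ratio * max(energies[i - 1], floor):
            return i * seg_len
    return None
```

Returning `None` for steady material mirrors the text's point that helper information is sent only when a frame actually contains a transient worth treating.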
Adaptive hybrid transform processing: In 1995, the transform employed in A/52 AC-3, based on a modified discrete cosine transform (MDCT) length of 256 frequency samples, provided a reasonable tradeoff between audio coding gain and decoder implementation cost. With continuing advances in silicon manufacturing processes, the level of integrated circuit complexity considered reasonable has increased. This increase in chip performance provides an opportunity to improve the coding gain of AC-3, and hence perceptual audio quality at a given bitrate, by increasing the length of the transform. This is accomplished through the Adaptive Hybrid Transform (AHT), which cascades a second transform with the first to produce an effective transform length of 1,536 frequency samples (six blocks of 256).
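The cascade can be sketched as a second, short transform applied across the six blocks of a frame for each of the 256 MDCT frequency bins, giving 6 x 256 = 1,536 coefficients. For illustration, this sketch assumes the second transform is a length-6 orthonormal DCT-II. It omits quantization and the logic that decides when AHT is switched on, and the function names are hypothetical.

```python
import math

BLOCKS = 6   # audio blocks per AC-3 frame
BINS = 256   # MDCT coefficients per block

def dct2(x):
    """Orthonormal DCT-II of a short sequence."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[m] * math.cos(math.pi / n * (m + 0.5) * k) for m in range(n))
            for k in range(n)]

def idct2(X):
    """Inverse of dct2 (an orthonormal DCT-III)."""
    n = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * X[k]
                * math.cos(math.pi / n * (m + 0.5) * k) for k in range(n))
            for m in range(n)]

def aht_forward(blocks):
    """blocks[b][k] is MDCT bin k of block b; returns out[k], the 6 second-stage coefficients of bin k."""
    if len(blocks) != BLOCKS or any(len(b) != BINS for b in blocks):
        raise ValueError("expected 6 blocks of 256 MDCT coefficients")
    return [dct2([blocks[b][k] for b in range(BLOCKS)]) for k in range(BINS)]
```

For a steady tone, whose MDCT coefficients repeat from block to block, the second transform concentrates each bin's energy into its first coefficient. That concentration is where the added coding gain for stationary material comes from.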
Enhanced coupling: This new tool improves the imaging properties of coupled signals by adding phase compensation to the amplitude-based processing of conventional coupling. Before downmixing the coupled channels to a single composite signal, the encoder derives both amplitude and interchannel phase information on a sub-band basis for each channel. The phase information includes a decorrelation scale factor that measures how much the phase varies within a frame. This side-chain information is transmitted to the decoder once per frame. The decoder uses it to recover the multiple output channels from the composite signal through a combination of amplitude scaling and phase rotation. The result is better soundstage imaging than conventional coupling provides, which allows the technique to be used at lower frequencies than conventional coupling and thus improves coding efficiency.
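Here is a minimal sketch of the amplitude-plus-phase idea for one sub-band. It assumes the sub-band is represented by complex-valued coefficients, which is a simplification because AC-3 transmits real MDCT coefficients. The decorrelation scale factor is also omitted. The encoder measures an amplitude ratio and an interchannel phase for one channel against the composite, and the decoder scales and rotates the composite to recover that channel. Setting the phase to zero reduces this to amplitude-only coupling. The function names are hypothetical.

```python
import cmath
import math

def coupling_params(channel, composite):
    """Encoder side info for one sub-band: amplitude ratio and phase of channel vs. composite."""
    ch_energy = sum(abs(z) ** 2 for z in channel)
    comp_energy = sum(abs(z) ** 2 for z in composite)
    amplitude = math.sqrt(ch_energy / comp_energy) if comp_energy > 0 else 0.0
    # The phase of the cross-correlation is the average rotation between the two signals.
    cross = sum(c * z.conjugate() for c, z in zip(channel, composite))
    return amplitude, cmath.phase(cross)

def reconstruct(composite, amplitude, phase):
    """Decoder side: amplitude scaling plus phase rotation of the shared composite."""
    rotation = amplitude * cmath.exp(1j * phase)
    return [z * rotation for z in composite]
```

When a channel really is a scaled and rotated copy of the composite, the two parameters recover it exactly. Amplitude-only reconstruction of the same channel misses the rotation, which is the imaging error enhanced coupling is meant to remove.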
Additional features of E-AC-3 of particular interest to applications outside of DTV include:
Channel and program extensions: The Enhanced AC-3 bitstream syntax allows time-multiplexed substreams to be present in a single bitstream. With this capability, one bitstream can carry a single program with more than 5.1 channels, multiple programs of up to 5.1 channels, or a mixture of programs with up to 5.1 channels and programs with more than 5.1 channels. These extra channels do not affect a two- or 5.1-channel decoder in ATSC broadcast applications.
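The substream arrangement can be sketched as a demultiplexing step. The sketch assumes the field names `strmtyp` and `substreamid` and the stream-type codes of the later published A/52 revision. In that revision, each program has an independent substream that a 5.1-channel decoder can play on its own, and optional dependent substreams carry the extra channels. The `Frame` class and the function name are hypothetical.

```python
from dataclasses import dataclass

# Assumed stream-type codes: 0 = independent, 1 = dependent,
# 2 = independent (converted from AC-3).
INDEPENDENT, DEPENDENT, CONVERTED_AC3 = 0, 1, 2

@dataclass
class Frame:
    strmtyp: int       # stream type of this frame
    substreamid: int   # substream number within its type
    payload: bytes

def frames_for_basic_decoder(frames, program=0):
    """Select the frames a 5.1-channel decoder would decode for one program."""
    return [f for f in frames
            if f.strmtyp in (INDEPENDENT, CONVERTED_AC3) and f.substreamid == program]
```

A 5.1-channel decoder simply skips the dependent frames it does not need, which is why the extra channels leave its output unchanged.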
Sample-rate processing: Additional metadata is reserved for applications that involve source material sampled at two times the nominal rate, such as 96 kHz and 88.2 kHz.
Mixing control processing: Additional metadata is reserved for applications that involve the mixing of two program streams. These applications require control of the mixing process and resultant dynamic range control metadata; this feature reserves data capacity to accomplish this task.
The Enhanced AC-3 Candidate Standard specifications can be found on the ATSC Web site:
- CS/T3-613, which documents revisions to the ATSC Digital Audio Compression Standard (A/52) and can be used with a variety of media.
- CS/T3-614, which describes additions to the ATSC DTV Standard (A/53) that specify use of E-AC-3 in the E-VSB robust mode currently under development in ATSC.
These new Candidate Standards complement three previously published Candidate Standards relating to E-VSB:
- CS/T3-608 and CS/T3-609 document transport stream specifications for the use of advanced video codecs in the proposed E-VSB mode.
- CS/T3-606 specifies changes in the ATSC PSIP Standard (A/65) for use with E-VSB.
The candidate standard stage recognizes that a specification has reached a level of technical maturity that would benefit from implementation experience and technical feedback. After the candidate standard period ends, the document typically moves on to the next approval stage on its way to becoming an ATSC standard.
Candidate standards, along with all other ATSC standards, recommended practices, Implementation Subcommittee findings and related informational documents, are available at no charge from the ATSC Web site, www.atsc.org.
Jerry Whitaker can be reached by e-mail. Background technical information for this article was contributed by Dolby Laboratories.