Exploring the AC-3 Audio Standard for ATSC

Last time we discussed the technical details of the BTSC system and some of the pitfalls that broadcasters need to be aware of when passing audio through the system. This time we will get into the technical details of the ATSC audio system, also known as AC-3.

First things first: AC-3 is shorthand for Audio Coder 3, so named because it follows AC-1 and AC-2, two earlier audio coding systems developed by Dolby.

AC-3, or Dolby Digital as it is commonly known, is a perceptual audio bit-rate reduction system capable of efficiently carrying anywhere from one (mono) to 5.1 channels of audio. The 5.1 channels consist of Left, Right, Center, Low Frequency Effects (LFE), Left Surround, and Right Surround. The LFE channel, used to feed a subwoofer, is the so-called ".1": it carries frequencies only up to 120 Hz, or very roughly one-tenth the bandwidth of the other five channels.

Without bit-rate reduction, these channels would require almost 6 Mbps (6 channels x 48,000 samples per second x 20 bits), or almost one-third of the total 19.39 Mbps data rate for the ATSC system. By relying heavily on psychoacoustic principles, the Dolby Digital (AC-3) perceptual coder is able to achieve a "compression ratio" of approximately 12:1 and encodes the 5.1 channels into a 384 kbps data stream as required by the ATSC. Recently the ATSC updated the standard to allow 448 kbps, thereby matching the data rate typically used for Dolby Digital soundtracks on DVD. For more technical details on how this system performs its magic, I again refer you to www.dolby.com/tech/ for a paper titled "AC-3: Flexible Perceptual Coding for Audio Transmission and Storage."
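The arithmetic behind those figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the 16-bit case is an added assumption for comparison, not a figure from the text above. With the 20-bit word length quoted above, the ratio at 384 kbps works out closer to 15:1; the 12:1 figure matches 16-bit source audio at 384 kbps, or comes close with 20-bit audio at 448 kbps.

```python
# Figures quoted in the text. The LFE channel is counted as a full PCM channel,
# as the "6 channels" in the text implies.
CHANNELS = 6
SAMPLE_RATE_HZ = 48_000
ATSC_TOTAL_BPS = 19_390_000


def pcm_rate_bps(bits_per_sample, channels=CHANNELS, sample_rate=SAMPLE_RATE_HZ):
    """Uncompressed PCM data rate in bits per second."""
    return channels * sample_rate * bits_per_sample


def compression_ratio(pcm_bps, coded_bps):
    """Ratio of the uncompressed rate to the coded AC-3 rate."""
    return pcm_bps / coded_bps


pcm20 = pcm_rate_bps(20)  # word length quoted in the text
pcm16 = pcm_rate_bps(16)  # assumption added here for comparison

print(f"20-bit PCM: {pcm20 / 1e6:.2f} Mbps")               # 5.76 Mbps
print(f"Share of ATSC rate: {pcm20 / ATSC_TOTAL_BPS:.1%}")  # 29.7%
for coded_bps in (384_000, 448_000):
    print(
        f"{coded_bps // 1000} kbps: "
        f"{compression_ratio(pcm20, coded_bps):.1f}:1 (20-bit), "
        f"{compression_ratio(pcm16, coded_bps):.1f}:1 (16-bit)"
    )
```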

It should be apparent that whether a television station is sending a mono program, a stereo program, or a 5.1 channel program, the audio is Dolby Digital (AC-3) encoded. You might be wondering what happens when a station sends a program with a certain number of audio channels to an audience that has many different receivers with differing numbers of speakers. For example, what happens when a 5.1 channel program is received by a small TV set with a mono speaker?

DOWNMIXING DETAILS

Last time I said that Dolby Digital (AC-3) provides compatibility for television receivers that have fewer speakers than the number of audio channels being delivered, and that it does so without matrixing. I should further clarify this by explaining that, unlike the BTSC system, which carries the L-R information as a subcarrier that is ignored by mono receivers, the AC-3 system downmixes all of the audio channels to mono inside the mono receiver. Well, all channels are combined except for the LFE channel, which is simply discarded as it is meant to feed a subwoofer. Similarly, a stereo receiver gets all 5.1 channels, but they are downmixed to stereo, again minus the LFE channel.

Figure 1 shows how this allows a single mix of a program to be reproduced by any receiver regardless of how many speakers or channels it has.

Figure 1

Downmixing is not simply a fixed process; it is controlled by parameters contained in a metadata stream carried within the AC-3 bitstream. Metadata is data that describes the audio data and we will dig deeper into the subject next time, but for the purpose of downmixing, it is worth noting that the levels at which the Center and Surround channels are combined with Left and Right channels are controllable in the encoder. This can help to prevent the dialogue in the center channel from getting buried by all of the other audio channels when they are combined. In a 5.1 channel receiver, these metadata-controlled downmixing levels are ignored, and the 5.1 channel program is reproduced just as it was mixed.
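To make the idea concrete, here is a minimal Python sketch of a metadata-controlled downmix for a single sample of each channel. It is a simplified illustration, not the normative downmix equations from the AC-3 specification: the function names, the dictionary layout, the -3 dB defaults, and the 0.5 scaling in the mono case are all assumptions made for this example, and a real decoder also scales its output to avoid overload.

```python
def db_to_gain(level_db):
    """Convert a level in dB to a linear gain factor."""
    return 10 ** (level_db / 20)


def downmix_to_stereo(frame, center_db=-3.0, surround_db=-3.0):
    """Fold one sample of a 5.1 frame down to two channels.

    frame is a dict with keys L, R, C, LFE, Ls, Rs (hypothetical layout).
    center_db and surround_db stand in for the mix levels carried as metadata.
    """
    c = db_to_gain(center_db)
    s = db_to_gain(surround_db)
    left = frame["L"] + c * frame["C"] + s * frame["Ls"]
    right = frame["R"] + c * frame["C"] + s * frame["Rs"]
    return left, right  # frame["LFE"] is deliberately never used


def downmix_to_mono(frame, center_db=-3.0, surround_db=-3.0):
    """Sum the stereo downmix to mono; the 0.5 scaling is an assumption."""
    left, right = downmix_to_stereo(frame, center_db, surround_db)
    return 0.5 * (left + right)


# Dialogue only in the center channel, with a loud LFE effect.
frame = {"L": 0.0, "R": 0.0, "C": 1.0, "LFE": 1.0, "Ls": 0.0, "Rs": 0.0}
for center_db in (-3.0, -6.0):
    left, right = downmix_to_stereo(frame, center_db=center_db)
    print(f"center at {center_db} dB -> left={left:.3f}, right={right:.3f}")
```

Changing only the center level changes how prominently the dialogue appears in the two-channel result, which is the control the encoder-side metadata provides. The LFE sample never reaches the output.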

In Figure 1, you will see the downmix case that feeds a Pro Logic matrix surround decoder, which produces Left, Center, Right, and mono Surround channels from a two-channel input. There is a dashed box around it because the Pro Logic decoder can be external, as in the case of a set-top box that has stereo outputs feeding a consumer's existing surround system, or it can be internal, as is the case with all 5.1 channel decoders. Yes, all 5.1 channel decoders also have a Pro Logic decoder present. This enables the system to send audio to all 5.1 speakers, thereby producing sound from more speakers than there are audio channels being transmitted. In Figure 2 you can see a more detailed view of how this would typically be structured in a 5.1 channel decoder.

Figure 2

Although the surround signal is mono, it is sent to the left and right surround speakers, and if selected, the low frequency information below 120 Hz is filtered from the left, center, and right channels and sent to the subwoofer. So, the audio is not truly 5.1 channels, but appropriate signals are reproduced from all 5.1 speakers.

This scheme allows smooth integration of legacy mono and stereo material with more recent 5.1 channel programs. Importantly, it allows dialogue to be reproduced from the center speaker regardless of how many channels are encoded, thereby keeping a consistent sound field for all audio programs. Even non-surround-encoded stereo programs can benefit from this decoder, but consumers can always bypass it if they so choose.

BAD IDEA

Just like the rash of stereo synthesizers that appeared when the BTSC system first hit the scene, there is a similar situation with 5.1 channel audio. I have heard of at least one case where a station has used a matrix decoder of some sort prior to the Dolby Digital encoder so that they could "light the 5.1 light." This is a really bad idea! Remember that your mono and stereo listeners are getting a downmixed version of the 5.1 channel program. Unlike some stereo synthesizers that were mono-compatible, this practice is not. If a program is "faked" into 5.1 using a matrix decoder, there are built-in delays in the surround channels that, when summed with the main channels during downmix, will produce some nasty artifacts.
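One way to see why those delays are a problem is to treat the downmix as a signal summed with a delayed copy of itself, which behaves as a comb filter. The Python sketch below is a deliberately simplified model: the 20 ms delay and the equal-level summation are assumptions chosen for illustration, not measurements of any particular matrix decoder, and the function names are hypothetical.

```python
import cmath
import math


def summed_gain_db(freq_hz, delay_s, delayed_gain=1.0):
    """Level change (dB) when a signal is summed with a delayed copy of itself."""
    response = 1 + delayed_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    magnitude = abs(response)
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")


def notch_frequencies(delay_s, max_hz):
    """Frequencies up to max_hz where the two copies arrive 180 degrees apart."""
    freqs = []
    k = 0
    while (k + 0.5) / delay_s <= max_hz:
        freqs.append((k + 0.5) / delay_s)
        k += 1
    return freqs


ASSUMED_DELAY_S = 0.020  # illustrative surround delay, not taken from the text

print([round(f) for f in notch_frequencies(ASSUMED_DELAY_S, 200)])
for freq in (25, 50, 75, 100):
    print(f"{freq} Hz: {summed_gain_db(freq, ASSUMED_DELAY_S, delayed_gain=0.5):+.1f} dB")
```

Under these assumptions the notches repeat every 50 Hz, with boosts in between, so the tonal balance of anything present in both the main and surround channels is altered across the whole audio band.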

However, even worse off is the poor listener who has a stereo set-top box connected to his or her home theatre system and is trying to Pro Logic-decode a previously matrix-decoded, then downmixed program. I have heard the results, and have one word for it: Yuck! Understanding that all audio sent via the ATSC system is Dolby Digital (AC-3) encoded, and that all 5.1 channel receivers contain a Pro Logic decoder to deal with legacy mono and stereo material, should make it clear that this practice is an unnecessary burden for the station and a disservice to mono and stereo listeners.

That being said, the only caveat to doing things the right way is that the Dolby Digital (AC-3) encoder must be told whether the audio program it is encoding is 5.1 channels or two channels. As there is no reliable way to auto-sense this, it requires either operator control (manual or via automation) or, better yet, metadata control.

NEXT UP: METADATA

Next time, we will look at audio metadata in greater detail. Not only does it allow for proper downmixing of programs, it also provides tools for program-to-program loudness matching and contains a unique dynamic range control system that allows viewers to customize the dynamics of the program to their listening environment. They can watch action-adventure movies at 2 a.m. without waking the kids, or, if they live far away from neighbors, they can watch the same movie with full-throttle dynamics.

I have received quite a number of very nice emails, some containing good suggestions for topics and areas that readers would like to see covered. I appreciate all of the feedback and look forward to hearing from more of you. Thanks for your time!