Digital loudness matching

In previous articles, we examined the theory behind loudness estimation as well as the use of the level of dialogue as a metric to establish a subjective level match from any type of content source (analog or digital). Now we turn our attention to an important part of addressing the loudness puzzle — the digital set-top box decoder.

The design of its internal gain structure and its default operating modes combine to make the STB decoder one of the most critical components of the North American DTV and digital cable system. Yet it is often overlooked by many system operators when troubleshooting viewer complaints about loudness. This article will provide a brief overview of how this portion of the system works and what it expects from both digital and analog programming sources — and users — to make it work correctly.

Today many viewers are presented with programming from both analog and digital sources via a single piece of hardware, the STB. As a result, one of the most important and challenging goals of everyone involved in delivering content — including the broadcaster, cable operator and STB manufacturer — is to provide viewers with a seamless listening experience as they switch between digital and analog channels.

At first glance, this level of operability may seem impossible to achieve considering the fact that we are trying to line up two types of signals (analog and digital) that can differ greatly in areas such as available headroom above dialogue peaks and dynamic range. However, with a thorough understanding of the Dolby Digital (AC-3) system and current NTSC practice, and some knowledge of STB internal gain structure, we can be better prepared to address the actual source of the problem in a given situation.

Internal gain structure

First, we must not forget the little-known fact that the internal audio gain structure of digital STBs has been designed with several assumptions about the signals these boxes will receive and provide to the viewer.

The STB decoder assumptions required to provide dialogue level matching between analog and digital sources are as follows:

Many digital STBs (designed for North America) assume that while tuned to an analog NTSC channel (either off-air or via cable), the average dialogue level is about 17dB Leq(A) below 100 percent modulation.
The digital STB assumes that while tuned to a digital service (either off-air or via cable), the transmitted dialnorm value carried in the AC-3 bit stream is correct for that program.
If the listener is using the channel 3/4 remodulated RF output on the digital STB, the AC-3 decoder must default to the RF operating mode1 to provide a match between analog and digital channels.

Therefore, in order for digital programming to match analog programming (at the channel 3/4 RF output on the STB), all of the conditions listed must be met.

To illustrate this point, Figure 1 on page 56 shows the proper level relationships between the output of the Dolby Digital (AC-3) decoder and the NTSC tuner/demodulator for the modulated RF output of a typical digital STB used in North America for DTV and digital cable. In Figure 1, notice that the NTSC scale has a maximum value of 0dBFM and that this level is equivalent to 100 percent modulation as per FCC rules (that is, 25kHz peak deviation), and the Leq(A) dialogue level is shown to be at 17dB below 100 percent modulation. This value (-17dBFM) indicates the ratio between the maximum program peaks and the average Leq(A) dialogue level. Why -17dBFM Leq(A)? Measurements taken over the years have shown that the equivalent loudness of dialogue (A-weighted) for NTSC broadcasts is typically 17dB below 100 percent modulation. Therefore, a properly designed digital STB assumes this condition is always true (for any analog channel) and requires it in order to provide a level match to digital sources. (Note: This article focuses only on analog vs. digital level matching for the RF remodulated output of digital set-top boxes. For further information on other outputs (that is, baseband and digital), see the references on page 62.)

Set-top decoder modes

For digital sources, the Dolby Digital (AC-3) decoder can typically be operated in two modes: line and RF. In many cases, these modes can be controlled by the viewer. If we refer to the decoder mode selections in Figure 1, we see that the maximum permissible level of +6dBFM (equivalent to 200 percent modulation, or 50kHz peak deviation) for Dolby Digital (AC-3) sources is available at the channel 3/4 RF remodulated output.
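The relationship between peak deviation and the dBFM scale used in Figure 1 can be sketched in a few lines. This is only an illustration: the function and constant names are hypothetical, and it assumes that dBFM is the ordinary 20-log ratio of peak deviation to the 25kHz reference, which is consistent with the +6dBFM and 200 percent figures quoted above.

```python
import math

# 100 percent modulation (25kHz peak deviation) is the 0dBFM reference.
REFERENCE_DEVIATION_KHZ = 25.0


def deviation_to_dbfm(peak_deviation_khz):
    """Express a peak deviation (kHz) as a level in dBFM."""
    if peak_deviation_khz <= 0:
        raise ValueError("peak deviation must be positive")
    return 20.0 * math.log10(peak_deviation_khz / REFERENCE_DEVIATION_KHZ)


def dbfm_to_percent_modulation(level_dbfm):
    """Express a level in dBFM as percent modulation."""
    return 100.0 * 10.0 ** (level_dbfm / 20.0)


if __name__ == "__main__":
    for deviation in (25.0, 50.0):
        level = deviation_to_dbfm(deviation)
        percent = dbfm_to_percent_modulation(level)
        print(f"{deviation:.0f}kHz -> {level:+.2f}dBFM ({percent:.0f} percent)")
```

Under this assumption, 50kHz works out to roughly 6dB above the reference, which the article rounds to +6dBFM.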

This level relationship is intentional since the Dolby Digital signal being decoded by the STB potentially has 6dB more headroom above dialogue peaks than NTSC analog audio (that is, while the decoder is operating in RF mode). Furthermore, because the BTSC system leads to a maximum peak deviation of 73kHz, most television tuners today can accept up to 8dB above 25kHz peak deviation in the absence of pilot and subcarriers without distortion.2

Figure 1 gives us the entire story. You can see that with the set-top decoder operating in RF mode, the decoded dialogue level for digital sources will match the dialogue level of analog sources if and only if the analog source has its dialogue level provisioned at about -17dBFM and the dialnorm value within the Dolby Digital bit stream is set properly.3 Also note that if the viewer unknowingly switches the Dolby Digital (AC-3) decoder into line mode, the decoded level of dialogue will be reproduced at -28dBFM Leq(A), or 11dB lower than the analog source! (And it most likely will generate a complaint that will send you on a wild-goose chase to find a problem within your facility, when in reality the STB is in the wrong mode.)
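The mismatch arithmetic in this paragraph is simple enough to sketch. The snippet below is illustrative only: the names are hypothetical, and it assumes the transmitted dialnorm value is correct, so the decoded dialogue level depends only on the decoder mode.

```python
# Decoded dialogue level at the channel 3/4 RF output, in dBFM Leq(A),
# assuming a correct dialnorm value (figures taken from the text above).
DECODED_DIALOGUE_DBFM = {
    "rf": -17.0,    # matches a properly provisioned analog channel
    "line": -28.0,  # 11dB lower than RF mode
}


def dialogue_mismatch_db(mode, analog_dialogue_dbfm=-17.0):
    """Return digital minus analog dialogue level in dB.

    A negative result means the digital channel sounds quieter
    than the analog channel.
    """
    return DECODED_DIALOGUE_DBFM[mode] - analog_dialogue_dbfm


if __name__ == "__main__":
    print(dialogue_mismatch_db("rf"))    # 0.0: levels match
    print(dialogue_mismatch_db("line"))  # -11.0: digital is 11dB quieter
```

The second argument lets the same check be run against an analog channel whose dialogue has not been provisioned at -17dBFM.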

RF mode is intended for products such as DTV receivers and digital cable STBs that generate a signal for transmission via the channel 3/4 remodulator, which feeds an RF (antenna) input of a television set. This decoder operating mode was specifically designed to match the average reproduced dialogue level and dynamic range of digital sources to those of existing analog sources such as terrestrial NTSC and analog cable TV broadcasts. In this mode, dialogue normalization is enabled and applied in the decoder at all times. However, the dialogue level in this mode is reproduced at a level of -20dBFS Leq(A)4 only when the transmitted dialnorm value is valid for a particular program. Thus, the Dolby Digital decoder introduces a shift gain of +11dB, and therefore the maximum possible peak-to-dialogue-level ratio is reduced by 11dB when compared to line mode.
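To see why a valid dialnorm value matters, the normalization can be sketched as follows. This is a simplified model, not decoder code: the names are hypothetical, dynamic range control is ignored, and it assumes the usual AC-3 behavior in which the decoder attenuates by (31 + dialnorm) dB, so that correctly labeled dialogue lands at -31dBFS in line mode and, with the +11dB shift described above, at -20dBFS in RF mode.

```python
RF_MODE_SHIFT_DB = 11.0  # shift gain applied in RF mode, per the text above


def dialnorm_attenuation_db(dialnorm_dbfs):
    """Attenuation (dB) applied for a transmitted dialnorm value.

    Assumption: valid values run from -1 to -31, and -31 means
    no attenuation is applied.
    """
    if not -31 <= dialnorm_dbfs <= -1:
        raise ValueError("dialnorm must be between -31 and -1")
    return 31.0 + dialnorm_dbfs


def reproduced_dialogue_dbfs(actual_dialogue_dbfs, dialnorm_dbfs, mode):
    """Dialogue level after the decoder, in dBFS Leq(A)."""
    level = actual_dialogue_dbfs - dialnorm_attenuation_db(dialnorm_dbfs)
    if mode == "rf":
        level += RF_MODE_SHIFT_DB
    elif mode != "line":
        raise ValueError("mode must be 'rf' or 'line'")
    return level


if __name__ == "__main__":
    # Dialogue measured at -24dBFS and correctly labeled.
    print(reproduced_dialogue_dbfs(-24.0, -24, "rf"))
    # Same program mislabeled as -31: no attenuation is applied.
    print(reproduced_dialogue_dbfs(-24.0, -31, "rf"))
```

In this model, any error in the transmitted dialnorm value appears decibel for decibel as an error in the reproduced dialogue level, whichever mode the decoder is in.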

This leads to an important question: How do you know which operating mode your STB audio decoder is in? First, differences do indeed exist between manufacturers of STBs and program guide application providers (used in cable) as to the nomenclature they use to indicate Line or RF mode operation to the viewer. Table 1 lists the Dolby equivalent modes for the most common digital cable STBs in use today.

A quick glance at Table 1 shows that viewers can easily create a level mismatch between properly transmitted analog and digital programming. There have been a handful of cases over the years where STBs were deployed into viewers' homes defaulted to the wrong operating mode due to a lack of understanding as to what these user-selectable parameters did or their impact on the reproduced audio. Many viewers seem to have based their choices purely on face value. (“Normal or wide mode must be better than narrow mode, right?”) Thus, every customer service or support department should always inquire about decoder operating modes first when taking complaints.
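For a support desk, the check against Table 1 amounts to a lookup. The sketch below simply transcribes the table's menu selections. The dictionary and function names are hypothetical, and any box or menu wording not listed in Table 1 is reported as unknown rather than guessed.

```python
# (manufacturer, set-top mode selection) -> equivalent AC-3 decoder mode,
# transcribed from Table 1.
AC3_MODE_BY_SELECTION = {
    ("Scientific-Atlanta", "Narrow"): "RF mode",
    ("Scientific-Atlanta", "Normal"): "Line mode",
    ("Scientific-Atlanta", "Wide"): "Not applicable",
    ("Motorola", "TV"): "RF mode",
    ("Motorola", "Stereo"): "Line mode",
    ("Motorola", "Advanced/heavy"): "RF mode",
    ("Motorola", "Advanced/light"): "Line mode",
    ("Motorola", "Advanced/none"): "Line mode with no DRC",
}


def equivalent_ac3_mode(manufacturer, selection):
    """Return the AC-3 decoder mode for a menu selection, or 'Unknown'."""
    return AC3_MODE_BY_SELECTION.get((manufacturer, selection), "Unknown")


def is_rf_mode(manufacturer, selection):
    """True when the selection corresponds to RF mode."""
    return equivalent_ac3_mode(manufacturer, selection) == "RF mode"


if __name__ == "__main__":
    print(equivalent_ac3_mode("Scientific-Atlanta", "Normal"))  # Line mode
    print(is_rf_mode("Motorola", "TV"))                         # True
```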

Another point frequently raised by cable operators across North America: “Why do my digital channels sound so quiet compared to analog?” The appropriate response: “How do you know that your analog channels aren't too loud?” In fact, I have found (more times than not) the average dialogue levels on analog channels of cable television systems to be significantly higher than -17dBFM. In these cases, the STB (even when in RF mode) could not provide a match between analog and digital sources. Hence, in these situations, the deviation on the analog modulators must be adjusted so that the average dialogue level is truly -17dBFM Leq(A).
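The size of the adjustment follows directly from the measured dialogue level. As a rough sketch (hypothetical names, and assuming the measurement is an Leq(A) dialogue reading already expressed in dBFM), the required change and the corresponding scaling of the modulator's deviation are:

```python
TARGET_DIALOGUE_DBFM = -17.0  # dialogue level the STB assumes for analog channels


def deviation_correction(measured_dialogue_dbfm):
    """Return (gain change in dB, linear deviation scale factor).

    A negative gain change means the analog channel is too loud
    and its deviation must be reduced.
    """
    gain_db = TARGET_DIALOGUE_DBFM - measured_dialogue_dbfm
    scale = 10.0 ** (gain_db / 20.0)
    return gain_db, scale


if __name__ == "__main__":
    # Example: dialogue measured 5dB hotter than the target.
    gain_db, scale = deviation_correction(-12.0)
    print(f"adjust by {gain_db:+.1f}dB (deviation x {scale:.2f})")
```

For a channel measured at -12dBFM, the deviation would need to come down by 5dB, to a little over half its current value.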

Recommendations

In the end, education and knowledge about the end-to-end system are key to providing great audio programming to viewers.

I will close this article with a few short recommendations:

Provision any NTSC analog audio modulation equipment (off-air or cable) so that the A-weighted average dialogue level is about 17dB below 100 percent modulation.
Digital STBs that include an RF remodulator must default to RF mode. (Don't worry; viewers who want full dynamic range will use the S/PDIF output of the STB to feed their 5.1 home theaters.)
Properly provision the dialnorm value to match the long-term A-weighted dialogue level for programming.

Note: The points made in this article are all referenced in the OpenCable™ Host Device Core Functional Requirements (for OpenCable digital STBs), as well as in a bulletin issued by the Electronic Industries Association (EIA) and the Consumer Electronics Association (CEA) titled “EIA/CEA-CEB-11 NTSC/ATSC Loudness Matching.” The EIA/CEA document provides guidance to digital STB manufacturers on how to maintain uniform audio loudness between existing NTSC programming and digital television services while simultaneously preserving the dynamic range capability of the digital services. The bulletin also addresses the capabilities of consumer broadcast products to match loudness from the listener's perspective, internal gain structure and output specifications.

Jeffrey C. Riedmiller is senior broadcast product manager for Dolby Laboratories.

Table 1. Digital set-top audio decoder nomenclature

Set-top box manufacturer | Guide application | Setup menu item | Set-top mode selection | Equivalent AC-3 decoder operating mode
Scientific-Atlanta | SARA6 | Audio dynamic range:7 | Narrow | RF mode
Scientific-Atlanta | SARA | Audio dynamic range: | Normal | Line mode
Scientific-Atlanta | SARA | Audio dynamic range: | Wide | Not applicable8
Motorola | TV Guide | Audio\audio output: | TV | RF mode
Motorola | TV Guide | Audio\audio output: | Stereo | Line mode
Motorola | TV Guide | Audio\audio output: | Advanced/heavy | RF mode
Motorola | TV Guide | Audio\audio output: | Advanced/light | Line mode
Motorola | TV Guide | Audio\audio output: | Advanced/none | Line mode with no DRC9

Footnotes

1 The dialogue normalization value in an AC-3 bit stream must be set by the program originator to indicate the dialogue level of the program relative to 0dBFS. The valid range for this value is from -1dBFS to -31dBFS. Also note that the dialnorm value is used by the decoder (within the set-top box) to normalize the programming to a consistent level.

2 The FCC limits are of no significance to RF modulators in set-top boxes, VCRs and PVRs.

3 That is, dialnorm value represents the long-term A-weighted level of spoken dialogue in the program.

4 -23dBFS Leq(A) in each channel of a two-channel decoder.

5 That is, below 100 percent modulation (25kHz peak deviation).

6 Scientific-Atlanta Resident Application.

7 Some systems may choose to remove the ability for the subscriber to access/change decoder operating modes via the DNCS. However, each set-top box should be defaulted to narrow mode.

8 Use of wide mode is being deprecated and must not be used under any circumstance.

9 With the exception of overload protection, dynamic range control metadata (if present within the audio bit stream) is not applied in this mode.

References

“EIA/CEA-CEB-11 NTSC/ATSC Loudness Matching,” Engineering Bulletin
“An Analysis of Audio for Digital Cable Television: Recommendations for the Digital Transition via Metadata,” Jeffrey C. Riedmiller, NCTA Technical Papers, 2001