Loudness monitoring

Loudness has been a hot topic lately. Loudness issues have been around since the beginning of broadcasting or at least broadcast advertising, but they were exacerbated by the DTV transition and inconsistent use of the dialnorm metadata. Listener dissatisfaction increased, ultimately culminating in passage of the CALM Act.

The BS.1770 loudness measurement standard

In the early years of DTV, these problems prompted development of loudness measurement technologies, resulting in the ITU standard BS.1770. This standard describes a fundamental loudness measurement algorithm. This was validated through multiple sets of listening tests on various pieces of program material. It also describes a true-peak meter for determining the peak amplitude expected when a digital audio signal is reproduced in the analog domain or transcoded into another digital format. The ITU BS.1770 standard is referenced in the ATSC recommended practice A/85, which describes how broadcasters should ensure a satisfactory listening experience for DTV viewers. The ATSC document also states, “Users of this RP should apply the current version of ITU-R BS.1770.” We will discuss the importance of this below.

When the CALM Act was originally introduced, it mandated uniformity of commercial and program loudness. It did not include measurement methods and was so vague that it would have been a nightmare for both broadcasters and the FCC. Fortunately, industry representatives convinced Congress that adoption of the recently developed A/85 recommended practice would resolve the situation. The legislation was rewritten to mandate that the FCC enforce ATSC A/85, and its successors, through appropriate rulemaking.

The CALM Act created an obvious opportunity for equipment manufacturers to provide loudness measurement tools. Although a few loudness measurement products were on the market by 2009, many more were introduced at NAB and IBC last year, and there are currently more than a dozen loudness measurement products of various forms on the market. Each is vying for a piece of the large short-term market created as broadcasters equip their facilities for loudness.

At the end of October 2010, the ITU committee that maintains BS.1770 accepted (after much negotiation) changes submitted by the EBU. The result is a significant improvement in the calculation of loudness, which makes the measurement much more sensitive to the loud portions of an audio segment. The effect is to prevent advertisers from significantly increasing the loudness of a portion of a commercial by drastically reducing the loudness elsewhere.

Consider a hypothetical example where an announcer screams at widely spaced intervals throughout a commercial in an effort to get the viewers' attention. The new method assesses this spot as louder than the original technique did. It also removes the explicit requirement to measure the loudness of dialog and instead bases the assessment on all audio content (except for the LFE). This prevents unscrupulous advertisers from circumventing loudness limits by blasting a viewer with non-vocal content, and it removes the need for proprietary algorithms that only measured loudness when dialog was present.

This new version of BS.1770 will probably be published early this year. Recall that ATSC A/85 already specifies that updates to BS.1770 automatically apply. Consequently, meters in use will need to conform to the revised specification. Most manufacturers of such products have been following the developments in the EBU and ITU and have upgraded their software to accommodate the change. Unfortunately, they haven't necessarily done it correctly.

As a user, how do you assess whether a meter you are considering meets the revised specification? It's not as easy as testing a VU meter or a PPM. The EBU specifies some basic tests in its technical recommendations and provides the necessary waveforms on its website. However, these tests are basic and do not thoroughly test all aspects of compliance.

We will describe a suite of tests developed specifically to check every aspect of a meter's design. These tests also give diagnostic information about any implementation issues that exist. They are available at no charge as described in the “Loudness meter evaluation” sidebar on page 18.

Our test suite was developed by crafting signals whose parameters change dynamically so as to stress individual portions of the measurement in isolation. Each test can then maximize its sensitivity to the specific implementation errors it was designed to detect. The signals were passed through mathematical models of the algorithm and through models with intentional implementation errors. The signals were optimized to give the largest difference between readings obtained by the correct model and those obtained by incorrect implementations.

The new BS.1770 algorithm operates on multiples of a basic 100ms interval, so readings differ slightly with variations between the start of the measurement and the start of the signal. These reading differences follow a cyclic pattern, with alignments 50ms apart creating maximal difference. Consequently, the test signals were evaluated at a reference alignment and at an alignment 50ms delayed, and signal characteristics adjusted to minimize this difference — though sometimes this was in direct conflict with the desire to maximize the sensitivity to implementation errors.

Revising the ITU standard

The original ITU loudness measurement algorithm is shown in Figure 1 on page 14. The audio channels (except the LFE) are independently filtered with a low frequency roll-off to simulate the sensitivity of the human ear and a high frequency shelf to simulate head diffraction effects. The combined response of these filters is referred to as “K weighting” and is illustrated in Figure 2.

Surround channels are given a 1.5dB boost to account for the relative gain provided by their position on each side of the listener. The power in each channel is summed to obtain the power in the entire signal. This power is averaged over the entire program to obtain a single number metric for the program loudness. If a “dynamic” indication of loudness is desired, a three-second moving average is typically used. Readings are reported in LKFS (Loudness, K-weighted, relative to Full Scale), which may be thought of as loudness dBFS.
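The channel summation just described can be sketched in a few lines of Python. This is only an illustration: the function and dictionary names are hypothetical, the inputs are assumed to be mean-square values of channels that have already been K-weighted, and the 1.41 power gain (about 1.5dB) and the -0.691 offset are the constants from the BS.1770 formula rather than figures given in this article.

```python
import math

# Per-channel power gains: fronts at unity, surrounds boosted by about 1.5dB
# (a power gain of 1.41). The LFE is deliberately absent, so it is ignored.
CHANNEL_GAIN = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def program_loudness(mean_square):
    """Loudness in LKFS from a dict of channel name -> K-weighted mean-square power."""
    total = sum(CHANNEL_GAIN[ch] * power
                for ch, power in mean_square.items()
                if ch in CHANNEL_GAIN)
    # The -0.691 offset cancels the K-weighting gain at 1kHz.
    return -0.691 + 10 * math.log10(total)

# A 1kHz sine at -23dBFS in LF and RF: the mean square of a sine is half its
# squared peak, and K weighting adds about 0.691dB at that frequency.
p = 0.5 * 10 ** (-23 / 10) * 10 ** (0.691 / 10)
print(round(program_loudness({"L": p, "R": p}), 2))  # -23.0
```

The stereo case is why a 1kHz tone at a given dBFS level in both front channels reads the same number in LKFS.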

The ATSC recommendation specifies that loudness measurements should focus on dialog or an alternate anchor element. The intent was that viewers would set the dialog loud enough to be intelligible in their environment, and that maintaining constant dialog loudness would maintain intelligibility. This assumed well behaved content (many commercials don't fit this description), and also depended on proprietary loudness measurement technology. In an effort to address these and other issues, the EBU PLOUD committee revisited BS.1770. Their work resulted in the 2011 revision of BS.1770.

This revision maintains the same filtering and power measurement method used in the original standard, but changes the way measurements are averaged and presented. The integrator stage of Figure 1 is replaced with the processing shown in Figure 3. The channel power is summed over 400ms intervals. These intervals overlap by 75 percent, so a new value is obtained every 100ms. Results are gated with a Start/Stop control to allow selection of the audio segment to be measured. An absolute gate of -70 LKFS is applied, which automatically eliminates lead-in and playout portions of isolated audio segments.
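As a rough sketch of the block structure, the Python below slices a signal into 400ms blocks that advance in 100ms steps and then applies the absolute gate. It assumes a single channel that has already been K-weighted; the function names and the tone-plus-silence test signal are hypothetical, and the -0.691 offset comes from the BS.1770 formula.

```python
import math

def block_powers(samples, rate, block_s=0.4, hop_s=0.1):
    """Mean-square power of 400ms blocks that overlap by 75 percent."""
    block = round(block_s * rate)
    hop = round(hop_s * rate)
    powers = []
    for start in range(0, len(samples) - block + 1, hop):
        segment = samples[start:start + block]
        powers.append(sum(x * x for x in segment) / block)
    return powers

def above_absolute_gate(powers, gate_lkfs=-70.0):
    """Keep only the blocks whose loudness exceeds the absolute gate."""
    kept = []
    for p in powers:
        if p > 0 and -0.691 + 10 * math.log10(p) > gate_lkfs:
            kept.append(p)
    return kept

RATE = 48000
# One second of a 1kHz tone at -20dBFS followed by one second of digital silence.
tone = [0.1 * math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]
signal = tone + [0.0] * RATE

powers = block_powers(signal, RATE)
kept = above_absolute_gate(powers)
print(len(powers), len(kept))  # 17 10
```

Two seconds of audio yield 17 blocks. The seven that fall entirely in the silent playout are dropped by the gate, while the three that straddle the end of the tone survive with reduced power.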

The algorithm focuses on the foreground portion of the audio by a two-step averaging procedure (the yellow elements in Figure 3). The 400ms measurement values are averaged over the content being measured; the resulting LKFS value is decreased by 10 and used to gate the 400ms measurement values. This relative gate focuses the assessment on foreground sounds, the elements that generally dominate viewers' judgments of program loudness. The values that pass the relative gate are averaged to form the final reading, called “Integrated” loudness (abbreviated “I”).
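The two-step averaging can be illustrated with a minimal sketch that starts from a list of 400ms block loudness values in LKFS. The function name and the example values are hypothetical; the only assumption beyond the description above is that averaging is done on power rather than on the decibel values.

```python
import math

def mean_lkfs(values):
    """Average a list of LKFS values in the power domain."""
    powers = [10 ** (v / 10) for v in values]
    return 10 * math.log10(sum(powers) / len(powers))

def integrated_loudness(block_lkfs, absolute_gate=-70.0, relative_offset=-10.0):
    """Integrated loudness (I) from 400ms block loudness values."""
    # Step 1: discard blocks below the absolute gate and average the rest.
    audible = [v for v in block_lkfs if v > absolute_gate]
    if not audible:
        return float("-inf")
    # Step 2: set the relative gate 10 below that average and average again.
    threshold = mean_lkfs(audible) + relative_offset
    foreground = [v for v in audible if v > threshold]
    return mean_lkfs(foreground)

# A spot that is loud half the time and quiet the other half.
blocks = [-20.0] * 50 + [-40.0] * 50
print(round(mean_lkfs(blocks), 2))            # -22.97, plain average
print(round(integrated_loudness(blocks), 2))  # -20.0, quiet blocks gated out
```

In this example the plain average gives the spot about 3dB of credit for its quiet half. The relative gate removes that credit, which is the behavior intended to discourage the screaming-announcer tactic.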

Both the ITU and ATSC documents specify a true-peak meter. This is a device that measures the peak value a digital audio waveform will reach when it is reproduced in the analog domain or when it encounters many forms of digital processing. To understand the problem, recall that digital audio represents a continuous analog signal by a series of samples, taken at regular intervals determined by the sample rate. As Figure 4 illustrates, there is no guarantee that samples will land on the audio waveform peak.

However, these samples do represent the underlying audio waveform, and when it is reconstructed, the peak will be restored. This peak can also occur when the samples are subjected to many types of processing — anything that introduces phase shift or time offset — such as sample rate conversion, filtering or delay. If this happens in the digital domain, the new samples may clip, even if the original samples did not reach digital full scale. Because many peak meters merely display the maximum audio sample, they incorrectly gauge the system headroom.
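The gap between the largest sample and the reconstructed peak can be demonstrated with a small Python sketch. It uses a steady tone at one-quarter of the sample rate, sampled 45 degrees off its peaks, and estimates the inter-sample peak with a four-times, Hann-windowed sinc interpolator. The interpolator, its length and the function names are assumptions made for illustration; this is not the filter specified in BS.1770.

```python
import math

def sample_peak_db(samples):
    """Peak level in dBFS as a non-interpolating meter would report it."""
    return 20 * math.log10(max(abs(x) for x in samples))

def interpolated_peak_db(samples, oversample=4, half_width=16):
    """Estimate the true peak by windowed-sinc interpolation between samples."""
    peak = 0.0
    for i in range(half_width, len(samples) - half_width):
        for step in range(oversample):
            t = i + step / oversample          # time in units of samples
            acc = 0.0
            for k in range(i - half_width + 1, i + half_width + 1):
                d = t - k
                if d == 0:
                    weight = 1.0
                else:
                    sinc = math.sin(math.pi * d) / (math.pi * d)
                    hann = 0.5 * (1 + math.cos(math.pi * d / half_width))
                    weight = sinc * hann
                acc += samples[k] * weight
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak)

# -2dBFS sine at fs/4, with every sample landing 45 degrees away from a peak.
amplitude = 10 ** (-2 / 20)
tone = [amplitude * math.sin(math.pi * n / 2 + math.pi / 4) for n in range(64)]

print(round(sample_peak_db(tone), 2))        # -5.01
print(round(interpolated_peak_db(tone), 1))  # close to -2
```

Every sample sits at about 71 percent of the waveform peak, so a sample-based meter under-reads by 3dB, while the interpolated estimate recovers a value close to the actual -2dBFS.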

The EBU recommendation introduces other measures that are still under consideration by the ITU. Intended to assist mixers and program personnel in creating and characterizing content, their acceptance by the ITU is unlikely to impact CALM Act requirements. However, given their potential usefulness in production, it is helpful to understand them.

The EBU specifies Momentary loudness (M) as the stream of 400ms measurements, which drive the gating mechanisms described earlier. When displayed on a meter, they look much like a VU display because the 400ms averaging time is close to the 300ms of a classic VU meter. Watching this display during a mix helps a mix engineer estimate the program loudness of live productions. The “Maximum Momentary” loudness is specified by the EBU as the largest 400ms measurement during the measurement time. The listeners' perception of loudness is best described with a longer averaging time. The EBU recommends a running three-second average, which it calls Short-Term loudness (S).
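The difference between the two time scales can be seen in a short sketch that derives both readings from a stream of 100ms power values. The function names, the helper that converts a level to a power, and the burst example are all hypothetical; the -0.691 offset is the constant from the BS.1770 formula.

```python
import math

def sliding_loudness(powers_100ms, window):
    """Loudness of a running average over `window` consecutive 100ms powers."""
    readings = []
    for end in range(window, len(powers_100ms) + 1):
        mean_power = sum(powers_100ms[end - window:end]) / window
        readings.append(-0.691 + 10 * math.log10(mean_power))
    return readings

def power_for(lkfs):
    """K-weighted power that corresponds to a given loudness (helper for the demo)."""
    return 10 ** ((lkfs + 0.691) / 10)

# Five seconds at -30LKFS, a half-second burst at -10LKFS, five more seconds at -30LKFS.
stream = [power_for(-30)] * 50 + [power_for(-10)] * 5 + [power_for(-30)] * 50

momentary = sliding_loudness(stream, 4)    # 400ms window
short_term = sliding_loudness(stream, 30)  # three-second window

print(round(max(momentary), 2))   # -10.0, the Maximum Momentary reading
print(round(max(short_term), 2))  # -17.57
```

The 400ms window follows the burst all the way to its level, while the three-second average only rises to about -17.6LKFS.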

The EBU also defines a measurement called Loudness Range (LRA). This is derived from the Short-Term loudness using a relative gating process similar to that described above but with the gate set at -20LU. The LRA is the span from the 10 percent to 95 percent points on the distribution of Short-Term loudness values that pass the relative gate. The LRA is descriptive of the program material dynamic range. Using the 95-percent point allows occasional extremely loud events, while the 10-percent point ignores modest silent intervals during the program. If the LRA exceeds about 15LU, it is likely that viewers will be unable to find a single volume control setting appropriate for the entire program.

Richard Cabot is the CTO of Qualis Audio. He was chairman of the AES digital audio measurement committee for the development of the AES-17 standard. Ian Dennis is technical director of Prism Sound and as vice-chair of the AES digital audio measurements committee, he wrote the document that became the true-peak meter specification in BS.1770.

Loudness meter evaluation

The test suite described here is available for testing BS.1770 compliance of any loudness meter. It may be downloaded as a dScope III script and supporting files from www.prismsound.com, and also as a series of .WAV files and documentation from www.qualisaudio.com. More complete documentation of the tests, their design and their expected results are included in the download package. Any new tests will be added as they are developed.

The menu for the script-based implementation is shown in Figure 1. The full suite currently consists of 12 tests. All tests except Test 2 are stereo signals and should be applied to the LF and RF channels of a surround loudness meter. All tests except the first three comprise a 1kHz sine wave at varying amplitudes. When the expected result is a range rather than a specific target, this is due to the 100ms alignment uncertainty.

Test 1 checks the accuracy of the True-Peak meter. The initial waveform is a 1/8 sample rate, -6dBFS sine wave whose samples are chosen to correspond to the sine wave peaks. After three seconds, the frequency changes for one cycle to one-quarter sample rate, and the amplitude increases to -2dBFS. The samples are chosen to occur 45 degrees off the sine wave peaks, as shown in Figure 2a. When this waveform is properly interpolated, as would occur when it is reproduced in the analog domain, the waveform of Figure 2b results. All meters will read -6dBFS initially. After three seconds, the reading should increase to -2dBFS. Noninterpolating meters will increase to -5dBFS.

Test 2 is a variation on EBU Tech 3341 Test Case 6. This signal is Dolby Digital encoded and stimulates all channels simultaneously, including the LFE. Power summation is checked by using sine waves of slightly different frequencies. A compliant meter reads -23LKFS. If the meter incorrectly sums in the time domain, the reading will cycle. If the meter includes the LFE, the reading will be too high.
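The reason slightly different frequencies expose a summation error can be shown with a small sketch. The frequencies (1000Hz and 1001Hz), the amplitude and the function names here are illustrative assumptions, not the actual Test 2 parameters, and K weighting is omitted because it does not affect the comparison.

```python
import math

RATE = 48000
BLOCK = RATE * 4 // 10   # 400ms
HOP = RATE // 10         # 100ms

def tone(freq, seconds, amplitude=0.1):
    return [amplitude * math.sin(2 * math.pi * freq * n / RATE)
            for n in range(int(seconds * RATE))]

def block_mean_squares(samples):
    return [sum(x * x for x in samples[s:s + BLOCK]) / BLOCK
            for s in range(0, len(samples) - BLOCK + 1, HOP)]

def spread_db(powers):
    """Difference between the highest and lowest block reading, in dB."""
    return 10 * math.log10(max(powers) / min(powers))

a = tone(1000, 2)
b = tone(1001, 2)

# Correct: measure the power of each channel, then add the powers.
power_sum = [pa + pb for pa, pb in zip(block_mean_squares(a), block_mean_squares(b))]
# Incorrect: add the waveforms first, then measure the power of the mix.
time_sum = block_mean_squares([x + y for x, y in zip(a, b)])

print(round(spread_db(power_sum), 3))  # essentially steady
print(round(spread_db(time_sum), 1))   # several dB of cycling
```

When the waveforms are added first, the two tones beat against each other once per second and the block readings swing by several decibels. Adding the powers gives a steady reading.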

Test 3 checks the filter response at six frequencies: 25Hz, 100Hz, 500Hz, 1kHz, 2kHz and 10kHz, using sine waves of varying amplitudes to give a constant reading of -23LKFS.

Test 4 alternates between -69.5dBFS and -120dBFS to exercise the absolute gating function. A compliant meter reads -69.5LKFS. A meter that does not implement absolute gating reads between -71.5LKFS and -72.7LKFS.

Test 5 alternates the amplitude between -23dBFS and -6dBFS at intervals between 0.5 seconds and 1.4 seconds. A meter that correctly implements relative gating reads between -7.7LKFS and -8.2LKFS; a noncompliant meter reads between -13.2LKFS and -13.5LKFS.

Test 6 checks aspects of relative gating missed in Test 5 by alternating between -36dBFS and -20dBFS. A compliant meter reads between -22LKFS and -22.5LKFS, whereas noncompliant meters read between -24.3LKFS and -24.7LKFS.

Tests 7, 8 and 9 are implementations of EBU Tech 3342 Test Cases 1, 2 and 3, which evaluate basic loudness range meter operation. Each test takes 40 seconds and spends half its time at each of two amplitudes — 10dB, 5dB and 20dB apart, respectively. The loudness range meter should read these same values.

Test 10 implements EBU Tech 3342 Test Case 4, which evaluates loudness range meter gating. It spends 20 seconds at each of the amplitudes -50dBFS, -35dBFS, -20dBFS, -35dBFS and -50dBFS. A compliant meter will ignore the -50 amplitudes and read 15LU. If gating is not implemented, the meter reads 30LU.
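Both readings quoted for Test 10 can be reproduced with a simplified sketch of the loudness range calculation. It assumes an idealized Short-Term series with one value per second and no smearing at the level transitions, and it uses a nearest-rank percentile. The function name and the `gated` switch are hypothetical.

```python
import math

def loudness_range(short_term, gated=True):
    """LRA in LU from a list of Short-Term loudness values."""
    values = list(short_term)
    if gated:
        # Absolute gate at -70, then a relative gate 20 below the power average.
        values = [v for v in values if v > -70.0]
        mean_power = sum(10 ** (v / 10) for v in values) / len(values)
        threshold = 10 * math.log10(mean_power) - 20.0
        values = [v for v in values if v > threshold]
    values.sort()
    low = values[round(0.10 * (len(values) - 1))]    # 10-percent point
    high = values[round(0.95 * (len(values) - 1))]   # 95-percent point
    return high - low

# Test 10 levels, 20 seconds each, one idealized Short-Term value per second.
series = [-50.0] * 20 + [-35.0] * 20 + [-20.0] * 20 + [-35.0] * 20 + [-50.0] * 20

print(loudness_range(series))               # 15.0
print(loudness_range(series, gated=False))  # 30.0
```

With gating, the -50 values fall below the relative threshold (about -46.7 here) and the range spans -35 to -20. Without it, the 10-percent point lands on -50 and the range doubles.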

Tests 11 and 12 are sine wave-based alternatives to EBU Tech 3342 Test Cases 5 and 6, which are narrow-loudness-range (NLR) and wide-loudness-range (WLR) program clips. The NLR test uses amplitudes of -50dBFS, -40dBFS, -25dBFS, -20dBFS, -15dBFS, -20dBFS, -25dBFS, -40dBFS and -50dBFS, while the WLR test replaces the -25 amplitudes with -35dBFS. In both tests, the -40dBFS amplitudes are maintained for three seconds and the -15dBFS amplitudes for two seconds, while the other amplitudes have 23-second durations. These short durations at -40 and -15 test the 10-percent and 95-percent statistical processing defined in the loudness range algorithm.

If a loudness meter gives the expected results for each of the tests above, there is a high likelihood that the implementation is compliant with the latest version of BS.1770.