New loudness metering standard

Loudness issues have been around since the beginning of broadcasting, or at least broadcast advertising, but they were exacerbated by the DTV transition and inconsistent use of the dialnorm metadata. Listener dissatisfaction increased, ultimately culminating in passage of the CALM Act.

In the design stages of DTV, considerable effort was put into development of loudness measurement technologies to support the dialnorm mechanism. This work culminated in the original version of the International Telecommunication Union (ITU) standard BS.1770, which describes a fundamental loudness measurement algorithm. It was validated through multiple sets of listening tests on various pieces of program material. The ITU standard also describes a true-peak meter for determining the peak amplitude expected when a digital audio signal is reproduced in the analog domain or transcoded into another digital format.

Several years ago, the ATSC developed recommended practice A/85, which describes how broadcasters should ensure a satisfactory listening experience for DTV viewers. The CALM Act mandates that the FCC enforce the loudness-related portions of ATSC A/85, and its successors, through appropriate rulemaking. The loudness measurement portion of A/85 is based on BS.1770.

The CALM Act created an obvious opportunity for equipment manufacturers to provide loudness measurement tools. Although a few loudness measurement products were on the market by 2009, many more were introduced at NAB and IBC in 2010, and there are currently more than a dozen loudness measurement products of various forms on the market. Each is vying for a piece of the large market created as broadcasters equip their facilities for loudness measurement.

In late 2010, the ITU committee that maintains BS.1770 accepted (after much negotiation) changes submitted by the European Broadcasting Union (EBU). The result is a significant improvement in the calculation of loudness — one that makes the measurement much more sensitive to the loud portions of an audio segment. The effect is to prevent advertisers from significantly increasing the loudness of a portion of a commercial by manipulating the loudness elsewhere. Consider a hypothetical example where an announcer screams at widely spaced intervals throughout a commercial in an effort to get the viewers' attention. Though the average loudness might seem reasonable, the peak loudness will be quite annoying. The new method puts more emphasis on the louder portions and assesses this spot as louder than the original technique would have.

This new version of BS.1770 has just been published. Consequently, meters in use will need to conform to the revised specification. Most manufacturers of such products have been following the developments in the EBU and ITU and have upgraded their software to accommodate the change. Unfortunately, they haven't necessarily done it correctly.

The problem

As a user, how do you assess whether a meter you are considering meets the revised specification? How do you know if its software has been modified to conform to the new version of the standard? It's not as easy as testing a VU meter or a PPM. The EBU specifies some basic tests in its technical recommendations and provides the necessary waveforms on its website. However, these tests are basic and do not thoroughly test all aspects of compliance.

We will describe a suite of tests developed specifically to check every aspect of a meter's design. These tests also give diagnostic information about any implementation issues that exist. They are available at no charge as described at the end of this article.

Our test suite was developed by crafting signals whose parameters change dynamically so as to stress individual portions of the measurement in isolation. Each test can then maximize its sensitivity to the specific implementation errors it was designed to detect. The signals were developed using mathematical models of the algorithm, including models with various intentional implementation errors. The signals were optimized to give the largest difference between readings obtained by the correct model and those obtained by incorrect implementations.

The new BS.1770 algorithm operates on multiples of a basic 100ms interval, so readings vary slightly with the time offset between the start of the measurement and the start of the signal. These differences follow a cyclic pattern, and alignments 50ms apart produce the largest difference. Consequently, the test signals were evaluated at a reference alignment and at an alignment delayed by 50ms. Signal characteristics were adjusted to minimize this difference, though this sometimes conflicted directly with the goal of maximizing sensitivity to implementation errors.

Understanding the ITU standard

The original ITU loudness measurement algorithm is shown in Figure 1. The audio channels (except the LFE) are independently filtered with a low-frequency roll-off to simulate the sensitivity of the human ear and a high-frequency shelf to simulate head diffraction effects. The combined response of these filters is referred to as “K weighting” and is illustrated in Figure 2. Surround channels are given a 1.5dB boost to account for the relative gain provided by their position on each side of the listener. The individual channel powers are summed to obtain the surround program's total power. This is averaged over the entire program, yielding a single number metric for the program loudness. If a “dynamic” indication of loudness is desired, a three-second moving average is typically used. Readings are reported in LKFS (Loudness, K-weighted, relative to Full Scale) which may be thought of as loudness dBFS.
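The channel summation step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the per-channel mean-square values have already been K-weighted, the channel names are hypothetical labels, and the 1.41 surround weight (about 1.5dB) and the -0.691 calibration constant are the values we understand BS.1770 to use.

```python
import math

# Assumed power weights: front channels unity, surrounds boosted by
# about 1.5 dB (a factor of 1.41 in power). The LFE is deliberately
# absent, so it never contributes to the reading.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}

def program_loudness(mean_square):
    """Return loudness in LKFS from K-weighted mean-square power per channel.

    `mean_square` maps a channel label to its average power; labels that
    are not in CHANNEL_WEIGHTS (for example "LFE") are ignored.
    """
    total = sum(CHANNEL_WEIGHTS[ch] * power
                for ch, power in mean_square.items()
                if ch in CHANNEL_WEIGHTS)
    return -0.691 + 10.0 * math.log10(total)

front_only = program_loudness({"L": 0.01})
surround_only = program_loudness({"Ls": 0.01})
print(f"front: {front_only:.2f} LKFS, surround: {surround_only:.2f} LKFS")
```

The same power placed in a surround channel reads about 1.5dB higher than in a front channel, and any power in the LFE leaves the reading unchanged.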

The ATSC recommendation specifies that loudness measurements should focus on dialog or an alternate anchor element. The intent was that viewers would set the dialog loud enough to be intelligible in their environment, and that maintaining constant dialog loudness would maintain intelligibility. This assumed “well behaved” content (many commercials don't fit this description), and also depended on proprietary loudness measurement technology. In an effort to address these and other issues, the EBU PLOUD committee revisited BS.1770. Their work resulted in the 2011 revision of BS.1770.

This revision maintains the same filtering and power measurement method used in the original standard, but changes the way measurements are averaged and presented. The integrator stage of Figure 1 is replaced with the processing shown in Figure 3. The channel power is summed over 400ms intervals. These intervals overlap by 75 percent, so a new value is obtained every 100ms. Results are gated with a start/stop control to allow selection of the audio segment to be measured. An absolute gate of -70LKFS is applied, which automatically eliminates lead-in and playout portions of isolated audio segments.
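The block structure described above might be sketched as follows. The function names are ours, the input is assumed to be a single channel that has already been K-weighted, and the -0.691 offset is the calibration constant we understand the standard to use. A real meter would sum weighted channel powers before converting to LKFS.

```python
import math

def block_loudness(samples, sample_rate, block_s=0.4, hop_s=0.1):
    """Loudness of each 400 ms block, advancing 100 ms at a time.

    A 100 ms hop on a 400 ms block gives the 75 percent overlap.
    `samples` is assumed to be one K-weighted channel.
    """
    block = int(round(block_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    values = []
    for start in range(0, len(samples) - block + 1, hop):
        power = sum(s * s for s in samples[start:start + block]) / block
        if power > 0.0:
            values.append(-0.691 + 10.0 * math.log10(power))
        else:
            values.append(float("-inf"))  # digital silence
    return values

def absolute_gate(values, threshold=-70.0):
    """Drop blocks at or below the absolute gate (-70 LKFS)."""
    return [v for v in values if v > threshold]

# One second of signal followed by one second of silence, at a low
# sample rate chosen only to keep the example fast.
signal = [0.1] * 1000 + [0.0] * 1000
blocks = block_loudness(signal, 1000)
print(len(blocks), "blocks,", len(absolute_gate(blocks)), "pass the absolute gate")
```

The silent playout produces blocks that fall below the absolute gate and so never enter the average.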

The algorithm focuses on the foreground portion of the audio by a two-step averaging procedure (the orange elements in Figure 3). The 400ms measurement values are averaged over the content being measured. The resulting LKFS value is decreased by 10 and used to gate the 400ms measurement values. This “relative gate” focuses the assessment on foreground sounds, the elements that generally dominate viewers' judgments of program loudness. The values that pass the relative gate are averaged to form the final reading called “Integrated” loudness (abbreviated “I”).
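The two-step averaging can be illustrated with a short Python sketch that starts from a list of 400ms block readings in LKFS. The function names are hypothetical, and the sketch assumes that the averaging is done on power rather than on the dB values, which is our understanding of the standard.

```python
import math

def energy_mean_db(values):
    """Average a list of dB readings in the power domain."""
    mean_power = sum(10.0 ** (v / 10.0) for v in values) / len(values)
    return 10.0 * math.log10(mean_power)

def integrated_loudness(block_values, absolute_gate=-70.0, relative_offset=-10.0):
    """Integrated loudness from 400 ms block readings (LKFS)."""
    # Step 1: the absolute gate removes lead-in and playout silence.
    above_abs = [v for v in block_values if v > absolute_gate]
    if not above_abs:
        return float("-inf")
    # Step 2: average what is left and set the relative gate 10 below it.
    relative_gate = energy_mean_db(above_abs) + relative_offset
    # Step 3: average only the blocks that pass the relative gate.
    foreground = [v for v in above_abs if v > relative_gate]
    return energy_mean_db(foreground)

# Half the program at -20 LKFS, half at -40 LKFS.
blocks = [-20.0] * 50 + [-40.0] * 50
print(f"ungated average: {energy_mean_db(blocks):.2f} LKFS")
print(f"integrated:      {integrated_loudness(blocks):.2f} LKFS")
```

In this example the quiet half of the program falls below the relative gate, so the integrated reading reflects only the loud half, while a simple average is pulled down by about 3dB.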

The standards also specify a true-peak meter. This is a device that measures the peak value a digital audio waveform will reach when it is reproduced in the analog domain, or when it encounters many forms of digital processing. To understand the problem, recall that digital audio represents a continuous analog signal by a series of samples, taken at regular intervals determined by the sample rate. As Figure 4 illustrates, there is no guarantee that samples will land on the audio waveform peak.

However, these samples do represent the underlying audio waveform, and when it is reconstructed, the peak will be restored. This peak can also occur when the samples are subjected to many types of processing — anything that introduces phase shift or time offset — such as sample rate conversion, filtering or delay. If this happens in the digital domain, the new samples may clip, even if the original samples did not reach digital full scale. Because many peak meters merely display the maximum audio sample, they incorrectly gauge the system headroom.
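The gap between a sample-peak reading and a true-peak reading can be demonstrated with a short Python sketch. The tone used here (one-fourth of the sample rate, -2dBFS, sampled 45 degrees off its peaks) is chosen purely for illustration, and the Hann-windowed sinc interpolator with 4x oversampling is our own simplification; it is not the interpolation filter given in the standard.

```python
import math

def to_dbfs(value):
    return 20.0 * math.log10(value)

def sample_peak(samples):
    """Largest absolute sample value, as a simple peak meter reports."""
    return max(abs(s) for s in samples)

def true_peak(samples, oversample=4, half_width=16):
    """Estimate the inter-sample peak by windowed-sinc interpolation.

    Only the middle of the signal is evaluated so that every
    interpolated point has a full set of neighbouring samples.
    """
    peak = 0.0
    for n in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = n + k / oversample
            acc = 0.0
            for m in range(n - half_width + 1, n + half_width + 1):
                d = t - m
                sinc = 1.0 if d == 0.0 else math.sin(math.pi * d) / (math.pi * d)
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[m] * sinc * window
            peak = max(peak, abs(acc))
    return peak

# A -2 dBFS sinewave at fs/4 whose samples fall 45 degrees off the peaks.
amplitude = 10.0 ** (-2.0 / 20.0)
tone = [amplitude * math.sin(math.pi / 2 * n + math.pi / 4) for n in range(128)]

sample_peak_db = to_dbfs(sample_peak(tone))
true_peak_db = to_dbfs(true_peak(tone))
print(f"sample peak: {sample_peak_db:.2f} dBFS, true peak: {true_peak_db:.2f} dBFS")
```

Every sample of this tone sits at 0.707 of the waveform amplitude, so the sample-peak figure comes out about 3dB below the level the reconstructed waveform actually reaches.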

The EBU recommendation introduces other measures that are still under consideration by the ITU. Intended to assist mixers and program personnel in creating and characterizing content, their acceptance by the ITU is unlikely to impact CALM Act requirements. However, given their potential usefulness in production, it is helpful to understand them.

The EBU specifies “Momentary” loudness (abbreviated “M”) as the stream of 400ms measurements that drive the gating mechanisms described earlier. When displayed on a meter, they look much like a VU display since the 400ms averaging time is close to the 300ms of a classic VU meter. Mix engineers watch the display for help estimating the program loudness of live productions. Listeners' perception of loudness is best described with a longer averaging time. The EBU recommends a running three-second average that it calls the “Short-Term” loudness (abbreviated “S”).

The EBU also defines a measurement called “Loudness Range” (abbreviated LRA). This is derived from the Short-Term loudness using a relative gating process similar to that described above, but with the gate set 20LU below the average rather than 10 (LU are loudness units, equivalent to dB). The LRA is the span from the 10 percent to 95 percent points on the distribution of Short-Term loudness values that pass the relative gate. It describes the dynamic range of the program material. Using the 95 percent point allows occasional extremely loud events, while the 10 percent point ignores modest silent intervals during the program. If the LRA exceeds about 15LU, it is likely that viewers will be unable to find a single volume control setting appropriate for the entire program.
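A rough Python sketch of the loudness range calculation follows. It starts from a list of Short-Term readings and assumes power-domain averaging for the relative gate, a -70 absolute gate as in the integrated measurement, and a simple nearest-rank percentile. Those details are our assumptions and should be checked against EBU Tech 3342 before relying on them.

```python
import math

def loudness_range(short_term, absolute_gate=-70.0, relative_offset=-20.0,
                   low_pct=10.0, high_pct=95.0):
    """Loudness range (LU) from a list of Short-Term readings."""
    above_abs = [v for v in short_term if v > absolute_gate]
    if not above_abs:
        return 0.0
    # Relative gate: 20 below the power-domain average of what remains.
    mean_power = sum(10.0 ** (v / 10.0) for v in above_abs) / len(above_abs)
    relative_gate = 10.0 * math.log10(mean_power) + relative_offset
    kept = sorted(v for v in above_abs if v > relative_gate)

    def percentile(pct):
        # Nearest-rank lookup on the sorted readings (assumed method).
        return kept[int(round((len(kept) - 1) * pct / 100.0))]

    return percentile(high_pct) - percentile(low_pct)

# Equal time at -50, -35, -20, -35 and -50: the -50 readings fall below
# the relative gate, leaving a span from -35 to -20.
readings = [-50.0] * 20 + [-35.0] * 20 + [-20.0] * 20 + [-35.0] * 20 + [-50.0] * 20
print(f"LRA: {loudness_range(readings):.1f} LU")
```

Setting `relative_offset` to a very large negative number disables the relative gate, which is a quick way to see how much the gate changes the result for a given program.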

Loudness meter evaluation

The test suite described here is available for testing BS.1770 compliance of any loudness meter. It may be downloaded as a dScope III script or as .wav files from www.prismsound.com/loudness1770, and also as a series of wave files and documentation from www.qualisaudio.com. More complete documentation of the tests, their design and their expected results is included in the download package. Any new tests will be added as they are developed.

The current 16-test menu for the script-based implementation is shown in Figure 5. All tests, except Test 2, are stereo signals and should be applied to the LF and RF channels of a surround loudness meter. All tests, except the first three, comprise a 1kHz sinewave at varying amplitudes. When the expected result is a range rather than a specific target, this is due to the 100ms alignment uncertainty.

Test 1 checks the accuracy of the True-Peak meter. The initial waveform is a one-eighth sample rate, -6dBFS sinewave in which samples are chosen to correspond to the sinewave peaks. After three seconds, the frequency changes for one cycle to one-fourth sample rate, and the amplitude increases to -2dBFS. The samples are chosen to occur 45 degrees off the sinewave peaks as shown in Figure 6a. When this waveform is properly interpolated (as would occur when reproduced in the analog domain), the result is the waveform shown in Figure 6b. All meters will read -6dBFS initially. After three seconds, the reading should increase to -2dBFS. Non-interpolating meters will increase to -5dBFS.

Test 2 is Dolby Digital-encoded and stimulates all channels simultaneously, including the LFE. Power summation is checked by using sinewaves of slightly different frequencies. A compliant meter reads -23LKFS. If the meter incorrectly sums in the time-domain, the reading will cycle. If the meter includes the LFE, the reading will be too high.
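The reason slightly different frequencies expose a time-domain summation error can be seen in a short Python sketch. The sample rate, frequencies and amplitude below are illustrative assumptions and are not the values used in Test 2; K weighting is omitted because it does not affect the comparison.

```python
import math

FS = 8000             # assumed sample rate, kept low so the sketch runs quickly
BLOCK = FS * 4 // 10  # 400 ms
HOP = FS // 10        # 100 ms

def tone(freq, seconds=1.0, amplitude=0.1):
    return [amplitude * math.sin(2.0 * math.pi * freq * n / FS)
            for n in range(int(seconds * FS))]

def block_powers(signal):
    """Mean-square power of each overlapping 400 ms block."""
    return [sum(s * s for s in signal[i:i + BLOCK]) / BLOCK
            for i in range(0, len(signal) - BLOCK + 1, HOP)]

def to_db(power):
    return 10.0 * math.log10(power)

def spread(values):
    return max(values) - min(values)

left, right = tone(1000.0), tone(1001.0)

# Correct: measure each channel's power, then add the powers.
power_sum = [to_db(a + b) for a, b in zip(block_powers(left), block_powers(right))]

# Incorrect: add the waveforms first, then measure the power of the mix.
mixed = [a + b for a, b in zip(left, right)]
time_sum = [to_db(p) for p in block_powers(mixed)]

print(f"power summation varies by {spread(power_sum):.3f} dB")
print(f"time-domain summation varies by {spread(time_sum):.3f} dB")
```

With power summation the reading stays essentially constant. When the waveforms are added first, the two tones drift in and out of phase and the reading rises and falls at the difference frequency.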

Test 3 checks the filter response at six frequencies: 25, 100, 500, 1k, 2k and 10kHz using sinewaves of varying amplitudes to give a constant reading of -23LKFS.

Test 4 alternates between -69.5dBFS and -90dBFS to exercise the absolute gating function. A compliant meter reads -69.5LKFS. A meter that does not implement absolute gating reads -71.5LKFS to -72.7LKFS.

Test 5 steps the amplitude between -23dBFS and -6dBFS at intervals between 0.5 and 1.4 seconds. A meter that correctly implements relative gating reads -7.7LKFS to -8.2LKFS; a non-compliant meter reads between -13.2LKFS and -13.5LKFS.

Test 6 checks aspects of relative gating missed in test 5 by alternating between -36dBFS and -20dBFS. A compliant meter reads between -22LKFS and -22.5LKFS, whereas non-compliant meters read between -24.3LKFS and -24.7LKFS.

Tests 7, 8 and 9 are corrected implementations of EBU Tech 3341 Test Cases 3, 4 and 5 that evaluate basic loudness meter operation. The versions created by the EBU have slight errors that were discovered during the test suite's development.

Tests 10, 11 and 12 are implementations of EBU Tech 3342 Test Cases 1, 2 and 3, which evaluate basic loudness range meter operation. Each test takes 40 seconds and spends half its time at each of two amplitudes 10dB, 5dB and 20dB apart, respectively. The loudness range meter should read these same values.

Test 13 implements EBU Tech 3342 Test Case 4, which evaluates loudness range meter gating. It spends 20 seconds at each of the amplitudes -50dBFS, -35dBFS, -20dBFS, -35dBFS and -50dBFS. A compliant meter will ignore the -50 amplitudes and read 15LU. If gating is not implemented, the meter reads 30LU.

Tests 14 and 15 are sinewave-based alternatives to EBU Tech 3342 Test Cases 5 and 6, which are narrow-loudness-range (NLR) and wide-loudness-range (WLR) program clips. The NLR test uses amplitudes of -50dBFS, -40dBFS, -25dBFS, -20dBFS, -15dBFS, -20dBFS, -25dBFS, -40dBFS and -50dBFS, while the WLR test replaces the -25dBFS amplitudes with -35dBFS. In both tests, the -40dBFS amplitudes are maintained for three seconds and the -15dBFS amplitudes for two seconds, while the others have 23 one-second durations. The durations at -40dBFS and -15dBFS test the 10 percent and 95 percent statistical processing defined in the loudness range algorithm.

Test 16 measures the meter's relative gate threshold, displaying -8 or -10 as appropriate. This is useful to determine if your meter is designed around the old standard.

If a loudness meter gives the expected results for each of the tests above, the likelihood is high that the implementation is compliant with the latest version of BS.1770.

Richard Cabot is the CTO of Qualis Audio. He was previously chairman of the AES digital audio measurement committee for the development of the AES-17 standard. Ian Dennis is technical director of Prism Sound and, as vice chair of the AES digital audio measurements committee, wrote the document which became the true-peak meter specification in BS.1770.