Guidelines for Streaming Loudness

As mature as the television industry is, it appears to be going through a very disruptive identity crisis right now. Cable providers are hemorrhaging subscribers while over-the-air broadcasting is being rediscovered; broadcasters and content creators are rolling out their own streaming services to reach customers directly, without the need for third parties; the journalistic integrity of traditional news organizations is under attack even as tech companies deliver uncorroborated information to users and deny any responsibility for determining its credibility; and viewership of traditional sports broadcasts seems to be waning while interest in eSports is on the rise.

Yet despite this changing content landscape, viewing of television content seems more popular than ever, even as the array of devices used to consume it and the possible environments in which it gets consumed appear unfathomable.

NEXT GEN TV
The impending rollout of ATSC 3.0 promises to bring together broadcast and streaming content in ways we haven’t experienced up to now, with immersive and customizable audio that can be delivered to practically every device and enjoyed through speakers or headphones. Unfortunately, not all streaming content is quite ready to become a part of the broadcast ecosystem, either due to poor quality or because it doesn’t meet broadcast standards.

One area where this is apparent is with streaming audio loudness, since the disparate streaming services and online outlets use different loudness settings. The good news is that most outlets now implement some sort of loudness management for content; the bad news for content creators is that this is usually done through automated normalization rather than through monitored gain modification, so audio content could be negatively altered.

AGOTTVS Streaming Audio Loudness Recommendations

The first North American attempt at setting loudness standards for streaming content was made by the AES Study Group on Streaming Loudness, headed by recording engineer Bob Katz, who wanted to establish them for music streaming.

In October of 2015, the group released the document, “Recommendations for Loudness of Audio Streaming and Network File Playback,” which recommended that target loudness of non-metadata encoded music reside between –16 and –20 LUFS. While this was a great start, it was immediately realized that all other streaming audio content would benefit from similar loudness recommendations, especially the audio accompanying streaming video.
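As a rough illustration of what that range means in practice, the sketch below computes how much static gain would move a measured track to the nearest edge of the –16 to –20 LUFS window. The function name and the nearest-edge policy are assumptions made for this example and do not come from the AES document. The sketch also assumes that a static gain change of x dB shifts integrated loudness by essentially x LU.

```python
def gain_to_range(measured_lufs, low=-20.0, high=-16.0):
    """Return the static gain (dB) that moves a track into [low, high] LUFS.

    Hypothetical helper: tracks already inside the window are left alone,
    louder tracks are turned down to `high`, quieter ones turned up to `low`.
    """
    if measured_lufs > high:
        return high - measured_lufs   # negative value: attenuate
    if measured_lufs < low:
        return low - measured_lufs    # positive value: boost
    return 0.0                        # already within the window


# A loud master at -12 LUFS would need 4 dB of attenuation.
print(gain_to_range(-12.0))
```

Note that any positive gain raises peaks by the same amount, so a real workflow would also have to check peak levels after the adjustment.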

So, in early 2016 the AES Audio Guidelines for Over the Top Television and Video Streaming (AGOTTVS) Study Group was formed, chaired by Jim Starzynski, the director and principal audio engineer at NBC Universal. His previous foray into loudness management was as part of the ATSC committee that created A/85, which serves as the core of the CALM Act and regulates broadcast audio loudness.

The group is made up of members from broadcast and cable networks, content providers, device manufacturers and content delivery services, as well as other interested parties, all working together to gain agreement on standards across the entire content chain, from creation to delivery to the consumer.

Preliminary AGOTTVS guidelines were published in September 2016, and on October 19, 2017, the AES announced an update to those guidelines that appears to be on its way to becoming an AES recommended practice. AES Technical Document AESTD1006.1.17-10, titled “Loudness Guidelines for OTT and OVD Content,” can be found at AES.org and is worth downloading and reading all the way through.

The document’s objectives are spelled out clearly in section 2, namely to “Provide consistent loudness across different Programs, provide appropriate loudness range for devices and listening conditions, prevent … processing from degrading audio quality, preserve the artistic intent, and improve the listening experience.”

Basic recommendations are along the same lines as A/85: use ITU-R BS.1770 measurement tools and make use of metadata with content whenever possible to take advantage of available dynamic range control and loudness management.

LONG FORM VERSUS SHORT FORM
For North America, loudness measurements of long-form content should be done by measuring the anchor element, typically dialog, while full program measurements are recommended for short-form content. For long-form content where the anchor element cannot be determined, it is recommended that the full program be measured.
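The selection rule described above is simple enough to express as a small decision function. This is only a sketch of the logic as summarized here; the function and parameter names are hypothetical, and how a program is classified as long-form or short-form is left to the caller.

```python
def choose_measurement(is_long_form, anchor_identifiable):
    """Pick what to measure, following the North American rule of thumb.

    Returns "anchor" (typically dialog) or "full_program".
    Both arguments are assumptions supplied by the caller.
    """
    if is_long_form and anchor_identifiable:
        return "anchor"
    # Short-form content, or long-form content with no clear anchor element.
    return "full_program"


print(choose_measurement(is_long_form=True, anchor_identifiable=True))
print(choose_measurement(is_long_form=False, anchor_identifiable=True))
```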

The guidelines themselves cover six possible content scenarios, including delivery and distribution of properly prepared content; systems with questionable metadata capabilities; devices or environments with limited dynamic range; and content created with no regard for standards or recommendations. Quite a few of the guidelines mention prior arrangements made between the content creator and distributor, and in those situations the document recommends that loudness and true peak values be measured and not exceed recommended regional targets.

Specific targets can be found in Annex B and vary according to region. Loudness targets for North America are –24 LKFS, plus or minus 2 dB, with maximum peaks no higher than –2 dB TP. Maximum loudness is never to exceed –16 LKFS, plus or minus 1 dB, with peaks measuring no higher than –1 dB TP.
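To make the first set of figures concrete, here is a minimal sketch of a pass/fail check against the North American target as quoted above (–24 LKFS, plus or minus 2 dB, true peak no higher than –2 dB TP). The class and function names are invented for this example, and the measured values are assumed to come from a BS.1770-compliant meter; the code does not measure audio itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoudnessTarget:
    target_lkfs: float          # nominal program loudness
    tolerance_db: float         # allowed deviation either side of the target
    max_true_peak_dbtp: float   # true-peak ceiling


# Figures as quoted in this article for North America (hypothetical constant name).
NORTH_AMERICA = LoudnessTarget(-24.0, 2.0, -2.0)


def check_program(loudness_lkfs, true_peak_dbtp, target=NORTH_AMERICA):
    """Return a list of problems; an empty list means the program passes."""
    problems = []
    if abs(loudness_lkfs - target.target_lkfs) > target.tolerance_db:
        problems.append("loudness outside target window")
    if true_peak_dbtp > target.max_true_peak_dbtp:
        problems.append("true peak above ceiling")
    return problems


print(check_program(-23.2, -3.0))
print(check_program(-19.0, -0.5))
```

Other regional figures from Annex B could be checked the same way by passing a different `LoudnessTarget`.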

While these recommendations should help moderate loudness of television and video content, the document notes that the loudness of sounds generated in the system itself, such as phone and tablet alerts, may differ from content loudness. It requests that developers work to match the loudness of all sounds across the device in future versions of software, though emergency alerts are expected to be louder than other content due to their nature.

Also noted is the fact that some devices use codecs that may not support metadata, resulting in undesired playback loudness from the device; the document suggests that such devices follow CTA CEB11 and EBU Tech 3344 recommendations to mitigate this.

Finally, the document goes into the differences between fixed and portable devices, their environments, and typical expected capabilities.

This document is a major step forward in sorting out the loudness issues between different streaming services and the many devices on which their content is consumed. While AGOTTVS has a wide cross-section of industry participation, it’s not everyone. Some are waiting on a standard to be created before implementing changes, and while this document will only be a recommended practice, so is A/85.

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.