Like video, television sound has changed dramatically since network TV’s early days, when 7.5kHz monaural audio carried over telco landlines was the prime-time norm. State-of-the-art early quad machines such as the Ampex VR-1000 delivered audio from 50Hz to 10kHz at 50dB s/n with 3-percent RMS distortion. Later, the most advanced high-band quad machines achieved 50Hz to 15kHz monaural audio at 55dB s/n. Satellites brought 15kHz national network audio to stations in the 1970s.
A rare few music programs were simulcast in stereo on local FM stations until multichannel television sound (MTS), the U.S. standard for stereo broadcast, was authorized by the FCC in 1984. By 1985, approximately 70 TV stations in the U.S. were broadcasting MTS. Not all TV stations broadcast in stereo, so a disclaimer graphic reading “In Stereo where available” appeared at the beginning of most network programs for several years afterward. By the 1990s, analog television audio had reached its plateau. DTV then opened new opportunities for more dramatic sound improvements.
Today’s TV viewers are listening more closely than ever before. Viewers/listeners fall into one of three categories. One is the audiophile, who may be listening on an expensive Dolby 5.1 or 7.1 THX or similarly approved high-powered home theater audio system. The second is the viewer listening with a less expensive home theater system with external amps and speakers, in stereo, 5.1 or 7.1. The third is the average viewer listening with the standard speakers and amplifier built into their TV display. Most modern off-the-shelf big screen TVs incorporate excellent built-in sound systems.
Lo-Fi pictures and audio
Common in last century’s NTSC CRT TV sets was a 2in or 3in speaker driven by a small audio amp circuit at about 3-percent distortion. These low-fidelity, limited dynamic range sound systems required a completely different sound mix than today’s DTV displays and home theater systems. Whispering scenes in soap operas, interrupted by some of the industry’s loudest and most compressed commercials, exacerbated the situation and irritated viewers.
Does anyone remember the once-ubiquitous Auratone 5C studio monitor speaker? It was once the industry-standard loudspeaker used to mix for the average pre-21st-century CRT TV set. It was a single 5in speaker in a small cube enclosure specifically designed to reproduce mid-range only. It provided a reference that reasonably matched the Lo-Fi sound the vast majority of viewers were listening to.
Hi-Fi pictures and audio
Most modern flat-screen TVs include superior speakers and usually have significantly higher-power amplifiers. They often meet the specs of high-end component sound systems. Distortion is low, frequency response and dynamic range are wide, and some include subwoofers. The new normal is an audience that expects a broader, movie theater-like sound with its TV viewing experience.
The good news about DTV audio is that its better sound specs and wider dynamic range can significantly enhance the perceived drama of program material. The bad news is that some commercials, particularly those with highly compressed audio, sound louder. Although the CALM Act is only six months old, much of the law regarding loudness was implemented 17 years ago when the FCC adopted the ATSC standard, including A/52 and A/53. A/52 describes the audio codec, and A/53 describes the DTV format. Both identify the golden rule of loudness, which is that the loudness of normal spoken dialog and the metadata parameter called dialnorm shall match. If all content obeys this rule, the loudness will be consistent.
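The golden rule can be illustrated with a little arithmetic. The Python sketch below is only an illustration, not part of A/52 or A/53: it assumes the dialog loudness of a piece of content has already been measured in LKFS by some meter, and it compares that figure with the level the dialnorm value claims. The function names and the 2dB tolerance are assumptions chosen for the example.

```python
def dialnorm_offset(measured_dialog_lkfs: float, dialnorm: int) -> float:
    """Return the mismatch in dB between measured dialog loudness and dialnorm.

    A positive result means the dialog is louder than the metadata claims,
    so the content will sound that many dB too loud after decoding.
    """
    # dialnorm is carried as a positive integer 1..31 that stands for
    # a dialog level of -1 to -31.
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm is expected in the range 1..31")
    indicated_level = -float(dialnorm)
    return measured_dialog_lkfs - indicated_level


def obeys_golden_rule(measured_dialog_lkfs: float, dialnorm: int,
                      tolerance_db: float = 2.0) -> bool:
    """True if measured dialog loudness matches dialnorm within a tolerance.

    The 2 dB default is an assumption for this sketch, not a quoted limit.
    """
    return abs(dialnorm_offset(measured_dialog_lkfs, dialnorm)) <= tolerance_db


if __name__ == "__main__":
    # A program whose dialog measures -24 LKFS and is tagged dialnorm 24.
    print(dialnorm_offset(-24.0, 24), obeys_golden_rule(-24.0, 24))
    # A commercial that measures -18 LKFS but carries the same dialnorm 24.
    print(dialnorm_offset(-18.0, 24), obeys_golden_rule(-18.0, 24))
```

In the second case the commercial carries the same metadata as the program but measures 6dB hotter, which is exactly the mismatch viewers hear as a loud commercial.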
Problems and solutions
Broadcasters and multichannel video programming distributors (MVPDs) are taking several approaches to CALM Act compliance. One of the hoped-for side benefits of the CALM Act was to eliminate AGCs and compressors, but reducing dynamic range is not specifically forbidden. Some local stations squash audio dynamic range to keep the program audio as loud as the commercials, in an attempt to avoid viewer complaints. Most MVPDs, particularly broadcast and cable networks, monitor dialnorm and LKFS levels (Loudness, K-weighted, relative to Full Scale) both in real time and in reviews. The trick is to satisfy the creative community that produced the content while also satisfying both audiophile and average viewers.
At the moment, evidence seems to indicate that the FCC is not actively seeking out stations for CALM Act non-compliance, and complaints from viewers have reportedly decreased. More recent complaints about broadcast audio are coming from producers and directors of programs and commercials, who are rightfully concerned about their audio creativity being stifled by over-processing.
The creative community knows that nearly all viewers are capable of receiving a greatly improved audio experience compared to the days of NTSC CRTs, and many are mixing the sound accordingly. Their concern is what happens to the content audio that has been processed multiple times by multiple methods. It’s not unusual for a long-form program to be processed by a file-based procedure, or a number of file-based procedures as it is handed off from server to server. Nor is it unusual for it to be processed again as the program plays out in real time, and perhaps again at the STL, transmitter or head-end.
We’re not all experts
In this era where nearly every product that handles audio on the production and transmission end seems to have some sort of built-in loudness control system, questions arise as to who knows best how to handle loudness. The best-case scenario would be to include silent data hidden in the audio stream that tells audio processing and file inspection systems whether to process the audio heavily, lightly or not at all, and with the ability to reverse or undo all previous processing. Program and commercial material would be processed, or not, based on the level of trust of its source. The problem is how to pass that information along the often serpentine route that audio and audio files travel.
One method is to employ what one manufacturer refers to as “Intelligent Dynamics.” Using this method, the audio itself is encoded with compression instructions that processors can read. Most importantly, it is by design not an audio watermark. Instead, it is a codec-independent file-based system that takes advantage of unused metadata blocks in files to carry the instruction data. It works similarly in file-based and pulse code modulated (PCM) systems. The advantages of Intelligent Dynamics are that it is reversible, and it allows trusted sources to bypass further processing.
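The general idea of instruction data that travels with the audio can be sketched in a few lines of Python. This is a hypothetical model, not the manufacturer’s format or algorithm: the `LoudnessHint` structure, the three modes and the -24 LKFS target are all assumptions made for illustration, and a simple static gain stands in for real dynamics processing. Each stage reads the hint, bypasses trusted content, and logs whatever gain it applies so a later stage can undo it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Mode(Enum):
    NONE = "none"    # trusted source: leave the audio alone
    LIGHT = "light"  # apply half of the needed correction
    HEAVY = "heavy"  # apply the full correction


@dataclass
class LoudnessHint:
    """Hypothetical instruction data carried alongside the audio."""
    mode: Mode
    applied_gain_db: List[float] = field(default_factory=list)


def process_stage(level_lkfs: float, hint: LoudnessHint,
                  target_lkfs: float = -24.0) -> float:
    """Return the program level after one processing stage.

    The target level is an assumed value for this sketch.
    """
    if hint.mode is Mode.NONE:
        return level_lkfs  # bypass: nothing applied, nothing logged
    correction = target_lkfs - level_lkfs
    if hint.mode is Mode.LIGHT:
        correction *= 0.5
    hint.applied_gain_db.append(correction)  # log it so it can be reversed
    return level_lkfs + correction


def undo_all(level_lkfs: float, hint: LoudnessHint) -> float:
    """Reverse every logged gain change and clear the log."""
    restored = level_lkfs - sum(hint.applied_gain_db)
    hint.applied_gain_db.clear()
    return restored


if __name__ == "__main__":
    hint = LoudnessHint(Mode.HEAVY)
    processed = process_stage(-18.0, hint)
    print(processed, undo_all(processed, hint))
```

However the data is actually carried, the point is the same: a stage that knows what has already been done, and by whom, can choose to do less.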
Demand from the creative community to preserve the aesthetic qualities of audio has motivated several manufacturers to develop systems with similar goals. The result is that the audio experience continues to improve, with benefits for everyone.