Processing audio for mobile DTV

Just when audio mixing engineers were finally getting comfortable with 5.1 surround sound and other types of multichannel audio formats for sophisticated home and public theater use, along comes the need to correctly process stereo audio signals for portable devices with 1in speakers.

The goal for content distributors is to create a consistent and predictable audio experience without having to preprocess everything upstream of the device. If the proper tags and metadata are sent along with the content, the decoder will know immediately how to render (decode) it properly for the mobile device.

In most cases, the dynamic range needs to be more tightly controlled because of the physical and electrical limitations of cell phones, tablet PCs and other devices. Couple these limitations with the fact that these devices are normally used in very noisy environments (trains and buses), and program audio can become hard to hear and frustrating for users if it is not mixed correctly.

Dolby Labs has tested portable mobile DTV reception devices as part of its ongoing research into this area. The company's Dolby Mobile product (a suite of post-processing technologies for mobile devices) includes a series of custom compression algorithms that are used to improve the audio experience on mobile devices. Some of these algorithms can take a surround-sound mix and “virtualize” the 5.1 signals for rendering over ear buds, headphones and even a larger device's compact internal speakers.

One of the things Dolby's engineers learned during their tests is that TV-based content (movies, sitcoms, etc.) is mixed with a different production philosophy, in terms of peak levels and dynamic range, than MP3 music files. Modern music is typically produced with peak levels frequently at or just below clipping and the loudness averaging approximately 11dB below that. Therefore, it doesn't have a lot of dynamic range, whereas movies and TV content offer a variety of levels relating to dialogue, sound effects and ambient noise. The typical average loudness for this type of content is in the range of 22dB to 29dB below clipping, or roughly 12dB to 18dB lower than music. (Remember, a 10dB difference in level is approximately equivalent to a subjective doubling, or halving, in perceived loudness.)
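
The 10dB rule of thumb above can be turned into a quick calculation. The following is a minimal sketch in Python; the function name and the sample differences are illustrative assumptions, not anything taken from Dolby's tools.

```python
def perceived_loudness_ratio(delta_db: float) -> float:
    """Rule of thumb: every 10dB of level difference doubles
    (or, if negative, halves) the perceived loudness."""
    return 2.0 ** (delta_db / 10.0)


# Illustrative level differences in dB (assumed values, not measurements).
for delta in (10.0, -10.0, -18.0):
    ratio = perceived_loudness_ratio(delta)
    print(f"{delta:+.0f}dB -> about {ratio:.2f}x as loud")
```

By this approximation, a program that sits 18dB below a music track would be heard at less than a third of the music's loudness, which is why switching between the two on one device can be jarring.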

Dolby's research in this area sought to establish a new reference level for what sounded good on small speakers, but also looked to leverage metadata computed at the encoder upstream (either in Hollywood or at a broadcast facility). The decoder at the device could then identify the incoming bit stream and activate different modes to get the best results.

The company has taken the two main legacy decoder operating modes, Line and RF (used for DVD and TV content), and developed a new reference level for portable devices that is based on the same principles. It's called Portable Mode and is present in the latest generation of Dolby decoders and processing tools for mobile devices.

Jeffrey Riedmiller, Director, Sound Platform Group, at Dolby Labs, worked with a few of his colleagues last summer to develop a comprehensive white paper that was presented at the Audio Engineering Society conference in London in May 2010. It describes a nondestructive method for controlling playback loudness and dynamic range on portable devices and is based on a worldwide standard for loudness measurement as defined by the ITU.

They worked with a database of 25,000 songs, along with hundreds of TV and movie titles, and found that for music, the mean or median loudness of that large data set ended up being about 11dB below maximum level. So, Dolby decided to establish -11dB as the new audio reference point for its decoders embedded within portable devices. Using this level, portable device manufacturers could leverage the metadata contained in the latest Dolby formats to nondestructively process wide dynamic range content, such as TV shows and movies, in a manner optimized for these device types, all while providing a better subjective match (in terms of loudness and dynamics) to music files being played. The result gives the consumer a much improved (and more consistent) experience in terms of loudness, intelligibility and peak level control.
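
The idea of a per-mode reference level can be sketched as a simple gain calculation. This is only an illustration in Python, not Dolby's decoder logic: the -11dB Portable figure comes from the research described here, while the Line and RF targets, the function names and the sample program levels are assumptions made for the example.

```python
# Target loudness per decoder operating mode, in dB relative to full scale.
# "portable" is the figure cited in the article; "line" and "rf" are
# assumed values based on commonly cited Dolby Digital practice.
MODE_TARGET_DB = {"line": -31.0, "rf": -20.0, "portable": -11.0}


def normalization_gain_db(program_loudness_db: float, mode: str) -> float:
    """Gain (in dB) that moves a program's measured loudness to the mode's target."""
    return MODE_TARGET_DB[mode] - program_loudness_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical measured loudness values carried as metadata with each program.
for title, loudness_db in (("TV drama", -24.0), ("pop song", -11.0)):
    gain = normalization_gain_db(loudness_db, "portable")
    print(f"{title}: {gain:+.1f}dB (x{db_to_linear(gain):.2f} amplitude)")
```

Because the gain is derived from metadata and applied at playback, the stored audio is never altered. Note that raising a quiet program by 13dB would push its peaks past full scale, so a real decoder has to pair this gain with dynamic range control and peak limiting.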

With this metric as a baseline, most encoders can be used to control the level and dynamics of content processed with Dolby's technology and metadata generated with other companies' technology as well. This new metric is also being deployed inside Dolby's Mobile platform and its suite of core technologies that are used for Dolby Pulse and Dolby Digital Plus decoder products.

Riedmiller said this new Portable Mode also makes it easier for mobile video producers to create content that complies with new government mandates to protect against excessive audio loudness. It does this while also bringing the dialogue out in front to make it more intelligible (and enjoyable) when viewing in noisy environments. (This is especially important for live sporting events, where the crowd can drown out the announcer if the mix is not done right.)

Meanwhile, the associated metadata is still attached to the media so that if a person plugs his or her iPhone into a docking station, it can play out that music or TV show on a large home theater speaker system and sound great, with the originally mixed full dynamic range.

However, there are a number of challenges with processing audio signals for mobile devices that engineers should be aware of, according to Tim Carroll, founder of Linear Acoustic. He said that audio programs may vary from mono to 5.1 channels, but the current mobile audio pipeline is stereo (and mono in many cases). This requires automatic downmixing that is both stereo- and mono-compatible.
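
A basic downmix of this kind can be sketched in a few lines of Python. The coefficients below (center and surround channels attenuated by about 3dB, LFE dropped) follow a commonly used convention and are an assumption for illustration; they are not Linear Acoustic's method, and the function names are hypothetical.

```python
import math

ATT = 1.0 / math.sqrt(2.0)  # about -3dB, assumed mix coefficient


def downmix_to_stereo(l, r, c, lfe, ls, rs):
    """Fold one 5.1 sample frame into a stereo pair.
    The LFE channel is deliberately discarded in this sketch."""
    lo = l + ATT * c + ATT * ls
    ro = r + ATT * c + ATT * rs
    return lo, ro


def downmix_to_mono(lo, ro):
    """Fold a stereo pair into mono by averaging the two channels."""
    return 0.5 * (lo + ro)


# One sample frame with dialogue only in the center channel.
lo, ro = downmix_to_stereo(l=0.0, r=0.0, c=0.5, lfe=0.0, ls=0.0, rs=0.0)
print(lo, ro, downmix_to_mono(lo, ro))
```

Two practical cautions apply. The summed channels can exceed full scale, so a real downmixer scales or limits the result. Content that is out of phase between left and right will partly cancel in the mono fold, which is one reason mono compatibility has to be checked rather than assumed.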

Linear Acoustic offers its product, which handles up to two separate mobile audio streams. It combines downmixing with processing and psychoacoustic tools to produce a controlled and intelligible result for mobile devices. The mobile processing will soon be available in the company's AERO.file file-based product as well.

Dynamic range processing has to be tailored not only for controlling range, but also for protecting dialogue, and this requires managing a different set of psychoacoustic challenges. Carroll said that multiband processing, coupled with these new psychoacoustic tools, could work extremely well. Attention must also be paid to the codec itself, as different versions and mixes of encoders and decoders can produce surprises.

Peter Poers, Director Sales & Marketing, at Jünger Audio-Studiotechnik, said broadcasters have to understand the limitations of their one-fourth rate (or less) transmission channel and mix the content knowing it will be heard on tiny earbuds or little speakers. Reducing dynamic range is a way to ensure comfortable listening.
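
To make "reducing dynamic range" concrete, here is a minimal sketch in Python of a static, single-band compressor curve. The threshold and ratio are arbitrary assumptions, and the code does not represent any vendor's algorithm; a multiband processor would apply a curve like this separately in each frequency band, with attack and release smoothing added.

```python
def compressor_gain_db(level_db: float,
                       threshold_db: float = -20.0,
                       ratio: float = 4.0) -> float:
    """Static compressor curve: levels above the threshold are
    reduced so that the overshoot is divided by the ratio."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: leave the signal untouched
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db  # negative value means gain reduction


# Show how input levels (dB relative to full scale) map to output levels.
for level in (-40.0, -20.0, -8.0, 0.0):
    out = level + compressor_gain_db(level)
    print(f"in {level:6.1f}dB -> out {out:6.1f}dB")
```

With these assumed settings, material spanning 40dB at the input (from -40dB to 0dB) spans 25dB at the output. Makeup gain can then raise the whole program, so quiet dialogue clears the background noise of a train or bus without the loud passages clipping.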

Jünger makes a Television Audio Processor that combines audio conditioning and encoding in the same unit. It also performs dynamic range management, loudness and peak level control, consistent spectral balance (by Spectral Signature), and metadata control for the transport stream encoder.

Broadcasters also have to recognize the limitations of their legacy encoders. Poers said if a station arranges and mixes for a “consistent spectral energy balance,” the encoding results would improve by estimating low transmission bit rates in the system.

Therefore, broadcasters launching new Mobile DTV services should talk with equipment vendors to ensure that they support new decoder operating modes, like Dolby Portable, and also the generation of metadata in their systems. The encoder also has to generate the proper metadata.

This should not be a problem, as the original MPEG and DVB specs have had bit fields within them that carry audio metadata to decoders. That said, many manufacturers have not implemented them to date.

Standards are always key to success in broadcasting, and right now a consistent approach to processing audio for mobile devices is still evolving. Yet, progress is being made in several places (including within the Advanced Television Systems Committee) as new ATSC A/153-compliant encoders and decoders become available to help ensure content sounds good and levels meet consumer needs. A satisfying user experience is perhaps the most important element of a successful mobile TV service. Being able to hear it correctly is a big part of that.

Michael Grotticelli regularly reports on the professional video and broadcast technology industries.