Digital audio for DTV

There are various technologies available to broadcasters and consumers to augment video and audio programming. Digital technology allows the use of additional services and features, but doing so effectively means understanding the underlying requirements and limitations. Among the useful features of ATSC are associated audio services and audio metadata.

Associated audio services

Typically, ATSC broadcasters transmit audio using the complete main audio service (CM), which contains a total audio program with dialog, music and effects. This service can include anywhere from one channel up to full 5.1 channels. One way to support multiple languages is to transmit multiple CM services, each in a different language. In addition to the main service, the ATSC A/52 standard includes support for several other specific audio services: music and effects (ME), dialog (D), visually impaired (VI), hearing impaired (HI), commentary (C), emergency (E) and voice-over (VO). While these are mostly self-explanatory, a few usage issues are pertinent.
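In the AC-3 bit stream, the service type is signaled by the bit stream mode (bsmod) field, whose meaning for the value 7 depends on the audio coding mode (acmod). A minimal lookup sketch, with labels paraphrased from the list above:

```python
# Sketch: mapping the AC-3 bit stream mode (bsmod) field to the
# service types listed above.  Per ATSC A/52, bsmod 7 signals
# voice-over only when the audio coding mode (acmod) is mono.
BSMOD_SERVICES = {
    0: "CM - complete main",
    1: "ME - music and effects",
    2: "VI - visually impaired",
    3: "HI - hearing impaired",
    4: "D - dialog",
    5: "C - commentary",
    6: "E - emergency",
}

def service_type(bsmod: int, acmod: int) -> str:
    """Return the service label for a given bsmod/acmod pair."""
    if bsmod == 7:
        # bsmod 7 with mono coding (acmod == 1) is voice-over;
        # with other acmod values it denotes a main service.
        return "VO - voice-over" if acmod == 1 else "main service (non-VO)"
    return BSMOD_SERVICES[bsmod]
```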

The ME service is similar to the CM service, but it does not contain the dialog for the program, which can be transmitted separately using one or more D services. This allows more efficient use of bandwidth, such as when transmitting multiple languages, each on its own D service. Supporting this type of transmission requires receivers to simultaneously decode one ME service and one associated D service in order to form a complete audio program, i.e., the receiver would require multiple audio decoders. The D service audio is then mixed into the main audio, usually into the center channel, if present. (See Figure 1.)
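The receiver-side combination described above can be illustrated with a short sketch: the decoded mono D service is summed into the center channel of the decoded ME service. The channel ordering and 5.1 layout here are illustrative assumptions, not mandated by the standard:

```python
import numpy as np

# Sketch of the receiver-side ME + D mix: the decoded dialog (D)
# service is folded into the center channel of the decoded
# music-and-effects (ME) service to form the complete program.
# The center-channel index is an assumption for illustration.
def mix_me_and_d(me: np.ndarray, dialog: np.ndarray,
                 center_idx: int = 2) -> np.ndarray:
    """me: (channels, samples) ME audio; dialog: (samples,) mono D audio."""
    out = me.copy()
    out[center_idx] += dialog          # sum dialog into the center channel
    return np.clip(out, -1.0, 1.0)     # simple guard against overload
```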

The VI associated service provides a narrative description of the visual program content, a feature that grew from the need for audible descriptions of program action for visually impaired viewers. Audio-only receivers could also be designed around this service for listeners who do not need the picture. Currently, many broadcasters use the VI audio service for video description, according to the WGBH National Center for Accessible Media (NCAM) in Boston. VI is used throughout the Public Broadcasting Service, as well as at some other networks' stations, carrying the video description that NCAM produces for local affiliates.

But there is currently no requirement for the transmission of the VI service. In 1998, NCAM sent a letter to the FCC, recommending that all DTV receivers should be equipped with the appropriate audio decoding and mixing capability (i.e., dual-stream decoding) to access and deliver the VI associated service and that all DTV receivers should be capable of selecting video description when available. In 2000, the FCC adopted rules mandating that a certain amount of programming contain video description. These rules took effect in April 2002, but were struck down by a federal court in November 2002. As a result, video description is not required, and its use is voluntary.

While the FCC requires broadcasters to comply with the ATSC standard, it does not require broadcasters to transmit specific elements of the standard other than PSIP, closed captions, and ratings information and descriptors. In addition, the FCC has stated that digital-to-analog converter boxes are not required to process all associated audio services. (The FCC nonetheless advises consumers to check with manufacturers and retailers to understand whether specific products provide video descriptions.)

The FCC's rule requiring that emergency information provided on television be accessible to individuals with vision disabilities was recently updated, but the specific means were not addressed. Emergency audio could be supplied using the special E service, ostensibly by automatic receiver switching, but this mechanism is not mandated by the FCC, and neither is the transmission of an E associated service. Instead, broadcasters typically replace regular program audio with an emergency message.

It is not clear exactly how many broadcasters are using associated services. There are also no ATSC receiver mandates for these services, so it is unclear how many receivers provide support. (ATSC A/53 Part 5 informatively asserts that “the capability to simultaneously decode one main service and one associated service may not exist in some receivers.” Readers are encouraged to refer to this standard, as well as ATSC A/52, for further details.) Therefore, neither transmissions nor receivers are currently required to carry or decode the associated audio services.

To get around this ambiguity, most stations resort to a second CM program mix, labeled as a supplementary audio service, which the ATSC standard allows. Although this is not bandwidth-efficient, it avoids the receiver-compatibility problem, since requiring multiple decoders would increase receiver costs.

Carrying audio metadata

Another area of growing interest is audio metadata. Although our discussion applies to metadata in general, let's use one specific element as an example — dialnorm — because it's becoming increasingly used (or misused) in the audio chain. Dialnorm enables a consistent loudness level over time and across multiple services that use AC-3 (Dolby Digital) encoding. We'll skip the background (as it's been covered here recently and extensively), but recall that the dialnorm value is used in the decoder to return the program level to a standard dialog loudness level. It's important to realize that not all sources and destinations use AC-3, and the ultimate recipient of the program is a home viewer/listener, who may or may not be receiving an AC-3 encoded program.
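The decoder-side behavior just recalled can be sketched numerically. In AC-3, dialnorm is coded as a value from 1 to 31, representing a dialog level of -1 to -31 dBFS, and the decoder attenuates the program so that dialog lands at the -31 dBFS reference. A minimal sketch of that arithmetic:

```python
# Sketch of decoder-side dialnorm attenuation: a coded dialnorm of N
# means dialog sits at -N dBFS, so the decoder attenuates by
# (31 - N) dB to reach the -31 dBFS reference.
def dialnorm_gain_db(dialnorm: int) -> float:
    """Attenuation (in dB, <= 0) applied for a coded dialnorm value (1-31)."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("coded dialnorm must be 1..31")
    return -(31 - dialnorm)   # e.g. dialnorm 24 (-24 dBFS) -> -7 dB

def apply_gain(sample: float, gain_db: float) -> float:
    """Scale a linear PCM sample by a gain expressed in dB."""
    return sample * 10 ** (gain_db / 20)
```

For example, a program encoded with dialnorm 24 is attenuated 7 dB in the decoder, while a program at the reference level (dialnorm 31) passes through unchanged; this is how mismatched dialnorm settings translate directly into audible level jumps for the viewer.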

Today, a home TV viewer receives broadcast programming through one of several paths: OTA reception, or retransmission via cable, satellite or telco. For OTA reception, all digital transmission in the United States is by ATSC, which always carries AC-3 audio, so dialnorm will always be present. The critical issue here is whether it has been set properly by the network or local broadcaster.

When a local station receives programming from the network, it is usually by means of an uncompressed or near-lossless compressed feed. It is then up to the local broadcaster to set the correct dialnorm value when ATSC encoding is carried out. However, some local broadcasters take a precompressed ATSC feed from the network (or even one using Dolby E, which likewise carries metadata) and carry out program insertion (bit stream splicing) to insert local programming. In this case, the network-originated programs will already have the dialnorm value set, and this will be passed along to the viewers. However, the locally inserted programming must also have the correct dialnorm setting, so the local production workflow should account for this.

For cable, satellite and telco retransmission, a mixed mode operation is always present, as different sources may or may not be available with audio metadata. (See Figure 2.) Sources that already contain metadata can pass it directly to the audio decoder, as all AC-3 decoders will use this data properly. This is the ideal situation, as it will result in a closed system that is not susceptible to mishandling.

However, sources without metadata that will be delivered digitally must be encoded at the plant in either AC-3 or MPEG. The former requires a correct local setting of the dialnorm value. The latter has no dialnorm equivalent, so the program audio level must be set manually. In this case (as well as that of an analog cable channel), a loudness processor must be used to set the outgoing program level appropriately with respect to the digital channels, so that the audio decoder delivers consistent levels across all programs.
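The manual leveling step described above amounts to measuring a program's loudness and applying a static gain toward the plant's target. A minimal sketch, assuming loudness is expressed in LKFS and the -24 LKFS target is an illustrative plant choice, not a value from this article:

```python
# Sketch of manual leveling for a source with no dialnorm (e.g. MPEG
# audio or an analog channel): a loudness processor applies the gain
# needed to move the measured program loudness to the plant target,
# so levels match the dialnorm-controlled AC-3 services.
# The -24 LKFS default target is an illustrative assumption.
def leveling_gain_db(measured_lkfs: float, target_lkfs: float = -24.0) -> float:
    """Static gain (dB) that moves a program's measured loudness to target."""
    return target_lkfs - measured_lkfs
```

A program measuring -20 LKFS would thus be attenuated 4 dB, while a quiet -30 LKFS program would be boosted 6 dB; in practice the processor also has to manage headroom and dynamics, which this sketch omits.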

Broadcasters and retransmission operators need to work together to manage loudness across all programs and services, and fully understand and agree on their specific loudness practices. Useful references include the CableLabs document “OpenCable Host Device 2.0 Core Functional Requirements” and the CEA document “EIA/CEA-CEB-11 NTSC/ATSC Loudness Matching.”

Aldo Cugnini is a consultant in the digital television industry.

Send questions and comments to: aldo.cugnini@penton.com