Audio for HDTV: Then and Now

Way back in 1990, the International Telecommunication Union's CCIR (International Radio Consultative Committee) got a jump on audio for HDTV. The ITU Radiocommunication Sector (ITU-R), as the CCIR is known today, is charged with studying and making recommendations on questions and issues relating to radio communications; the ITU itself is an agency of the United Nations. (This prompts one to ask why it is studying and making recommendations on such issues as digital cinema, but we will not go there today.) ITU-R recommendations, while not mandatory, are usually observed by the technical and operating services of the government administrations and private operating agencies of the Union's member countries.

ITU RECOMMENDS

CCIR Study Groups 10 and 11 (since combined and renamed Study Group 6) were at the time charged with studying television audio and video questions, respectively. Study Group 10 formed Task Group 10/1, which was assigned the study of "Multi-channel Sound Systems Especially Suited to Accompany High Definition and Enhanced Television Systems." Task Group 10/1 held a three-day meeting in December 1990 that generated the first draft of a recommendation, which ultimately became ITU-R BS.775-1 (1992-1994). This writer participated in the 1990 meetings but was not involved in the subsequent work. Somewhat later, SMPTE issued a Recommended Practice, RP 173 (current status uncertain), which addressed HDTV monitoring speaker placements. The striking similarities between Rec. 775 and RP 173 are not entirely coincidental.

Some may remember that in late 1990, advanced television in the U.S. was in the early stages of development and in considerable flux. A number of analog HDTV broadcasting systems had been proposed, and General Instrument (GI) had just rocked the advanced television world by announcing that it could do the job digitally. Audio systems for advanced television were under discussion, but at that time they were even less developed than the video systems.

The draft that emerged from Task Group 10/1 in 1990 proved to contain a good deal of near-prescient "vision" about the future of multichannel audio for HDTV. It contained these fundamental premises (taken from my meeting notes):

• A two-channel stereo sound system has significant limitations, particularly with regard to the larger screen sizes envisioned for HDTV and to the provision of an improved auditory image for listeners not located at the ideal reference listening position;

• Audio signals to accompany HDTV sources should be capable of rendering high quality with a wide range of domestic loudspeaker configurations, including compatibility with two-channel stereo and monophonic listening;

• In considering multi-channel sound, it is desirable to separate the requirements of production, delivery, and presentation, although these three are mutually interactive;

• A universal and compatible multi-channel sound system applicable to both sound (radio) and television broadcasting would benefit the consumer.

The basic requirements of the envisioned multi-channel sound system included these (again, from my meeting notes):

• The sound image in front of the viewer should be maintained within reasonable limits over an area significantly larger than that achievable by two-channel stereo;

• At the reference viewing/listening distance, which is designated as three picture heights from the screen (3H), the difference between the location of a video image and its corresponding sound image should be less than 15 degrees;

• The sensation of spatial reality may be enhanced if a sound image can be located other than in the frontal plane, i.e., sound location to the rear of the viewer should be possible;

• Downward compatibility with sound systems providing fewer sound channels should be maintained, and real-time sound mixing for live broadcast should be practicable;

• Optimal economy should be pursued in all respects, including both costs and transmission bandwidths;

• The sound channels should be digitally coded for transmission, and their ultimate sound quality should be subjectively comparable to that of compact discs;

• Sound signals should not be advanced more than 20 milliseconds or delayed more than 40 milliseconds with respect to displayed picture. This is not a trivial point when both picture and sound are subjected to various forms of digital processing.
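To make the 15-degree criterion concrete, here is a minimal Python sketch that computes the horizontal direction of a point on the screen as seen from the 3H reference distance and compares it with the direction of a sound image. The 16:9 aspect ratio, flat screen, centered viewer, and the 30-degree left loudspeaker direction used in the example are illustrative assumptions, not figures taken from the 1990 draft.

```python
import math

REFERENCE_DISTANCE_H = 3.0  # viewing distance in picture heights (3H)
MAX_MISMATCH_DEG = 15.0     # draft limit on picture/sound direction mismatch


def azimuth_deg(x_offset_h: float, distance_h: float = REFERENCE_DISTANCE_H) -> float:
    """Horizontal angle (degrees) from the screen center line to a point
    x_offset_h picture heights to the side, seen from distance_h (flat screen,
    centered viewer assumed)."""
    return math.degrees(math.atan2(x_offset_h, distance_h))


def mismatch_ok(video_x_h: float, sound_azimuth_deg: float) -> bool:
    """True if the sound image lies within MAX_MISMATCH_DEG of the picture."""
    return abs(azimuth_deg(video_x_h) - sound_azimuth_deg) < MAX_MISMATCH_DEG


aspect = 16 / 9          # assumed picture aspect ratio
edge_x = aspect / 2      # screen edge, in picture heights from center

print(round(azimuth_deg(edge_x), 1))     # 16.5
print(mismatch_ok(edge_x, 30.0))         # True: about 13.5 degrees apart
print(mismatch_ok(edge_x / 2, 30.0))     # False: about 21.6 degrees apart
```

Under these assumptions, an object at the screen edge reproduced from a loudspeaker at 30 degrees stays inside the limit, while an object halfway to the edge does not, which is one way to see why a dedicated center channel and phantom imaging between loudspeakers matter at HDTV viewing distances.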

SPEAKER PLACEMENT

A key component of the draft recommendation was a hierarchy of loudspeaker arrangements composed of various numbers of loudspeakers and various numbers of audio channels. It recommended that 4, 3, or 2 front loudspeakers be combined with 4, 2, or 0 rear/side loudspeakers. Fig. 1 (below) is the principal reference loudspeaker arrangement as currently found in Rec. 775 [Recommendation ITU-R BS.775-1 (1992-94), Multichannel stereophonic sound system with and without accompanying picture, Geneva: International Telecommunication Union, p. 3].

This shows the reference loudspeaker arrangement for mode 3/2: three loudspeakers in front and two at the rear/sides. As can be seen, the loudspeakers are to be placed on the arc of a circle. Where the front loudspeakers must be placed on a straight line, for example when the center loudspeaker cannot be placed behind the screen, the recommendation calls for the sound signals to be appropriately delayed so that all signals reach the listener's ears simultaneously. It further recommends that all the front loudspeakers be driven by discrete audio signals. Where the center loudspeaker cannot be placed behind the screen, it must be placed above or below the screen. One center-speaker variation that appeared in the 1990 document but was not carried through to Rec. 775 was the generation of a phantom center channel by an additional pair of loudspeakers located at the left and right edges of the screen and fed the mono center-channel signal. Other variations include the deployment of more than two surround loudspeakers in the 120-150 degree side/rear arc.
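The delay compensation for front loudspeakers on a straight line comes down to simple geometry: a center loudspeaker on the line joining L and R is closer to the listener than L and R are, so its sound arrives early. The Python sketch below estimates that head start, assuming L and R at plus or minus 30 degrees on a 3-meter radius, a speed of sound of 343 m/s, and a 48 kHz sample rate; all three values are illustrative assumptions rather than figures from the recommendation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approx. speed of sound in air at 20 degrees C (assumed)


def center_delay_s(radius_m: float, lr_angle_deg: float = 30.0) -> float:
    """Delay to apply to the center feed when L, C and R sit on a straight line.

    L and R are radius_m from the listener at +/- lr_angle_deg off axis.
    A center loudspeaker on the L-R line is then only radius_m * cos(angle)
    away, so its sound arrives early by the path difference / speed of sound.
    """
    center_distance = radius_m * math.cos(math.radians(lr_angle_deg))
    path_difference = radius_m - center_distance
    return path_difference / SPEED_OF_SOUND_M_S


def delay_samples(delay_s: float, sample_rate_hz: int = 48_000) -> int:
    """Round a delay in seconds to a whole number of samples."""
    return round(delay_s * sample_rate_hz)


d = center_delay_s(3.0)
print(f"{d * 1000:.2f} ms")  # 1.17 ms
print(delay_samples(d))      # 56
```

With the loudspeakers on an arc the path difference is zero and no delay is needed, which is why the arc is the reference arrangement; the straight-line case only needs roughly a millisecond of correction at domestic listening distances.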

At this early stage it was not at all clear whether HDTV would be analog or digital, yet the early draft stated that sound signals should be digitally coded for transmission. Interestingly, the lip sync recommendation contained in the original document, which specified that sound signals should not be advanced more than 20 milliseconds or delayed more than 40 milliseconds with respect to picture (at the viewing point), was dropped. Audio-video synchronization is instead addressed in Rec. BT.1359, which affords far more latitude than those numbers. However, the ATSC Implementation Subcommittee Finding on audio-video synchronization, recently reaffirmed, specifies a synchronization tolerance of +15, -45 milliseconds at the ATSC broadcast encoder inputs, a figure much more in keeping with the Task Group 10/1 figures of 1990. At this writing, there has been no known use of multi-channel audio in radio broadcasting. Otherwise, most of the 1990 recommendations were incorporated into audio for advanced television in the U.S.
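The two synchronization windows can be compared with a small Python sketch. It assumes the sign convention implied by the figures above (positive means sound leads picture, negative means sound lags) and treats each window as a simple inclusive range; the class and constant names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncWindow:
    name: str
    max_lead_ms: float  # how far sound may run ahead of picture
    max_lag_ms: float   # how far sound may run behind picture

    def accepts(self, offset_ms: float) -> bool:
        # Assumed convention: positive offset = sound leads, negative = sound lags.
        return -self.max_lag_ms <= offset_ms <= self.max_lead_ms


TG10_1_1990 = SyncWindow("Task Group 10/1 draft, 1990", max_lead_ms=20, max_lag_ms=40)
ATSC_IS = SyncWindow("ATSC IS finding, encoder input", max_lead_ms=15, max_lag_ms=45)

for offset in (18, -42):
    for window in (TG10_1_1990, ATSC_IS):
        verdict = "ok" if window.accepts(offset) else "out of range"
        print(f"{offset:+} ms vs {window.name}: {verdict}")
```

An 18 ms lead passes the 1990 window but not the ATSC one, while a 42 ms lag does the reverse: the ATSC figure is slightly stricter on sound arriving early and slightly looser on sound arriving late, but both windows are of the same order.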