While the average home surround-sound system may be easy to operate, few home viewers understand the complex processing required to bring such dynamic audio programming into their living rooms. In the professional broadcast community, things can be quite complicated. As broadcasters make progress in their FCC-mandated transition to DTV, adding AC-3 audio to program streams is proving to be one of the more daunting challenges.
The Euphonix room at Glenwood Place Studios is equipped with a Euphonix System 5 130-input recording/mix console, RADAR II and Pro Tools HD hard disk recorders, and Wavespace custom monitors. Photo courtesy Glenwood Place Studios.
Surround audio requires a new type of audio-production room, and the buildout of rooms with full multichannel surround audio capabilities will greatly impact facility cost and design. Facilities will need multichannel production tools, encoders, decoders, surround-capable mix consoles and multichannel audio monitoring and amplification. Adding these devices greatly increases the overall complexity of newly designed audio rooms. But making the transition successfully isn’t just about buying new gear and implementing technical changes. It also means that facilities must change the way they approach their operations. One of the biggest changes they face is adapting to the new processes required to create multichannel surround-audio content.
Since an audio mixing room might be used for multiple shows, mix rooms need to be more flexible. In the past, a mix room that would be used for a variety of show content would effectively operate in a similar manner regardless of the program content being created. The future DTV model, however, requires rooms that can operate as mono or stereo channel rooms for news, and stereo or surround for other program content.
There are many different multichannel surround-audio formats in the film community with which to contend, including Dolby Digital, Dolby Digital Surround EX, DTS, DTS ES and Sony SDDS. In the broadcast community, this issue has been somewhat alleviated by the ATSC’s adoption of the AC-3 standard for creating and distributing multichannel surround audio.
Implementing surround audio in the DTV transport stream presents challenges that, in many circumstances, require broadcasters to apply new types of tools. Two such tools are available from Dolby Labs. The DP570 multichannel audio tool is a device that, during the encoding process, gives the operator the ability to generate and manipulate metadata values, which have begun to play a larger role in the DTV world.
Control Room A at Egan Media Productions features a 64-input D&R Cinemix 5.1 console with moving fader automation, onboard dynamics, and 24-input STEMS film mixing module. Photo courtesy Egan Media.
The unit also functions as a monitoring decoder, allowing operators to monitor the effects that the metadata values will have on the signal when decoded by the consumer. This allows the operator to evaluate, in real time, how downmixing or metadata changes the signal.
The DM100 is another tool available to engineers who install and/or troubleshoot a system. This device monitors the PCM, Dolby Digital and Dolby E bit streams, and allows engineers to analyze bit stream errors in real time. It also allows the user to generate PCM, Dolby Digital and Dolby E test bit streams.
Monitoring the surround mix
One of the biggest changes to which broadcasters must adapt is the process of creating multichannel surround-audio content. They must monitor the encode process in real time so that the mix maintains compatibility when the signal is decoded. The mixer must decode the surround signal prior to routing the signals to amplifiers and speakers.
Creating a surround mix for a theatrical presentation is, in some ways, a simpler task than that facing the broadcast mixers. This is because theatrical production requires several mix operators, each concerned specifically with one aspect of the surround mix — whether it is music, effects or dialogue. And theatrical presentations are always decoded to a multichannel surround listening environment. But this is not necessarily true for broadcast production. It is conceivable that additional mix operators could be added, but, in most cases, this is not a very likely scenario. Since, potentially, the signal is delivered to the consumer in a wide range of listening environments — from mono to full 5.1 surround — the operator must monitor the mix in these different listening environments to assure signal compatibility.
Downmixing is the process of taking a multichannel program signal and reproducing it in listening environments that have fewer speaker channels than those for which the original surround program material was created. This process assures compatibility of the program material when it is decoded in the consumer’s home system speakers. But it raises the potential problem of having a room with multiple sets of speakers that might be used in different monitoring environments.
Figure 1. This diagram shows the response of the compression control on Dolby’s DP570 multichannel audio tool. The feature allows the listener to adjust the extremes of the program audio.
Monitoring a downmixed signal in stereo could be done two ways: using surround speakers or the main stereo speakers. But, what further complicates this issue is the fact that a program that sounds correct in a surround format may not sound correct when monitored as a stereo or mono signal. Phase cancellations or other phasing issues could arise when the surround signal is downmixed to stereo or mono.
One way to alleviate this problem is to downmix all the program content that will ultimately be delivered as mono to a left only/right only (LO/RO) stereo pair and not a left total/right total (LT/RT) stereo pair. The operator must then decide which mix should be optimized for surround, stereo or mono. Of course, it would make the operator’s life easier if every consumer had 5.1 surround audio. But that is not the case — at least not yet.
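The reason for preferring LO/RO when mono is the final destination can be illustrated with a simplified sketch. It assumes the commonly used -3 dB (0.707) center and surround mix levels, and it models the LT/RT pair with a plain polarity inversion of the surrounds; real matrix encoders also phase-shift the surround channels, so treat this as an illustration rather than a description of any particular product.

```python
import math

# -3 dB mix coefficient: an assumed, commonly used value, not taken from the article.
MINUS_3_DB = 1 / math.sqrt(2)

def lo_ro(l, r, c, ls, rs, clev=MINUS_3_DB, slev=MINUS_3_DB):
    """Left only/right only: each surround folds into its own side, in phase."""
    lo = l + clev * c + slev * ls
    ro = r + clev * c + slev * rs
    return lo, ro

def lt_rt(l, r, c, ls, rs, clev=MINUS_3_DB, slev=MINUS_3_DB):
    """Left total/right total (simplified): the summed surrounds are added
    with opposite polarity to the two sides."""
    s = slev * (ls + rs)
    lt = l + clev * c - s
    rt = r + clev * c + s
    return lt, rt

def mono(pair):
    """Sum a stereo pair to mono, as a mono receiver would."""
    left, right = pair
    return left + right

# One sample whose only content is in the surround channels.
sample = dict(l=0.0, r=0.0, c=0.0, ls=0.5, rs=0.5)
mono_from_lo_ro = mono(lo_ro(**sample))  # surround content survives
mono_from_lt_rt = mono(lt_rt(**sample))  # surround content cancels
```

In this sketch the surround content survives the mono sum of the LO/RO pair but cancels completely in the mono sum of the LT/RT pair, which is the kind of phase cancellation the operator has to listen for.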
As content becomes more varied, audio production rooms will become more specialized than in the past. It will be up to individual stations to decide if they want to opt for full surround capabilities in each audio production room.
Processing delays, which in the past had been almost exclusively a video issue, will soon become an audio issue as well. In the past, delay to the audio operator meant that an audio-delay device was needed to bring the audio back in lip sync with the video after some video effect was rendered.
The Dolby E and Dolby Digital encoders and decoders all have processing delay through them. One of the menu options in the encoding process of the DP569 Dolby Digital encoder allows the unit to multiplex SMPTE timecode with the bit stream, thus effectively “time-stamping” the bit stream. This time-stamping of each Dolby Digital audio frame allows the audio bit stream to be synced back up to the video signal. The current revision of the firmware on the Dolby Digital encoder supports the optional delay word, word 8, in the SMPTE 339M standard. By enabling this feature, the encoder can input the exact value for the processing delay into this delay word. Any downstream devices that support the SMPTE 339M standard, and this feature in particular, can read this word and know the exact value of the audio delay.
The Dolby E encoder and decoder each have a one-frame delay. To distribute the Dolby E signal within the facility, the facility must somehow compensate for this delay. A video-delay unit inserted after either the encoder or decoder may be the quickest and most straightforward method. It is also possible to advance-time the PCM audio by two frames prior to the Dolby E encode and decode process, thus synchronizing the decoded Dolby E signal and PCM audio. The downside to this is that if the signal is left as an encoded Dolby E signal, it will be advanced one frame, and will have to be delayed. There are HD VTRs currently on the market that have video-processing delays of one frame while the signal undergoes encoding or decoding to the SDI SMPTE 259M signal standard. Examples of such VTRs are the Panasonic HD D5 and the Sony HDCAM VTRs.
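The frame bookkeeping behind the advance-timing option can be sketched in a few lines. The function name and sign convention (positive means the audio is late relative to video) are illustrative assumptions; the one-frame delays come from the text.

```python
DOLBY_E_ENCODE_DELAY = 1  # frames, per the text
DOLBY_E_DECODE_DELAY = 1  # frames, per the text

def audio_offset(advance_frames, encoded=True, decoded=True):
    """Audio position relative to video, in frames.

    Positive means the audio is late; negative means it is early.
    """
    offset = -advance_frames
    if encoded:
        offset += DOLBY_E_ENCODE_DELAY
    if decoded:
        offset += DOLBY_E_DECODE_DELAY
    return offset

no_compensation = audio_offset(0)              # audio two frames late
after_round_trip = audio_offset(2)             # back in sync
left_encoded = audio_offset(2, decoded=False)  # audio one frame early
```

With a two-frame advance, the decoded signal lands back in sync, while a signal left in its encoded Dolby E form sits one frame early and needs a one-frame delay.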
The Dolby Digital encoder has a variable delay that is dependent on the operating mode, bit rates and other user-definable parameters. The delay ranges from a minimum of 187 ms (about 5.5 frames) to a maximum of 450 ms (about 13.5 frames). As with the Dolby E delay, there are a few ways to compensate for this processing delay. Some MPEG encoders used to deliver the final program transport stream can compensate for the processing delay of the Dolby Digital encoder.
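The conversion from milliseconds to video frames is simple arithmetic. The short sketch below assumes the NTSC frame rate of 30000/1001 (about 29.97) frames per second, which the article does not state explicitly.

```python
# Assumed NTSC frame rate, about 29.97 frames per second.
NTSC_FRAME_RATE = 30000 / 1001

def ms_to_frames(delay_ms, frame_rate=NTSC_FRAME_RATE):
    """Convert a delay in milliseconds to a (fractional) number of video frames."""
    return delay_ms / 1000 * frame_rate

min_frames = ms_to_frames(187)  # roughly 5.6 frames
max_frames = ms_to_frames(450)  # roughly 13.5 frames
```

The results come out close to the rounded figures quoted above, and the fractional values show why a whole-frame video delay alone may not restore lip sync exactly.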
The Digicipher II (DC II) from GI/Motorola can compensate for the delay as long as both the Dolby Digital and DC II encoders are fed VITC or LTC timecode. Future implementations of the DC II encoder will support the delay-word option of the SMPTE 339M standard. This will allow the encoder to read the delay word directly and know the exact amount of delay. Manufacturers of other MPEG encoders compensate for the audio processing delay through the encoder’s user interface. For example, Scientific Atlanta’s Powervu encoders allow the user to manually set a static value for the amount of audio processing delay compensation that the encoder must accommodate. All encoders can compensate for the audio processing delays if the encoder is fed with PCM audio and the AC-3 audio signal is encoded within the encoder. To monitor the Dolby Digital decoder in a master control operation, the room requires a one-frame video delay to compensate for the audio processing delay.
Managing the metadata
Metadata has become another hot issue of the DTV standard. As it pertains to surround-audio encoding, there are many parameters that require attention, including dialogue level, channel modes, data rates, dynamic range control, downmix modes and various other bit stream parameters. Three metadata parameters in particular are of interest in either the Dolby E or Dolby Digital signal: dialogue normalization, dynamic-range compression and downmixing.
The metadata values can either be generated from the Dolby Digital or Dolby E encoders, or the values can be preset from the DP570 multichannel audio tool. The DP570 has several features that offer the operator more control over generating and manipulating the metadata, and it allows the operator to monitor the effect of the metadata on the signal in real time.
Since it has a decoder and monitor matrix, it allows the operator to monitor the effects of metadata and downmix changes prior to encoding in either Dolby Digital or Dolby E formats.
Dialogue normalization, or “dialnorm,” is a parameter the operator sets so that all program material maintains a consistent relative volume level upon delivery to the consumer. For television post production, this level matching allows varied program content (commercials, news, sitcoms, sports, movies) to maintain the same level when decoded in the consumer’s home. This sets the null level for the consumer’s decoder, giving a pseudo limiting effect that minimizes clipping.
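The level matching can be sketched as follows. In AC-3, the dialnorm value (1 to 31) states the program's average dialogue level as that many dB below full scale, and the decoder attenuates by the difference from 31 so that dialogue plays back at -31 dBFS. The program names and dialnorm values below are hypothetical examples, not measurements.

```python
DIALNORM_TARGET = 31  # decoder places dialogue at -31 dBFS

def decoder_attenuation_db(dialnorm):
    """Attenuation in dB that a decoder applies for a given dialnorm value."""
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be between 1 and 31")
    return DIALNORM_TARGET - dialnorm

# Hypothetical programs with different average dialogue levels.
programs = {"movie": 27, "news": 21, "commercial": 15}

playback_levels = {}
for name, dialnorm in programs.items():
    encoded_dialogue_dbfs = -dialnorm
    playback_levels[name] = encoded_dialogue_dbfs - decoder_attenuation_db(dialnorm)
```

Every program in the sketch ends up with its dialogue at the same playback level, which is the consistency the parameter is meant to deliver, provided the operator sets the value to match the actual dialogue level of the mix.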
The dynamic range control, or “dynrng,” allows the consumer to reduce a program’s dynamic range as needed. This is an optional mode that can be disabled on most consumer decoders if so desired. It allows the viewer to adjust the extremes of the program audio and listen to it at a reduced dynamic range. (See Figure 1.)
Having a facility that is designed to support the Dolby Digital and/or Dolby E format is not just a matter of technical or engineering issues. The facility must change operational procedures to be fully compliant. As with any new standards or implementations, more often than not the full nature of the issues or problems is not discovered until the actual implementation is under way. Over the next few years, as this standard becomes the norm and not the exception, the transition from an analog or PCM audio facility to a multichannel surround facility should become easier.
John Holt is a senior systems engineer with The Systems Group.