Immersive sound

The HDTV visual experience is so lifelike that any false location cue or temporal displacement of sound will not feel or sound right. The spell of an immersive experience will be broken.

The doors of aural perception

Binaural: relating to, having or hearing with two ears. Binaural recording places a microphone at each ear of a listener, or uses a simulated head that contains microphones in the positions of the ears.

Ambiophonics combines psychoacoustic principles with musical performance space design to create audio characteristics that the ear-brain system will accept as real in dedicated home listening rooms.

Robin Miller presented informative AES papers in 2003 and 2004 about ambiophonic techniques.

Head-Related Transfer Function (HRTF), also known as Anatomical Transfer Function (ATF), is defined by Wikipedia as a description of how a sound wave (parameterized as frequency and source location) is filtered by the diffraction and reflection properties of the head, pinna and torso before the sound reaches the transduction machinery of the eardrum and inner ear (see auditory system).

Digital Signal Processing (DSP) has been used to simulate the binaural effect by applying an HRTF to a mono source signal.
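As a rough illustration of what "applying an HRTF" means in practice, the sketch below convolves a mono signal with a pair of head-related impulse responses, one per ear. The function names and the two tiny impulse responses are invented for illustration; real HRTF data is measured and runs to hundreds of taps per ear.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source through a left-ear and a right-ear impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (made up, not measured) for a source on the listener's left:
# the left ear hears it first and louder, the right ear two samples later and quieter.
HRIR_LEFT = [1.0, 0.3]
HRIR_RIGHT = [0.0, 0.0, 0.6, 0.2]

mono = [1.0, 0.0, -1.0]
left, right = render_binaural(mono, HRIR_LEFT, HRIR_RIGHT)
print(left)
print(right)
```

The delay and level difference between the two outputs are the cues the brain reads as direction. The cost of these convolutions is presumably why real-time implementations look for ways to reduce complexity.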

Fundamental R&D findings about implementing surround sound using AC-3 have been presented at International Association of Science and Technology for Development (IASTED) conferences. The papers are published by ACTA Press.

"Real-time Implementation of Virtual Surround System using AC-3" describes using complexity-reduction techniques to ease HRTF processing and enable real-time implementation.

"Virtual Surround System using AC-3" describes a real-time implementation of a 3D sound system using an embedded TI C67xx DSP.

An all too popular technique for situation comedies, sports highlight reels and music performances is to “sweeten” audience sounds with audio clips recorded at another instant. This is rarely very convincing. The natural audience sounds at an event have a “feel” to them. They are an integral part of those moments in space and time.

With surround sound used during DTV broadcasts, any discrepancy is even more evident and annoying. The HDTV visual experience is so lifelike that any false location cue or temporal displacement of sound will not feel or sound right. The spell of an immersive experience will be broken.

One part gray matter + two parts stimuli

Aural perception is a combination of receiving vibrations in the air at two sites (our ears) and the processing our brain does with this sensory information. Variations in level, phase and time supply information to our brain that is processed to create a spatial sensory consciousness.
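To give a sense of scale for the time cue, the sketch below estimates the interaural time difference with Woodworth's spherical-head approximation. That formula is not taken from this article, and the head radius and speed of sound are assumed typical values, so treat the numbers as ballpark figures only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly quoted average head radius (assumed)


def interaural_time_difference(azimuth_deg, head_radius=HEAD_RADIUS):
    """Woodworth's far-field approximation, intended for 0 to 90 degrees azimuth.

    Returns the extra travel time, in seconds, to the ear farther from the source.
    """
    theta = math.radians(azimuth_deg)
    return head_radius / SPEED_OF_SOUND * (theta + math.sin(theta))


for azimuth in (0, 30, 60, 90):
    itd_us = interaural_time_difference(azimuth) * 1e6
    print(f"{azimuth:>2} degrees: {itd_us:.0f} microseconds")
```

Even for a source directly to one side, the difference comes out well under a millisecond, which is why small timing errors in a mix can move or smear a perceived location.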

Research has been done over the decades to create a stereoscopic sonic experience in which sounds, heard from the “sweet spot,” seem to originate from locations other than the two speakers. To create this sonic illusion, binaural and ambiophonic techniques have been used. They are based on HRTF methods (see sidebar) that attempt to model the way the ear and brain process sound. This approach can be thought of as a point (the head) outfitted with 5.1 (six) microphones. Many speaker arrays have taken this approach. It can be thought of as an absorptive method: sound is absorbed at a single point.

For 5.1 surround sound, an opposite approach may be taken. During audio acquisition, place microphones at the sources of the sound, then let the ears and brain process these stimuli, as they would naturally, to create the surround sensation. This can be thought of as a generative method.

Divide and conquer

Surround sound acquisition can be broken down into broad categories. The first division is live and pre-produced. Live events can be further broken down into studio and remote. In the studio they can be divided into audience (game show, music event) and non-audience (news) shows.

Should the surround sound add to the dramatic effect? Surround sound mixing is open to creative interpretation: reality, simulated reality, dramatic augmented reality or artificial reality. Sound design and sound effects play a large role.

Let the method fit the content
At the heart of the matter is the issue of how audio should follow video. Does it change with camera angle? Will the audience accept large fluctuations in level and quality? How will a shift from surround to stereo, especially at commercial breaks, go over? Do you fold down 5.1 to stereo with interstitial sound design? Each scenario (sports, concert, studio news and pre-produced) requires different production techniques.

For a live event, such as football, the two primary sound sources are the announcers and the crowd. Obviously, the announcers go in the center channel. A generative technique can be used to position the front and surround “ambiance” mics at the four corners of the stadium. With an end zone shot, do you remix the “ambiance” mics to follow the camera? Where do you mix the cheerleaders and the house PA? Cutting to an official's microphone on the field will leak crowd noise and raise the ambient crowd level. Do you use the LFE to convey a “shaking” stadium?

5.1 techniques for musical content have been demonstrated at AES conferences. Using a stereo mix as a foundation, placing lead vocals in the center (maybe with some L and R) and then adding audience ambiance to the rear speakers generally produces a pleasant effect. Should the LFE channel be used for bass sounds? Is it appropriate to make the drums louder during a close-up of the drummer? What about a creative, experimental piece that explores novel uses of the 5.1 channels?

Studio news, be it one, two or three people, can use C, L and R to accentuate the visual spatial relationship of the on-air talent. What do you mix into the rear and LFE channels? How real do you want your newscasts to be? A parade is happy; a battle is terrifying. Could a dramatic presentation of news manipulate public opinion by influencing emotional response to perceptual stimuli, much as commercial advertising influences buying habits?

A pre-produced dramatic program has more options than a live event. In a crowd scene, when a wide shot is used, crowd ambiance is random. When zooming in to a close shot, do you want to “zoom” the audio to hear the character talking? It is a function of the dramatic moment.

Surround technology
Surround miking and processing methods are in the developmental stage. Journeyman techniques such as M/S, X/Y and ORTF are being adapted for surround production (see “Introduction to Microphone Techniques for 5.1 Surround Sound”).
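For readers unfamiliar with M/S, the sketch below shows the sum-and-difference decode that turns a mid signal and a side signal into left and right. The `width` control and the function name are illustrative assumptions, not part of any particular product.

```python
def ms_decode(mid, side, width=1.0):
    """Sum-and-difference decode: L = M + width*S, R = M - width*S.

    width is a hypothetical control: 0.0 collapses to mono, larger values
    widen the stereo image.
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right


mid = [1.0, 0.5]
side = [0.25, -0.5]
left, right = ms_decode(mid, side)
print(left, right)

# Summing left and right back to mono cancels the side signal entirely,
# leaving only the mid microphone (doubled).
mono = [l + r for l, r in zip(left, right)]
print(mono)
```

That clean mono sum is one reason M/S is attractive as a starting point when a mix has to survive being folded down.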

The Decca Tree has been used for large-scale recordings and film scoring. An article by Ron Streicher that was adapted for Mix magazine describes the history of this industry-standard technique and its adaptation for 5.1 surround recording.

The Holophone Microphone System is designed to hear the way the brain does. Its components include the elliptical Holophone head and seven microphone elements (either Sennheiser or DPA).

Complete systems, from the microphone array through surround processing, produce a finished mix in the appropriate format.

SoundField technology is based on the principle that all acoustic events can be represented by four basic elements. These are 'X', which is front/back information (depth); 'Y', which is left/right information (width); 'Z', which is up/down information (height); and 'W', the central point from which the other three elements are referenced. Surround recordings made with SoundField microphones can fold down to stereo, or stereo to mono, without phase cancellation.

The four outputs from a SoundField microphone system, either direct or recorded, are processed to simulate a 5.1 surround microphone array and are routed to the mixing console for surround sound program production.
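One way to picture that processing is as a set of "virtual microphones" derived from W, X and Y. The sketch below encodes a test source using the traditional B-format convention (W attenuated by 3 dB) and then derives five virtual cardioids. The aiming angles are borrowed from the usual 5.1 speaker layout and are an assumption, as are all function names; this is not a description of what any SoundField processor actually does, and no LFE feed is derived.

```python
import math

SQRT2 = math.sqrt(2.0)

# Assumed aiming angles in degrees, positive to the listener's left.
VIRTUAL_MIC_AZIMUTHS = {"L": 30.0, "C": 0.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0}


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Place one sample at a direction, returning (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / SQRT2                         # omnidirectional reference
    x = sample * math.cos(az) * math.cos(el)   # front/back
    y = sample * math.sin(az) * math.cos(el)   # left/right
    z = sample * math.sin(el)                  # up/down
    return w, x, y, z


def virtual_cardioid(w, x, y, aim_deg):
    """Horizontal cardioid aimed at aim_deg, built from W, X and Y only."""
    aim = math.radians(aim_deg)
    return 0.5 * (SQRT2 * w + x * math.cos(aim) + y * math.sin(aim))


def decode_to_five(w, x, y):
    """Derive the five main feeds; height (Z) is ignored in this sketch."""
    return {name: virtual_cardioid(w, x, y, az)
            for name, az in VIRTUAL_MIC_AZIMUTHS.items()}


w, x, y, z = encode_b_format(1.0, azimuth_deg=30.0)
feeds = decode_to_five(w, x, y)
for name, level in feeds.items():
    print(f"{name}: {level:.3f}")
```

With the test source placed at 30 degrees left, the L feed is strongest and Rs is weakest. Because every feed is derived from the same coincident signals, the aim and pattern can be changed after the recording is made.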

The Atmos 5.1 Model 9843 is a similar system that uses the Brauner ASM 5 adjustable surround microphone array and a matrixed LFE channel. The Atmos 5.1 Controller processes the mic inputs into a 5.1 surround mix.

“MIXING in the Round” by Mike Sokol, an article that appeared in Electronic Musician, describes how human aesthetic intervention results in a final mix.


As with dual aspect ratio video production, 5.1 and stereo compatibility presents technical and artistic challenges. Should the consumer audio decoder be relied upon for the down mix? Fold-down monitoring is essential.
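To see why fold-down monitoring matters, consider a minimal sketch of a stereo down mix. It assumes the commonly used -3 dB gain for the center and surround channels and drops the LFE; an actual consumer decoder may use different levels, for example ones carried in the bitstream metadata, so the gains and names here are illustrative only.

```python
import math

MINUS_3_DB = 1.0 / math.sqrt(2.0)  # about 0.707


def fold_down(frame, center_gain=MINUS_3_DB, surround_gain=MINUS_3_DB):
    """Mix one 5.1 sample frame (dict with L, R, C, LFE, Ls, Rs) to (Lo, Ro).

    The LFE channel is left out of the stereo mix in this sketch.
    """
    lo = frame["L"] + center_gain * frame["C"] + surround_gain * frame["Ls"]
    ro = frame["R"] + center_gain * frame["C"] + surround_gain * frame["Rs"]
    return lo, ro


# A loud moment in which every channel sits at 0.8 of full scale.
hot_frame = {"L": 0.8, "R": 0.8, "C": 0.8, "LFE": 0.8, "Ls": 0.8, "Rs": 0.8}
lo, ro = fold_down(hot_frame)
clips = abs(lo) > 1.0 or abs(ro) > 1.0
print(f"Lo={lo:.3f} Ro={ro:.3f} clips={clips}")
```

A surround mix that sits comfortably below full scale on every channel can still overload once three channels are summed into each stereo output, and anything carried only in the LFE disappears. Neither problem is audible unless someone listens to the fold-down.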

Should the listener be in control, able to select their placement in the mix, be it on the stage, as the conductor, in the orchestra pit or in the balcony?

Bass frequencies require large-area speaker diaphragms, yet home theater in a box (HTIB) systems with mini satellite speakers try to make up for the speakers' lack of low-frequency capability by channeling bass to the LFE (.1) speaker. What will this do to your mix?
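The sketch below models that kind of bass redirection in the simplest possible way: a one-pole low-pass filter splits each main channel, the low band is added to the LFE feed, and the remainder goes to the satellite. The 120 Hz crossover, the one-pole filter and the function names are all assumptions for illustration; real receivers use steeper filters and their own crossover points.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def bass_manage(main_channels, lfe, cutoff_hz=120.0, sample_rate=48000):
    """Move the low band of every main channel into the subwoofer feed."""
    sub = list(lfe)
    satellites = {}
    for name, samples in main_channels.items():
        low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
        satellites[name] = [x - l for x, l in zip(samples, low)]  # what is left
        sub = [s + l for s, l in zip(sub, low)]                   # redirected bass
    return satellites, sub


# A steady (0 Hz) test signal stands in for deep bass mixed into L and R.
steady = [0.5] * 4800
satellites, sub = bass_manage({"L": steady, "R": steady}, lfe=[0.0] * 4800)
print(f"satellite L: {satellites['L'][-1]:.6f}  sub: {sub[-1]:.6f}")
```

Once the filter settles, the bass that was mixed into left and right has vanished from the satellites and piled up in the subwoofer at twice the level of either source channel. Low-frequency content panned for spatial effect ends up in one box on the floor.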

Clearly there is also a lot of creative and technical work to be done to polish 5.1 surround production techniques to high quality broadcasting standards. Methods to produce smooth, consistent, conformed segments from various sources need to be invented and perfected. 5.1 sound design for interstitials can aesthetically bridge the segment segue.

Lots of experimentation needs to be done. The reward is outstanding, differentiating sound and an immersive experience that may translate into more loyal viewers. Production values do matter, now more than ever.
