Designer Sound—It’s in the Mix

22.2 audio control room in OB van (Image credit: Getty)

Immersive sound for broadcasting is all about spatial orientation, sonic separation and the hope that the listener can reproduce some dimensional sound through speakers or headphones. The sound mixer can’t control the sound ecosystem or how the consumer listens to content, and immersive sound reproduction remains a moving target. Creating a compelling sound mix that may well be heard over speakers, soundbars and headphones is no easy task.

Horizontal Layers
The problems often begin with overly conservative sound design, but as sound practitioners we must start somewhere and build forward. Typically, sound design looks at audio as horizontal layers. For example, immersive sound proposes an ear-level horizontal layer of sound and a stratum of sound above the head of the listener. NHK, the national broadcaster in Japan, has even recommended that there should be sound projected from below ear level.

When immersive sound was initially contemplated and tested, it was reasoned that these overhead sounds should be captured, placed into the upper sound zones and reproduced by the height speakers. Even basic sound design principles would suggest that the sound mix should be what you would hear if you were there. Horizontal layers of sound are fine for basic immersive sound production and academic consideration, but if the “real” sounds above the spectator’s head are diffused atmosphere and excessive PA noise, then few of these sounds contribute to an interesting immersive soundfield/mix.

Further, I would argue that too much atmosphere in front of the listener is boring and fatiguing and impairs speech intelligibility, which is one reason why the immersive sound mixer should consider alternative mixing designs. For example, most soundbars project the sound from in front of the listener, and I believe that this should be a major consideration for the sound designer/mixer.

Since most immersive soundbars project frontward, upward and sideways, the question is: are up-firing soundbars projecting more atmosphere in front of the listener, thus diluting the mix? This could be the case if the front left and right height channels are used primarily for ambiance and atmosphere.

Basketball, For Example
Consider that if the front height channels are projecting sounds from the front and upwards, then it makes sense to use the front height channels to reinforce the screen action, dialogue and ancillary audio. Front Soundfield Reinforcement (FSR) is a concept that I have promoted since a presentation I made at Germany’s Tonmeister conference in 2016.

FSR uses the complete front vertical soundfield (the left and right channels along with the left and right height channels) to mix a front “soundframe” around the visual presentation. For example, basketball has a clear top frame with the net and hoop and a bottom frame with the floor. Placing some sounds in the vertical perspective helps to elevate a 2D image into a 3D soundspace.

Testing of basic immersive sound production began in 2014 with NHK in Japan and later in 2016 with NBC testing various microphone schemes at the Olympics and American football. In 2016, Seoul Broadcasting System adopted ATSC 3.0 and used the MPEG-H encoder to transmit the 2018 World Cup into the homes of Korea.

ORTF 3D microphone array (Image credit: Dennis Baxter)

The 2018 World Cup was probably the first major sporting event that made available an immersive sound mix with the addition of an overhead atmosphere layer constructed from Schoeps microphones. Felix Krückels was the sound designer and worked with Helmut Wittek on an immersive array, the ORTF 3D, which was described as a “double-layered ORTF using eight microphone capsules.”

Capturing immersive sound with ambisonic or 3D array-type microphones creates a stable dimensional foundation, but is not necessary to create an immersive experience, particularly with sports sound.

For example, according to Karl Malone, director of sound design at NBC Sports, NBC has created immersive sound at Notre Dame football games for several seasons using a variety of mono and stereo microphones. Simple immersive sound production does not require 3D microphones or 3D mixing busses. However, virtually all mixing consoles used by the broadcaster (Lawo, Calrec and SSL) have 3D panners, which easily allow for spotting/placing the sound elements around the intended listener.

What 3D sound does need is more adventurous sound designs and a good understanding of encoding and how decoding/rendering affects the sound design. In my basketball example I suggest a pair of lavaliere microphones to capture the backboard and net swish and to route these sounds as audio elements/objects to be rendered into the front left and right height channels.
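To make the routing idea concrete, here is a minimal Python sketch of a constant-power panner across the four front channels. It is an illustration only, not the panner of any console or renderer: the function name `front_frame_gains`, the channel labels and the 0-to-1 position scales are all assumptions made for this example.

```python
import math

def front_frame_gains(azimuth, elevation):
    """Return constant-power gains for the front channels L, R, Lh, Rh.

    Hypothetical scales chosen for this sketch:
      azimuth:   0.0 = hard left, 1.0 = hard right
      elevation: 0.0 = ear level, 1.0 = fully in the height channels
    """
    if not (0.0 <= azimuth <= 1.0 and 0.0 <= elevation <= 1.0):
        raise ValueError("azimuth and elevation must be between 0.0 and 1.0")

    # Sine/cosine pan law: the squared gains of each pair sum to 1.
    left = math.cos(azimuth * math.pi / 2)
    right = math.sin(azimuth * math.pi / 2)
    low = math.cos(elevation * math.pi / 2)
    high = math.sin(elevation * math.pi / 2)

    return {
        "L": left * low,
        "R": right * low,
        "Lh": left * high,
        "Rh": right * high,
    }

# A net swish placed centrally and fully in the height channels.
net_swish = front_frame_gains(azimuth=0.5, elevation=1.0)
print(net_swish)

# A floor sound (for example, sneaker squeaks) kept at ear level, left of center.
floor = front_frame_gains(azimuth=0.3, elevation=0.0)
print(floor)
```

Because both pan stages use a sine/cosine law, the total power across the four channels stays constant wherever the sound is placed, so moving an element up the frame should not change its perceived loudness much.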

An experienced sound practitioner might ask, “What happens to these above sounds in a stereo or surround mix?” Interoperability between formats has been a problem, but digital sound, metadata and rendering have solved some of the conversion issues between formats. Dolby Atmos, Auro-3D, DTS:X and Fraunhofer MPEG-H are examples of encoders for distribution of immersive sound. When the audio reproduction device detects a particular codec, it will render the proper balance and spatialization appropriate for the reproduction device and the number of transducers. For example, a Dolby Atmos-enabled up-firing soundbar should project the net sounds above the listener.
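The stereo question above can be illustrated with a small fold-down sketch in Python. This is not how any particular codec or renderer works; commercial renderers apply their own metadata-driven coefficients. The function name `fold_down_to_stereo` and the -3 dB height attenuation are assumptions chosen only to show the principle that a height element is folded into the ear-level channel on the same side rather than discarded.

```python
# Hypothetical attenuation applied to height channels on fold-down (-3 dB).
HEIGHT_DOWNMIX_GAIN = 10 ** (-3.0 / 20)

def fold_down_to_stereo(gains, height_gain=HEIGHT_DOWNMIX_GAIN):
    """Fold per-channel gains for L, R, Lh, Rh down to a stereo pair.

    `gains` maps channel labels to linear gains; missing channels count as 0.
    Each height channel is attenuated and added to the same-side channel.
    """
    left = gains.get("L", 0.0) + height_gain * gains.get("Lh", 0.0)
    right = gains.get("R", 0.0) + height_gain * gains.get("Rh", 0.0)
    return {"L": left, "R": right}

# A net sound that lives only in the front height channels of the immersive mix.
net_in_heights = {"Lh": 1.0, "Rh": 1.0}
print(fold_down_to_stereo(net_in_heights))
```

In this sketch the net sound survives in the stereo mix at a reduced level, which is the behavior a mixer would want to audition before committing elements to the height channels.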

The wide range of reproduction devices does not guarantee a similar sound experience, but I would argue that just about anything is better than rear-mounted television speakers.

Hope is not a strategy; successful immersive sound is in the mix.