Latest from TV Technology: Jay Yeary

Our Digitized Audio Future

Jay Yeary — Tue, 11 Dec 2018 20:52:19 +0000

Audio technology has changed faces many times during the decades I’ve been in the industry. However, the biggest change by far was the transition from analog to digital; once audio was turned into bits and bytes, we could do things that seemed inconceivable in the past. Not only do we no longer record linearly, but most of us no longer do anything in a linear fashion: random access is now the way we record, work, and live our lives.

We’re currently undergoing another monumental transition in the audio industry, one in which audio technology can either enhance the user experience and drive quality forward or increase individual isolation, with instantaneous delivery often coming at the expense of quality. The following technologies are all linchpins of our audio future, and how we use or abuse them will determine the course of professional broadcast audio production.

MACHINE LEARNING

AI-assisted mastering in Ozone

Most discussions of artificial intelligence seem to focus on robots taking jobs from humans while also performing all of our mundane household chores, leaving us simultaneously unemployed and free to get in some really serious binge watching. AI systems can already analyze data, including their own output, and learn from it, a process called “machine learning,” and this has helped create some powerful tools for audio production.

It is a brilliant use of computing power to have processes analyze themselves and their outputs or, in our case, analyze what we’re doing and offer suggestions to improve how and what we do. Anyone who mixes using presets is already relying on someone else’s idea of a good starting place, but machine learning takes this concept to the next level by analyzing in real time rather than just giving us a static starting point.

I was among the skeptics when it came to what are often touted as “automatic mix” tools until I discovered they are very useful for analysis, similar to using console meters in spectrum mode during mixes to monitor the overall tonal balance. These tools can act as a second set of ears listening for you, and that is almost always a positive thing. These machine learning audio tools are perfect for handling mundane tasks such as compliance and correction, which means they may one day free us from managing loudness and monitoring immersive audio downmixes.

IMMERSIVE AUDIO TOOLS

Immersive audio is the most exciting development for mix engineers since 5.1 surround, and it is the technology that makes me want to get back into daily audio production work. It seems like every month there is a new surround microphone coming to market to help create immersive content.

An immersive panner in Nuendo

The possibilities for creativity and error with immersive audio are immense. Fortunately, there are now more tools than ever to help keep things from getting out of hand. Multichannel meters and panners that work across immersive busses are being released in software as well as in hardware form. These tools are accessible to just about everyone now, to the point that even the two primary workstations in my home studio, Pro Tools and Nuendo, include immersive panning, bussing, monitoring and authoring tool connectivity.

VIRTUAL AND AUGMENTED REALITY

Before 4K and HDR, 3D was the next big thing. Now the next big thing is supposed to be virtual reality. Slip on a VR visor and closed ear headphones and you become part of another world, that is, if the visuals are engaging enough to draw you in and the soundscape is enveloping and convincing.

This technology may be impressive, but a more intriguing alternative is augmented reality, where an enhanced version of the real world is presented through the screens of our devices. AR may actually present more challenges for sound designers than VR because many sounds need to be organic while others, such as audible cues, often work better when they sound totally artificial.

Since AR is viewed through portable device screens instead of visors, soundscapes are presented in virtualized surround on earbuds of dubious quality or through tiny device speakers. The challenge for audio professionals is to somehow make the augmented audio experience truly immersive and keep the viewer engaged when they can easily peer beyond their screen back to the real world.

THE COST OF CONVENIENCE

The era of personalized audio began with the introduction of the Sony Walkman in 1979 and it may finally have reached critical mass. We now find ourselves at the point where there is more individualized media consumption taking place than at any other time in history, yet very little of that media is the highest quality available.

Delivering digitized content using limited bandwidth requires widespread bit reduction to the point that it can be almost impossible to find digitally uncompressed content at the consumer end of the chain.

We’re balancing on the precipice of a very perilous chasm between convenience and quality. It may soon be the case that the only way to experience true uncompressed audio is to attend an unreinforced acoustic performance; or perhaps we’ll just have to start listening to the people around us.

Now more than ever, the goal of every audio engineer should be to create the best sounding, highest-quality audio imaginable every time we have the opportunity. We must strive to make people crave quality audio every time they listen or they will stop caring. We must use the incredible tools we now have at our disposal to create truly exciting, immersive and engaging audio for every single project, no matter where or on what device it will be heard.

In closing, this is my final Inside Audio column and I wish to thank everyone who has taken the time to read the column since I started filling it with words back in 2013. A very special thank you to Tom Butts for granting me this opportunity and to Terry Scutt for transforming my cogitations into sentences fit for print. Each column was an incredible learning experience and I can’t wait to read what Dennis Baxter will be sharing in this space in the future. His insights come from the leading-edge work he is doing in the very areas our industry is moving toward.

Jay Yeary is an audio engineer who has had the good fortune to spend his entire career in broadcasting. He is a member of AES, SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

Immersive Content and VR for Audio at Avid Connect 2018

Jay Yeary — Fri, 29 Jun 2018 00:45:10 +0000

Since 2013, my first NAB Show stop has been at Avid Connect. This year I was extremely intrigued to see how the event would differ from previous years, since Jeff Rosica had taken the helm of the company following the still-fresh ouster of the previous CEO — who originated the event — along with the Avid Customer Association. It didn’t take long to find out.

Avid Connect has always tended to be product-heavy from start to finish, but this one simply did not feel that way. There was plenty of discussion of Media Composer, Pro Tools and the Media Central Platform — Avid’s business is to make and sell products after all — but the overall event, and the breakout sessions in particular, were far more interesting and educational than prior years, covering a wide range of topics that were not specific to the company and its products.

TECH TRENDS

One surprisingly engrossing session was “Technology Trends Musicians Can Utilize for Tracking Royalties,” which covered, among other topics, the pros and cons of streaming.

Panelist Jordan “DJ Swivel” Young made the case that, for those who have a dedicated fan base, streaming can actually be a greater source of revenue than CD or song purchases because the artist gets paid every time the song is played, not just when a purchase is made.
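The per-play argument is easy to see with a little arithmetic. The following is a minimal Python sketch; the payout figures are hypothetical placeholders, not numbers from the panel, and real rates vary by service, contract and territory.

```python
from fractions import Fraction
import math

# Hypothetical payouts, used only to illustrate the arithmetic.
# Fractions keep the dollar math exact (no floating-point rounding).
PER_STREAM = Fraction("0.004")   # assumed artist share per play, in dollars
PER_PURCHASE = Fraction("0.70")  # assumed artist share per track sale, in dollars


def streaming_revenue(plays: int, per_stream: Fraction = PER_STREAM) -> Fraction:
    """Streaming pays on every play, so revenue keeps growing with listening."""
    return plays * per_stream


def break_even_plays(per_purchase: Fraction = PER_PURCHASE,
                     per_stream: Fraction = PER_STREAM) -> int:
    """Plays by one listener needed for streaming to match a single purchase."""
    return math.ceil(per_purchase / per_stream)


if __name__ == "__main__":
    print(break_even_plays())             # 175 under these assumed rates
    print(float(streaming_revenue(500)))  # 2.0 dollars from one dedicated fan
```

With these placeholder rates, any listener who plays a track more than 175 times earns the artist more than a single purchase would, which is the situation Young described for artists with a dedicated fan base.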

“Technology Trends” panel from left: Ray Thompson (moderator), Jordan “DJ Swivel” Young, Henry Frecon, Benji Rogers.

A different technological solution to the issue was presented by Benji Rogers, CEO of Dot Blockchain Media, who claims that current audio file formats are antiquated — WAV is 27 years old; MP3 is 24; and FLAC is 17 — and are part of the problem because they don’t contain persistent metadata.

Rogers proposes a new format and wrapper with blockchain as part of the fundamental framework so that ownership can be tracked from inception through all iterations, sampling and reuse.

Whether or not you’re an artist trying to get paid, the ability to encode persistent, unmodifiable metadata will soon become an essential element of every single piece of media as the ability to replace original audio and video content with completely believable digital substitutes becomes even more mainstream than it already is.
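Rogers did not detail his format in this session, so the following is only a generic illustration of why chaining hashes makes metadata tamper-evident: each entry stores a SHA-256 hash of its own metadata plus the previous entry’s hash, so altering any earlier record breaks the chain. This is not Dot Blockchain Media’s design; it has no distributed ledger or consensus, and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this entry's metadata."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Append-only metadata log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, record: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"prev": prev_hash, "record": dict(record)}  # copy the caller's dict
        entry["hash"] = _digest(prev_hash, entry["record"])
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute every hash; an edited record or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash or _digest(prev_hash, entry["record"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = ProvenanceLog()
    log.append({"event": "created", "owner": "Artist A", "work": "Song X"})
    log.append({"event": "sampled", "by": "Producer B", "work": "Song Y"})
    print(log.verify())  # True
    log.entries[0]["record"]["owner"] = "Someone else"
    print(log.verify())  # False
```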

There were several sessions on immersive content and VR, both key elements of Next Gen Audio, and these sessions were far more informative than any I attended at NAB.

“Creating Spectacular Experiences with Dolby Atmos” panel from left: Curt Behlmer (moderator), Ron Bartlett, Doug Hemphill, Tim Hoogenakker.

The panel of “Creating Spectacular Experiences With Dolby Atmos” included re-recording engineers Doug Hemphill, Ron Bartlett, and Tim Hoogenakker. During the discussion on mixing immersive content, Hemphill pointed out that as sound density increases, imaging and intelligibility decrease, so films with busier soundscapes are more difficult to mix and it becomes harder to make individual sounds stand out. In fact, he sometimes pulls back the density of the sound mix when there are really intense visuals to give people’s brains a break.

When asked what it was like to be able to place sound in so many locations, Bartlett said that they went a little overboard with the overhead speakers at first, but realized that this actually made the film sound more mono and less immersive, so more judicious use of the overheads is now employed.


Everyone felt that the soundscape in Atmos is more akin to listening in the natural world than standard surround mixes are, though every so often the choices made in the surround mix turn out to be better than those in the Atmos mix.

While they do verify downmixes as they mix, Hoogenakker said that he specifically checks television downmixes on a soundbar since more people than ever are listening on them at home.
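A fold-down is just a weighted sum of channels, which is why it can misbehave in ways a full immersive monitoring setup hides. The sketch below assumes a simple 5.1-to-stereo Lo/Ro downmix with the commonly used -3 dB coefficients for center and surrounds and the LFE discarded; real Atmos and 7.1.4 renders use their own, often metadata-driven, downmix rules, and the function names are illustrative.

```python
# -3 dB expressed as a linear gain (about 0.708). This is the commonly used
# center/surround coefficient for a Lo/Ro fold-down; actual products may use
# different values, often signaled in metadata.
MINUS_3_DB = 10 ** (-3 / 20)


def downmix_51_to_lo_ro(frame: dict) -> tuple:
    """Fold one 5.1 sample frame (keys L, R, C, LFE, Ls, Rs) to a stereo pair.

    The LFE channel is deliberately dropped, as is typical for Lo/Ro.
    """
    lo = frame["L"] + MINUS_3_DB * frame["C"] + MINUS_3_DB * frame["Ls"]
    ro = frame["R"] + MINUS_3_DB * frame["C"] + MINUS_3_DB * frame["Rs"]
    return lo, ro


def clips(samples, ceiling: float = 1.0) -> bool:
    """True if any sample exceeds full scale (1.0 in floating-point audio)."""
    return any(abs(s) > ceiling for s in samples)


if __name__ == "__main__":
    # Dialog in the center only: it lands equally in both speakers, 3 dB down.
    dialog = {"L": 0.0, "R": 0.0, "C": 0.5, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
    print(downmix_51_to_lo_ro(dialog))

    # Busy front-left and left-surround content can sum past full scale.
    busy = {"L": 0.5, "R": 0.1, "C": 0.5, "LFE": 0.9, "Ls": 0.5, "Rs": 0.1}
    print(clips(downmix_51_to_lo_ro(busy)))  # True
```

A soundbar adds its own processing on top of a fold-down like this, which is one more reason to listen on the real device rather than trust the math alone.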

Something that helps keep costs down while speeding up workflow is being able to work in 7.1.4 in smaller rooms to prepare audio for the Atmos mix room. Hemphill brought us all back to reality by reminding us that even though the ability to create amazing immersive soundscapes is now in everyone’s grasp, dialog remains king. Always.

CHALLENGES OF AUDIO IN VR

In the first session on audio for VR, “Techniques for Mixing Audio in VR Content,” the panelists covered aspects of VR production I had not even contemplated. Benedict Green of Ecco VR stressed the importance of preproduction and the fact that, when shooting content, the crew, recording equipment, cables and anything else that might take people out of the experience must be concealed, which is a tad difficult when shooting live actors while trying to capture and play back audio.


Green and Varuna Nair of Facebook both noted that ambisonics is the preferred format for VR audio and that there are different tools for working with different ambisonics orders.

They also discussed some of the challenges of working with audio for VR, including never being able to use a boom mic on set and the difficulty of using ambisonics microphones, which cannot move without changing the soundfield. Managing loudness and keeping audio from clipping are huge challenges, as is working with nonaudio people who may reorder or mangle the ambisonics tracks. According to Nair, when thinking about VR, audio engineers should “think of it as another tool in the toolbox.”
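The panelists did not name a specific convention, so the following sketch assumes the widely used AmbiX layout (ACN channel order W, Y, Z, X with SN3D normalization) to illustrate three points from the session: how channel count grows with order, why a reordered track is not a harmless change (the older FuMa layout orders and scales the same first-order signals differently), and why moving an ambisonics microphone moves the whole soundfield. The function names are illustrative.

```python
import math


def channel_count(order: int) -> int:
    """Full-sphere ambisonics of order N carries (N + 1) ** 2 channels."""
    return (order + 1) ** 2


def encode_first_order_ambix(sample: float, azimuth_deg: float, elevation_deg: float) -> list:
    """Encode a mono sample as first-order AmbiX: [W, Y, Z, X] (ACN order, SN3D).

    Azimuth is measured counterclockwise from the front (positive = left).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        sample,                                # W: omnidirectional
        sample * math.sin(az) * math.cos(el),  # Y: left/right
        sample * math.sin(el),                 # Z: up/down
        sample * math.cos(az) * math.cos(el),  # X: front/back
    ]


def fuma_to_ambix_first_order(w: float, x: float, y: float, z: float) -> list:
    """Convert a first-order FuMa frame (W, X, Y, Z; W scaled by 1/sqrt(2)) to AmbiX."""
    return [w * math.sqrt(2), y, z, x]


def rotate_yaw(frame: list, degrees: float) -> list:
    """Rotate a first-order AmbiX frame about the vertical axis.

    Positive angles move every source toward the left, which is what happens
    to the recorded image when the microphone itself turns the other way.
    """
    w, y, z, x = frame
    t = math.radians(degrees)
    return [w, y * math.cos(t) + x * math.sin(t), z, x * math.cos(t) - y * math.sin(t)]


if __name__ == "__main__":
    print(channel_count(1), channel_count(3))  # 4 16
    front = encode_first_order_ambix(1.0, 0.0, 0.0)
    print(rotate_yaw(front, 90.0))  # approximately the encoding of a source at 90 degrees
```

Swapping two channels, or feeding FuMa-ordered tracks to an AmbiX decoder, silently relocates sources, which is exactly the kind of mangling a nonaudio collaborator can introduce.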

In the session “Creating a Compelling Immersive Mix for VR Content,” Scott Gershin made the case that realism is not always the best approach, since the purpose of sound is to help tell the story in whatever way necessary.

Like Green and Nair, he also mentioned things I had not really considered. For instance, if you move a sound in the soundscape, people will pay attention to it, so don’t move it if you don’t want it noticed. Gershin does not move backgrounds, but he does group and move all sounds for given characters together to the same place in the soundscape because that is where the character exists at that moment.

Since we don’t really hear things behind us he feels that a 270-degree soundfield is wide enough. As for where this is all heading, Gershin thinks that VR will be commonplace in five years and boring in 10, but augmented reality is really where we’re heading anyway.
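Gershin’s rules of thumb translate naturally into a small data model. The sketch below assumes azimuth in degrees with 0 at the front and positive values to the left, models his 270-degree field as plus or minus 135 degrees (clamping anything outside it to the nearest edge is a design choice of this sketch), and moves every sound belonging to a character as one group; backgrounds simply never get a move. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

# A 270-degree usable field, expressed as +/-135 degrees from straight ahead.
FIELD_HALF_WIDTH = 135.0


def clamp_azimuth(azimuth: float, half_width: float = FIELD_HALF_WIDTH) -> float:
    """Wrap an angle into [-180, 180) and keep it inside the usable frontal arc."""
    wrapped = (azimuth + 180.0) % 360.0 - 180.0
    return max(-half_width, min(half_width, wrapped))


@dataclass
class CharacterGroup:
    """Every sound a character makes lives at the character's position."""
    name: str
    sounds: list = field(default_factory=list)
    azimuth: float = 0.0

    def move_to(self, azimuth: float) -> None:
        # One move repositions dialog, footsteps, Foley, etc. together.
        self.azimuth = clamp_azimuth(azimuth)

    def positions(self) -> dict:
        return {sound: self.azimuth for sound in self.sounds}


if __name__ == "__main__":
    hero = CharacterGroup("hero", ["dialog", "footsteps", "cloth"])
    hero.move_to(200.0)      # a position behind the listener
    print(hero.positions())  # every sound at -135.0, the edge of the field
```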

Jay Yeary is a television engineer who specializes in audio. He is a member of AES, SBE, SMPTE and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.


The New Reality of Wireless Frequency Coordination

Jay Yeary — Mon, 02 Apr 2018 14:31:04 +0000

The 600 MHz wireless spectrum auction has come and gone, taking some television stations off the air, forcing more than a thousand others to relocate, and causing repercussions that are starting to trickle all the way down to wireless device users. It was already apparent that owners of equipment operating in the auctioned spectrum would need to replace or reband their devices in order to avoid interfering with the mobile devices that will soon be operating there, but many were planning to put this off as long as possible.

With announcements coming from the auction winners of accelerated rollouts of 600 MHz mobile devices, there is no longer any time to waste. Wireless device issues are already on the increase as new devices are lit up and tested, with users puzzled as to why their previously fine wireless systems are suddenly giving them trouble. Though this has resulted in problems for wireless users, it has, perhaps inadvertently, created a place for engineers who can advise customers on their wireless issues, especially if they are able to perform wireless frequency coordination. Some of these calls have come my way, which caused me to take a fresh look at the problem and the newly available tools to help us deal with these issues.

INTERMODULATION DISTORTION

At its most basic, coordinating the frequencies of wireless devices is as simple as ensuring that all transmitters are operating on different fundamental frequencies, but of course, things are never that simple. Those fundamental frequencies have harmonics, and when several transmitters are operating, their fundamentals and harmonics combine into sum and difference products (such as 2f1 - f2) that remain strong enough through at least the fifth order to interfere with the frequencies of other nearby transmitters, resulting in intermodulation distortion. This typically manifests itself through popping or other unwanted noises that emanate from the receiver, usually at the worst possible time.


Calculating the harmonics of fundamental frequencies is easy, with the second harmonic double that of the fundamental, the third harmonic triple, and so on. For instance, a fundamental frequency of 626.200 MHz has a second harmonic of 1252.400 MHz (1.2524 GHz) and a third of 1878.600 MHz (1.8786 GHz). A single transmitter operating on its own in an area of clear spectrum will have no problem, but as soon as a second transmitter is powered up things get complicated, and the complexity increases with each transmitter added to the system. As soon as more than one transmitter is online we must determine whether the frequencies of each will interfere with the others, and the best way to do this is by using frequency coordination software made specifically for this purpose.
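Both calculations are simple arithmetic. In the Python sketch below, harmonics are integer multiples of the fundamental, and the two-transmitter products are the familiar 2f1 - f2 (third order) and 3f1 - 2f2 (fifth order) combinations, which land close to the fundamentals and are therefore the ones most likely to hit a neighboring channel. Real coordination software also evaluates three-transmitter products such as f1 + f2 - f3 and equipment-specific spacing rules; the function names here are illustrative.

```python
def harmonics(fundamental_mhz: float, highest: int = 5) -> list:
    """Second through `highest` harmonics, in MHz (rounded to 1 kHz)."""
    return [round(fundamental_mhz * n, 3) for n in range(2, highest + 1)]


def two_tone_imd(f1: float, f2: float) -> dict:
    """Third- and fifth-order products created by two transmitters, in MHz."""
    return {
        "3rd": sorted({round(2 * f1 - f2, 3), round(2 * f2 - f1, 3)}),
        "5th": sorted({round(3 * f1 - 2 * f2, 3), round(3 * f2 - 2 * f1, 3)}),
    }


if __name__ == "__main__":
    print(harmonics(626.2, highest=3))  # [1252.4, 1878.6]
    print(two_tone_imd(626.2, 627.0))   # {'3rd': [625.4, 627.8], '5th': [624.6, 628.6]}
```

In this example, a third transmitter parked on 625.4 or 627.8 MHz would sit directly on a third-order product of the first two.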

However, before we can coordinate the frequencies of the devices we know about, we need to figure out if other devices might also cause interference, a process done by performing a frequency scan of the area using an RF spectrum analyzer. These devices monitor and display RF spectrum activity over a specified frequency range. After running the scan, the analyzer generates a report which gets imported into the frequency coordination software along with the data from our known devices.
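Analyzer exports differ from product to product, so the following sketch assumes a hypothetical two-column CSV of frequency in MHz and level in dBm, plus an arbitrary -80 dBm threshold. It simply lists the frequencies where something is already on the air, which is the information the coordination step needs from the scan.

```python
import csv
import io


def occupied_frequencies(scan_csv: str, threshold_dbm: float = -80.0) -> list:
    """Return the frequencies (MHz) whose measured level exceeds the threshold.

    Assumes rows of "frequency_mhz,level_dbm"; header or malformed rows are skipped.
    """
    hits = []
    for row in csv.reader(io.StringIO(scan_csv)):
        if len(row) < 2:
            continue
        try:
            freq, level = float(row[0]), float(row[1])
        except ValueError:
            continue  # header line or non-numeric data
        if level > threshold_dbm:
            hits.append(freq)
    return hits


if __name__ == "__main__":
    scan = "frequency_mhz,level_dbm\n575.000,-101.5\n578.125,-62.0\n590.250,-95.0\n"
    print(occupied_frequencies(scan))  # [578.125]
```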

Most frequency coordination software also includes the ability to pull in data for local television stations, which needs to be factored into the intermodulation calculations along with all the other data we’ve gathered. Once all the data is loaded, the software calculates any conflicts and suggests changes to the fundamental frequencies of the transmitters in order to get all the devices to play well together.
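The conflict check at the heart of that calculation can be sketched as a greedy search. This is not how any particular product works; it assumes candidate frequencies in MHz, TV channels supplied as (low, high) MHz ranges, occupied frequencies taken from the scan, and two hypothetical spacing rules (0.4 MHz between fundamentals and 0.1 MHz between a fundamental and any third-order product). Commercial software weighs far more, including fifth-order and three-transmitter products and per-model receiver specifications.

```python
from itertools import combinations


def third_order_products(freqs) -> set:
    """2fa - fb and 2fb - fa for every pair of transmitters, in MHz."""
    products = set()
    for a, b in combinations(freqs, 2):
        products.add(2 * a - b)
        products.add(2 * b - a)
    return products


def too_close(freq: float, others, gap: float) -> bool:
    return any(abs(freq - other) < gap for other in others)


def coordinate(candidates, needed, tv_channels=(), occupied=(),
               spacing=0.4, imd_gap=0.1) -> list:
    """Greedily pick up to `needed` compatible frequencies from `candidates`."""
    chosen = []
    for cand in candidates:
        # Skip anything inside a local TV channel.
        if any(low <= cand <= high for low, high in tv_channels):
            continue
        # Keep clear of carriers found in the scan and of frequencies already chosen.
        if too_close(cand, occupied, spacing) or too_close(cand, chosen, spacing):
            continue
        # Reject the candidate if it creates, or lands on, a third-order product.
        trial = chosen + [cand]
        products = third_order_products(trial)
        if any(too_close(f, products, imd_gap) for f in trial):
            continue
        chosen.append(cand)
        if len(chosen) == needed:
            break
    return chosen


if __name__ == "__main__":
    candidates = [600.0, 600.5, 601.0, 601.5, 602.25, 603.0]
    print(coordinate(candidates, needed=4))  # [600.0, 600.5, 601.5, 602.25]
```

Here 601.0 MHz is rejected because it sits exactly on 2 x 600.5 - 600.0, even though it is comfortably spaced from both fundamentals; spacing alone is never enough.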

THE AUTO-SETUP OPTION

It is theoretically possible to eliminate almost all of this work by using the auto-setup features included with some wireless systems, which poll the wireless spectrum and automatically assign the best frequencies for all transmitters in the system. Unfortunately, auto-setup typically requires that all devices be from the same manufacturer and be banded in the same frequency range. Most real-world audio systems contain devices from multiple manufacturers with transmitters banded across multiple ranges.

Building a system of wireless microphones, in-ear monitors, and wireless intercom stations from a single manufacturer that meets end-user requirements can be problematic, if not impossible. Auto-setup solutions are an excellent option for simple systems, or for systems engineered this way from the outset, but most systems that have evolved over time are far more complicated than auto-setup can handle. Still, anyone who has decided that now is the time to replace their transmitters should consider new systems that are frequency-agile, with the ability to manage frequency coordination without additional tools.

The good news for anyone given the task of wireless frequency coordination is that RF analyzers and coordination software tools are more widely available and affordable than ever before. RF analyzers now cost from tens of thousands down to hundreds of dollars and coordination software from hundreds to tens of dollars. The precision of the analyzers and comprehensiveness of the software is what sets them apart, with the more expensive options offering more features and generally making life easier by reducing the amount of work required.

Shure Wireless Workbench

Back when I worked for a large media company, our coordination work was done with a very high-end Anritsu analyzer coupled with the Intermodulation Analysis System (IAS) software from Professional Wireless. These days my toolset is far more modest, with an RF Explorer analyzer, Touchstone analyzer software for RF capture, and Shure Wireless Workbench software for coordination. Using Wireless Workbench with devices from other manufacturers is not a problem, but it requires building device profiles for those not included. The process itself is much as described earlier—connect the analyzer, gather RF data, pull all data into the coordination software, run the analysis, then make any necessary frequency changes to the transmitters. It doesn’t hurt to run another coordination once things are changed, just to double-check that everything is working without conflict.

While wireless coordination will most likely be needed for 600 MHz devices, it may also be necessary for 470–512 MHz devices if you happen to be in one of 11 large US metro areas, because that spectrum has been designated for public safety (T-Band) land mobile radio use.
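A quick inventory pass can flag which transmitters fall in either range. This is a minimal sketch with simplified band edges: it treats the whole 614-698 MHz range reallocated by the auction as off-limits, ignoring the guard band and duplex gap within it where FCC rules still allow some wireless microphone operation, and it leaves it to the user to say whether the system operates in one of the T-Band metro areas.

```python
# Simplified band edges, in MHz.
AUCTIONED_600_MHZ = (614.0, 698.0)  # spectrum reallocated by the 600 MHz auction
T_BAND = (470.0, 512.0)             # public safety land mobile in certain US metros


def band_warnings(freq_mhz: float, in_t_band_metro: bool = False) -> list:
    """Flag transmitter frequencies likely to need rebanding or recoordination.

    Simplification: ignores the guard band and duplex gap inside 614-698 MHz.
    """
    warnings = []
    low, high = AUCTIONED_600_MHZ
    if low <= freq_mhz < high:
        warnings.append("600 MHz auction spectrum")
    low, high = T_BAND
    if in_t_band_metro and low <= freq_mhz < high:
        warnings.append("T-Band public safety spectrum")
    return warnings


if __name__ == "__main__":
    inventory = {"handheld 1": 626.2, "IFB": 488.5, "IEM": 560.0}
    for name, freq in inventory.items():
        print(name, band_warnings(freq, in_t_band_metro=True))
```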

Jay Yeary is a television engineer who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

TV Tech Experts Preview the 2018 NAB Show

TV Technology Staff — Mon, 19 Mar 2018 14:29:06 +0000

No one goes to an event like the annual NAB Show without doing their homework first. As the show expands its umbrella to cover more advanced media technologies, the need to prepare becomes ever more crucial. Whatever your taste, TV Tech’s experts are here to help; here’s their advice for 2018:

KARL PAULSEN, Storage Technology

Standout trends will likely center on evolving workflows in cloud-based solutions and emerging applications for IP infrastructures. This is the first NAB since the adoption of new SMPTE ST 2110 standards for Professional Media Networks so don’t miss the IP Showcase (in the rear of Central Hall) where working examples of the new standards plus integration of the NMOS interface specifications will be shown in an educational showcase environment. Potential IP adopters will be looking at how manufacturers address software defined networking and new tools aimed at diagnostics and operational management for IP implementations.

The enormous prominence of virtual and augmented reality and artificial intelligence and machine learning at the Consumer Electronics Show will be evident at NAB. Expect to see evolving products necessary to support the industry’s mandate to create, manage and deliver content to these emerging platforms. eSports is now attracting inventive players and changing production techniques that may show promise for aspiring venues.

The continual industry churn of what is now Belden’s growing empire should attract users to see what new products they collectively offer. Everyone is curious how the combined companies of Grass Valley and Snell Advanced Media will address the changes in infrastructures that appear to be moving away from pure hardware and into virtualized, software-based environments. We’ll see what comes out of these mergers and acquisitions, and who will be next in line.

JULIA SWAIN, Lighting Technology

The 2018 NAB Show promises to be a big one! Excited to see more lighting units capable of RGB and DMX, which so many of us have been utilizing more and more on set. Being able to move so quickly between colors and qualities of light has opened up a lot of possibilities. I’m very much hoping for LEDs with great outputs as well. The climb toward stronger, more versatile LED units has been an exciting and consistent one, so I’m looking forward to seeing what this year brings in the world of lighting.

On the camera side, I anticipate some new monitor options with a gamut of exposure tools. Lots of apps to control and learn camera settings are also on the horizon.

AL KOVALICK, Cloudspotters Journal

Look for all things cloud including SaaS apps for your daily operations. Don’t settle for installed apps unless there is a performance need. Ask vendors what their cloud strategy is, including what clouds they support for media services, apps and processing. Go to NAB with a list of “cloud questions” for your preferred vendors specifically around hybrid cloud local operations integrated with cloud services. Understand there is a place for local services but these are being eclipsed by cloud operations. Understand what mix will work best for your facility. Expect to use one or more clouds to meet your business needs. This multicloud approach will give you more flexibility for business operations. Look for 24x7 cloud support and operational services possibly from specialty companies.

JAY YEARY, Focus On Audio

This is the year where we really start to grasp the full scope of the changes that IP-enabled technologies are bringing to television, from ingest all the way to delivery. HD-SDI video and discrete audio chains will see fewer implementations as they are passed over for IP-based alternatives, even though the road to an all-IP facility remains a bumpy one. With ATSC 3.0 now rolling out in the U.S., IP is a reality for new and remodeled television facilities.

In audio, we’ll certainly see an increasing number of personalization options for consumers, along with products for immersive audio that are designed to be shoehorned into residential environments. For now, however, it looks like personalization, whether mono, stereo, or emulated surround, has piqued the interest of the end user more than additional surround channels in the living room. This could change if immersive audio products for the consumer become a little more affordable. The preference for personalization is partly VR-driven but is really a continued outgrowth of the decades-old personal device boom, which is likely to continue with or without a VR element.

User interfaces for Next Gen technologies are particularly worthy of scrutiny this year, since presenting complicated options in an easy-to-understand package is an art form that will make the difference between success and failure for some products. Finally, anyone hoping to stretch out their use of 600 MHz wireless devices appears to be out of luck now that T-Mobile has accelerated their rollout.

CRAIG JOHNSTON, Correspondent

We’ll see the rollout of ready-to-use 4K and some 8K camera systems. These new cameras have spawned attendant equipment such as enhanced lenses and high bit rate signal transmission equipment.

A whole host of 360-degree virtual reality camera systems and stitching software will be presented. An Immersive Storytelling Pavilion will help newcomers to 360-degree technology figure out how it will fit into their business.

Look for cellular liveshot gear that is futureproofed by including 5G capabilities, even though 5G at present is a small blip on the cellular radar.

And speaking of futureproofing, a lot of black boxes being bought at the show will have IP connectors on them, even though they will initially be connected via coaxial and fiber optic cable.

2018 Audio Prognostication—What Lies Ahead?

Jay Yeary — Thu, 18 Jan 2018 12:00:00 +0000

The beginning of a new year is an excellent time to reflect on life and ponder mysteries—such as why we make a big deal out of one year starting as another ends, and whether time is just an imaginary construct in the first place. Of course, it’s also a good time to look back at the tumultuous year that just passed and consider what the new year will bring.


NEXT GEN AUDIO
Next Generation Audio was top of the list of audio revolutions that made headlines in 2017 while making almost no difference to the end user. That will change in 2018 as ATSC 3.0 begins to roll out to consumers and the age of immersive and personalized television audio begins.

The greatest challenge for the companies tasked with delivering NGA technology will be to make the interface user-friendly and as brain-dead-easy as television is supposed to be. If they deliver real usability to the masses we could see a serious resurgence of interest in broadcast television, and if they don’t, people will simply turn back to their streaming boxes.

As for the technical side of NGA, it would be wonderful to see immersive audio transform multitudes of living rooms into surround environments, but its greater impact will almost certainly be realized through earbuds and headphones. People already love their individual listening devices and immersive audio will deliver a very personal and important television audio experience.

STREAMING TV AUDIO
It will be interesting to see how things go in the world of streaming television audio this year, especially since streaming has amazing potential as a delivery platform for personal immersive audio. The AGOTTVS group within AES has done a fantastic job of setting guidelines for content creators and streaming providers to follow, but whether those guidelines will be followed across the board remains to be seen.

When Congress mandated loudness management through the CALM Act, it had little impact in Silicon Valley, but if streaming audio loudness problems aren’t solved voluntarily, it is entirely conceivable that similar legislation could be put in place for streaming.

AOIP
Audio and video-over-IP will continue to grow this year as television facilities add more IP- and IT-based equipment in place of traditional equipment. As this happens, television facilities will begin to look a lot more like IT facilities.

As the level of IP-based equipment increases so does the need for engineers with IT skills while the need for traditional television engineers decreases, something everyone working in television engineering needs to keep in mind.

At the same time, SMPTE, AES and other professional organizations continue to develop and refine standards to make the myriad television IP standards function together and interoperate.

IMPACT OF FCC CHANGES
The FCC was certainly busy last year, and at times their decisions didn’t seem terribly pro-television. On Dec. 14, 2017, the FCC reversed the 2015 Title II Order, which deemed the internet to be a public utility, ending what is widely known as “Net Neutrality.” Prior to the 2015 order, internet providers were actively throttling traffic of various types.

Now that the order has been reversed we’ll have to wait and see what impact, if any, this decision will have on our ability to successfully deliver streaming television content to customers. Of course, 2017 also brought the reality of the wireless frequency repack and sell-off of spectrum, which displaced nearly 1,000 stations and caused chaos and confusion throughout the industry.

In the world of wireless microphones, belt packs and IFBs, 600 MHz spectrum is already gone, with T-Mobile announcing plans to start utilizing those frequencies at any time.

Between the loss and shuffling of spectrum and the possibility of throttled streaming, it feels like a very bad time to be in the business of delivering television content.

AUGMENTED REALITY
Virtual reality was quite a fixation in 2015 and 2016, and it hasn’t really gone anywhere. But a technology that is far more interesting, which will make big waves this year, is augmented reality. AR doesn’t replace the world around us with something different as VR does; it adds dimension to the world we’re in.

This technology has actually been around for years; there were AR apps for older smartphones, and Pokémon Go is an extreme example of the potential of AR technology.

Augmented reality may prove to be an enormous benefit to healthcare since doctors can have medical information immediately at hand using goggles or glasses, but the potential for augmented television content is there as well.

With consumers already watching content on smart devices, AR is a technology just begging to be utilized. For audio professionals, this area is especially intriguing because AR and VR experiences are hollow without an audio soundscape to accompany the visuals.

iZotope’s Neutron track assistant

MACHINE LEARNING
Artificial intelligence was the subject of many, many discussions in 2017 and it will continue to be in 2018 as people worry that robots will become sentient beings and take over. Machine learning, on the other hand, sounds far less ominous so we’ll use that term instead.

Machine learning can best be described as building a machine that can learn and then letting it do so. Google’s language translation AI is a perfect example of machine learning because it has created its own internal language in order to translate languages quicker.

The fruits of machine learning have already started to appear in some commercial audio products, but there is a far deeper movement using it for all kinds of audio work. Acoustics, signal processing, language and signal analysis, music classification, signal and source separation are just some of the areas where machine learning is already being applied.

My first inkling that machine learning was in use for audio software was when iZotope released Neutron, their mix analysis and processing tool. Experiments with Neutron’s track assistant have caused me to rethink some final mixes. Machine learning has made its way into several iZotope products and they’re not the only audio company using it. No matter how you feel about it, machine learning is here to stay, so expect to see more of it in 2018.

Jay Yeary is a television engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

Guidelines for Streaming Loudness

Jay Yeary — Thu, 21 Dec 2017 12:00:00 +0000

As mature as the television industry is, it appears to be going through a very disruptive identity crisis right now. Cable providers are hemorrhaging subscribers while over-the-air broadcasting is being rediscovered; broadcasters and content creators are rolling out their own streaming services to ensure direct access to customers without the need for third parties; the journalistic integrity of traditional news organizations is under attack, even as tech companies deliver uncorroborated information to users and deny any responsibility for determining its credibility; and viewership of traditional sports broadcasts seems to be waning while interest in eSports is on the rise.

Yet despite this changing content landscape, television content seems as popular as ever, even though the array of devices used to consume it, and the range of environments in which it is consumed, appear almost unfathomable.

NEXT GEN TV
The impending rollout of ATSC 3.0 promises to bring together broadcast and streaming content in ways we haven’t experienced up to now, with the ability of immersive and customizable audio to be delivered to practically every device, and to be enjoyed through speakers or headphones. Unfortunately, not all streaming content is quite ready to become a part of the broadcast ecosystem, either due to poor quality or because it doesn’t meet broadcast standards.

One area where this is apparent is with streaming audio loudness, since the disparate streaming services and online outlets use different loudness settings. The good news is that most outlets now implement some sort of loudness management for content; the bad news for content creators is that this is usually done through automated normalization rather than through monitored gain modification, so audio content could be negatively altered.
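The difference between automated normalization and a monitored gain change is easy to show with a few lines of arithmetic. The following is a minimal Python sketch, assuming a normalizer that applies one static gain to reach a service's target loudness, and a hypothetical true-peak ceiling of –1 dBTP. Real services choose their own targets and may limit, compress, or leave content alone, so the function names and the ceiling here are illustrative only.

```python
def normalization_gain_db(integrated_lufs: float, target_lufs: float) -> float:
    """Static gain a normalizer would apply to land on the target loudness."""
    return target_lufs - integrated_lufs


def normalize(integrated_lufs: float, true_peak_dbtp: float,
              target_lufs: float, ceiling_dbtp: float = -1.0):
    """Return (gain_db, new_peak_dbtp, needs_limiting).

    ceiling_dbtp is a hypothetical service ceiling (assumption). If applying
    the gain would push the true peak past it, an unattended normalizer has
    to limit or otherwise alter the audio to reach its target.
    """
    gain = normalization_gain_db(integrated_lufs, target_lufs)
    new_peak = true_peak_dbtp + gain
    return gain, new_peak, new_peak > ceiling_dbtp


# A -24 LUFS broadcast mix sent to a service normalizing to -16 LUFS.
print(normalize(-24.0, -2.0, target_lufs=-16.0))  # (8.0, 6.0, True)
```

In this example, 8 dB of gain would put the mix's peaks at +6 dBTP, so something in the chain has to alter the audio. That is exactly the kind of unmonitored change described above.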

AGOTTVS Streaming Audio Loudness Recommendations

The first North American attempt at setting loudness standards for streaming content was done by the AES Study Group on Streaming Loudness, headed by recording engineer Bob Katz, who wanted to set loudness standards for music streaming.

In October of 2015, the group released the document, “Recommendations for Loudness of Audio Streaming and Network File Playback,” which recommended that target loudness of non-metadata encoded music reside between –16 and –20 LUFS. While this was a great start, it was immediately realized that all other streaming audio content would benefit from similar loudness recommendations, especially the audio accompanying streaming video.

So, in early 2016 the AES Audio Guidelines for Over the Top Television and Video Streaming (AGOTTVS) Study Group was formed, chaired by Jim Starzynski, the director and principal audio engineer at NBC Universal. His previous foray into loudness management was as part of the ATSC committee that created A/85, which serves as the core of the CALM Act and regulates broadcast audio loudness.

The group is made up of members from broadcast and cable networks, content providers, device manufacturers and content delivery services, as well as other interested parties, all working together to gain agreement on standards across the entire content chain, from creation to delivery to the consumer.

Preliminary AGOTTVS guidelines were published in September 2016, and on October 19, 2017, the AES announced an update to those guidelines that appear to be on their way to becoming an AES recommended practice. AES Technical Document AESTD1006.1.17-10, titled, “Loudness Guidelines for OTT and OVD Content,” can be found at AES.org and it is worth downloading and reading all the way through.

The document’s objectives are spelled out clearly in section 2, namely to “Provide consistent loudness across different Programs, provide appropriate loudness range for devices and listening conditions, prevent … processing from degrading audio quality, preserve the artistic intent, and improve the listening experience.”

Basic recommendations are along the same lines as A/85: use ITU-R BS.1770 measurement tools and make use of metadata with content whenever possible to take advantage of available dynamic range control and loudness management.

LONG FORM VERSUS SHORT FORM
For North America, loudness measurements of long-form content should be done by measuring the anchor element, typically dialog, while full program measurements are recommended for short-form content. For long-form content where the anchor element cannot be determined, it is recommended that the full program be measured.

The guidelines themselves have recommendations for six possible content scenarios, among them the delivery and distribution of properly prepared content, systems with questionable metadata capabilities, devices or environments with limited dynamic range, and content created with no regard for standards or recommendations. Quite a few of the guidelines mention prior arrangements made between the content creator and distributor, and in those situations the document recommends that loudness and true peak values be measured and not exceed recommended regional targets.

Specific targets can be found in Annex B and vary according to region. Loudness targets for North America are –24 LKFS, plus or minus 2 dB, with maximum peaks no higher than –2 dB TP. Maximum loudness is never to exceed –16 LKFS, plus or minus 1 dB, with peaks measuring no higher than –1 dB TP.
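These North American numbers lend themselves to a simple automated check. Below is a hedged Python sketch that classifies a program's measured integrated loudness and maximum true peak against the two tiers. The class and function names are hypothetical. Reading "–16 LKFS, plus or minus 1 dB" as a hard ceiling of –15 LKFS is an interpretive assumption, so check Annex B itself before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoudnessTargets:
    target_lkfs: float
    target_tolerance_db: float
    target_peak_dbtp: float
    max_lkfs: float
    max_tolerance_db: float
    max_peak_dbtp: float


# North American values as summarized above (hypothetical container).
NORTH_AMERICA = LoudnessTargets(-24.0, 2.0, -2.0, -16.0, 1.0, -1.0)


def classify(integrated_lkfs: float, true_peak_dbtp: float,
             t: LoudnessTargets = NORTH_AMERICA) -> str:
    # Assumption: the maximum is read as max_lkfs + max_tolerance_db (-15 LKFS).
    if (integrated_lkfs > t.max_lkfs + t.max_tolerance_db
            or true_peak_dbtp > t.max_peak_dbtp):
        return "over maximum"
    if (abs(integrated_lkfs - t.target_lkfs) <= t.target_tolerance_db
            and true_peak_dbtp <= t.target_peak_dbtp):
        return "on target"
    return "off target"


print(classify(-24.0, -3.0))  # on target
```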

While these recommendations should help moderate loudness of television and video content, the document notes that the loudness of sounds generated in the system itself, such as phone and tablet alerts, may differ from content loudness and requests that developers work to match loudness of all sounds across the device in future versions of software, though emergency alerts are expected to be louder than other content due to their nature.

The document also notes that some devices use codecs that may not support metadata, resulting in undesired playback loudness, and it suggests that these devices follow CTA CEB11 and EBU Tech 3344 recommendations to mitigate this.

Finally, the document goes into the differences between fixed and portable devices, their environments, and typical expected capabilities.

This document is a major step forward in sorting out the loudness issues between different streaming services and the many devices on which their content is consumed. While AGOTTVS has a wide cross-section of industry participation, it’s not everyone. Some are waiting on a standard to be created before implementing changes, and while this document will only be a recommended practice, so is A/85.

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

Discerning Sound Defines Audio Pros

Jay Yeary — Tue, 21 Nov 2017 13:50:00 +0000

For the past 13 years, on a Saturday in mid-autumn, the AES Atlanta Section has held their workshop for students. The classes are taught by audio professionals and are held in working audio production rooms.


This year’s event took place Sept. 23 and I was honored to be among the instructors, teaching the “Critical Listening” class alongside composer Tanya Ostrovsky. This class has historically been taught by multiple Grammy Award-winning engineer Jim Anderson, so we had some rather large shoes to fill.

Joining us as instructors this year were Michael Cardillo (Creative Waves) teaching “Pro Tools Fundamentals,” Miles Walker (Gwen Stefani, Coldplay) covering “Mixing Fundamentals” and Matt Still (Elton John) and Bob Gillespie (Turner Studios) digging into “Fundamental Microphone Techniques.”

The students rotated through the four classes throughout the day, ending up with a broad cross-section of audio instruction for six hours.

CRITICAL LISTENING
Critical listening is one of those intriguing topics you see mentioned every so often, but the ability to discern sounds is one of the defining characteristics of professional audio engineers. It is a skill that can and should be developed by everyone planning to work in the audio field.

The idea of listening critically as a discipline is something I find compelling because we sometimes get so wrapped up in the technical aspects of our jobs that we forget to slow down and really focus on what we’re hearing before making adjustments.

Needless to say, being a part of this class, which forced both the students and the instructor to simply listen and analyze, was utter joy.

My goals for the workshop were to get the students to really think about how to listen; get them listening critically with fewer assumptions; and to, hopefully, get them to start listening to sounds in the real world without constantly having transducers stuck into their ear canals.

I sometimes wonder if the sounds of nature, rural communities and even cities will one day become unrecognizable as we continue consuming audio content primarily through the devices we tote along in our pockets.

Tanya and I took turns teaching the Critical Listening class, handling two sessions each. My classes were loosely based on the book, “Critical Listening and Audio Production” by Jason Corey, and were taught from my perspective as an engineer. Tanya’s sessions were taught from the viewpoint of a composer, with much heavier emphasis on music theory, and used materials and listening tests from the “Golden Ears” course by former TV Technology audio columnist Dave Moulton.


Because music creation and sound engineering are so intricately intertwined—and to aid in pinpointing and understanding the overall concept of frequencies—students were first given diagrams showing the frequency ranges of musical instruments and the human voice, along with the 10 octaves of hearing.
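The "10 octaves of hearing" figure follows from the usual 20 Hz to 20 kHz range, because each octave doubles the frequency. Below is a small Python sketch, assuming those textbook limits and base-2 octave bands referenced to 1 kHz. The function names are illustrative.

```python
import math


def octaves_between(low_hz: float, high_hz: float) -> float:
    """Number of octaves (frequency doublings) between two frequencies."""
    return math.log2(high_hz / low_hz)


def octave_band_centers(reference_hz=1000.0, low_hz=20.0, high_hz=20000.0):
    """Base-2 octave band centers (reference * 2**k) inside the given range."""
    centers = []
    k = math.floor(math.log2(low_hz / reference_hz))
    while True:
        fc = reference_hz * 2.0 ** k
        if fc > high_hz:
            break
        if fc >= low_hz:
            centers.append(fc)
        k += 1
    return centers


print(round(octaves_between(20, 20000), 2), len(octave_band_centers()))  # 9.97 10
```

The exact base-2 centers (31.25 Hz, 62.5 Hz and so on) are usually labeled with rounded nominal values such as 31.5 Hz and 63 Hz.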

This led to a discussion of how instrument frequency ranges overlap, how timbre, pitch and resonance are among the characteristics that help us distinguish instruments and other sounds. From there we discussed the benefit of arranging music and soundscapes so that sounds with similar tonal characteristics don’t compete with each other in the mix because it tends to make the mixing process more difficult and the results often less pleasing.

In Corey’s book, he calls for active and analytical listening on the part of audio engineers and that is what we attempted to develop through the playback of a range of audio material, including music, speech and sound effects. Each example was selected for its distinctive instrumentation, to demonstrate a specific audio focus such as dynamic range, distortion, noise, reverberation and rooms, etc., and to generate discussion about what we were hearing. The goal was to recognize the unique characteristics of each example and to distinguish what certain sounds actually were.

UNDERSTANDING AND VISUALIZING FREQUENCIES
To help the students understand and visualize frequencies, all examples were presented through a spectrogram plug-in, which allowed everyone to see the frequency and intensity of sounds as they occurred. Watching speech on the spectrogram was particularly interesting because it became obvious how surprisingly wide the frequency and tonal range of the human voice truly is.
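For readers curious about what a spectrogram plug-in is doing internally, here is a minimal sketch. The signal is cut into overlapping Hann-windowed frames, and each frame's spectrum is reported as magnitude in dB per frequency bin. For clarity, it uses a naive DFT from the Python standard library. The frame and hop sizes are arbitrary assumptions, and a real plug-in would use an FFT and its own display scaling.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples, frame=256, hop=128):
    """Return one list of magnitudes (dB) per frame, for bins 0..frame//2.

    Bin k corresponds to k * fs / frame Hz. Uses an O(frame^2) DFT per
    frame, which is fine for a demonstration but slow for real audio.
    """
    window = hann(frame)
    bins = frame // 2 + 1
    # Precompute the DFT basis e^{-j 2 pi k n / frame} for the kept bins.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame)]
             for k in range(bins)]
    frames = []
    for start in range(0, len(samples) - frame + 1, hop):
        seg = [samples[start + n] * window[n] for n in range(frame)]
        mags = []
        for k in range(bins):
            acc = sum(s * b for s, b in zip(seg, basis[k]))
            mags.append(20 * math.log10(abs(acc) + 1e-12))  # avoid log(0)
        frames.append(mags)
    return frames


# A 1 kHz tone sampled at 8 kHz lands in bin 1000 / (8000 / 256) = 32.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(2048)]
frames = spectrogram(tone)
peak_bins = {max(range(len(f)), key=f.__getitem__) for f in frames}
print(len(frames), peak_bins)  # 15 {32}
```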

The one problem with the spectrogram plug-in was that I periodically had to remind students (and myself) that closing our eyes allows us to listen more intently. Once we worked our way through the audio elements, we started making systematic sweeps, then boosts and cuts, using a parametric equalizer to hear how sounds change when adjusted at given frequencies. This was in preparation for the final part of the class, which was a simple boost and cut equalization test using Train Your Ears EQ Edition software and a couple of new audio elements.
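The sweep, boost and cut exercise can also be expressed numerically. This sketch is not the equalizer used in the class. It is a common peaking (bell) filter design from Robert Bristow-Johnson's Audio EQ Cookbook, written in plain Python with hypothetical function names. At the center frequency, the filter's gain equals the requested boost or cut, and it falls back toward 0 dB away from the center at a rate set by Q.

```python
import cmath
import math


def peaking_eq(fs, f0, gain_db, q):
    """Biquad (b, a), normalized so a[0] == 1, for a peaking EQ (Cookbook form)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * A, -2 * cos_w0, 1 - alpha * A]
    a = [1 + alpha / A, -2 * cos_w0, 1 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, f, fs):
    """Magnitude response in dB at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def process(b, a, samples):
    """Filter a list of samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


b, a = peaking_eq(48000, 1000, 6.0, 1.0)
print(round(response_db(b, a, 1000, 48000), 3))  # 6.0
```

Sweeping `f0` while listening, then setting `gain_db` positive or negative, is the digital counterpart of the boost-and-cut exercise. A lower Q widens the bell so that more neighboring frequencies are affected.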

Some of the students attending the workshop were already doing audio production to varying degrees. Others were intrigued by the idea of working in audio as a career, and some didn’t know much about it, but were there to learn more.

By the end of the day, after attending all four classes, they had been inundated with a lot of practical, useful knowledge and had been given the opportunity to spend time with some incredible audio talent.

This workshop always concludes with a round table discussion featuring all of the instructors, giving the students one final opportunity to ask questions of the audio professionals they’ve spent their day with. Invariably, those questions include how to get hired into the business; how the instructors got into the business; and how they became successful.

Yet there are two other, far more important, things that have remained constant in all the years I’ve participated in the workshop: the curiosity that eventually rises to the surface once students understand that their questions are being answered openly and honestly, and the palpable excitement they exhibit after they realize they actually could spend their lives making music, manipulating audio and staying close to their passion.

It always reminds me of how I felt in my early years in audio and also makes me wonder whether those of us working in this business realize how truly fortunate we are.

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

The ‘Dematerialization’ of Audio Facilities

Jay Yeary — Mon, 30 Oct 2017 11:10:00 +0000

Recently I ran across an interesting piece from the Joint Taskforce for New Media (JT-NM) on the “dematerialized facility,” which envisions broadcast facilities built entirely from commodity IT equipment or with everything outsourced and no onsite equipment at all. Prior to the widespread availability of high-speed internet, the prospect of a disembodied broadcast facility would have seemed preposterous, but that is no longer the case and it is a very real possibility that broadcast facilities as we know them now will become relics of the past.

Dematerialization is essentially what happened to the large majority of recording studios and post houses—they no longer exist in the forms they possessed in the past, with recording shifting to home studios and post work being outsourced to freelancers working from personal facilities. It is unlikely that broadcast and cable network master control chains will move to homes, but it is absolutely conceivable that the technical infrastructure and content delivery chains could be outsourced.

Is this the boring future of all broadcast facilities?

A TALE OF TWO STUDIOS
The audio industry is already, to an extent, living in the dematerialized world. The advantages of minimal infrastructure were brought home to me during a just-completed studio move where there were two very different audio studios to uproot and relocate to a different building about a mile away. As usual, the move was done under tight time constraints due to scheduled sessions bookending the move. Fortunately, only one studio needed to be online for the sessions, since the other had been relegated to backup in the event the primary studio was down.

Accommodating the booked sessions meant I had around 20 hours to dismantle, move, reassemble, and test at least one studio. The deadline was met and sessions happened as planned, but only because of the design of the studio I chose to bring up first: one that is in line with the vision of a dematerialized facility.

The other, more complicated studio is one a friend and I designed and assembled over 20 years ago. It is equipped the way small broadcast audio studios used to be built—with a standard mixing console, patch bays, audio meters, a limiter/compressor, an amplifier for unpowered main speakers, telephone hybrid, and an ISDN codec. Its original implementation also included reel-to-reel, cassette, and DAT recorders, but these were removed as their technologies were supplanted by hard disk-based systems.

The second, newer studio was built by another engineer, is much simpler, and takes advantage of advances in technology in the two decades since the first was built. At the center of this studio is a computer running recording software, a small USB audio interface, a small audio mixer for signal routing, a high-quality microphone preamp, balance boxes for level matching, powered speakers, and a telephone hybrid. The ISDN codec in this studio has been replaced by Source-Connect Pro, with third party bridging for the rare ISDN session that does occur. Both studios have talkback and headphone distribution to a portable recording booth, and the older studio has custom-built telephone and ISDN control, so session connections can be managed remotely without leaving the booth.

Using the portable recording booth allowed us to set up the recording space in an untreated room without making any physical changes to the room itself. It also meant we could get that space ready before either of the studio control areas was assembled. The simplified design of the newer studio enabled quick equipment placement and setup of the work area. Once the equipment and external lines were connected, all that was required was to check device and software configurations, make test recordings, and check the phone patch and Source-Connect connections in the control room and voice booth.


ROOM FOR IMPROVEMENT
Once testing was complete, the studio was ready for sessions again, and they started almost immediately, just 10 hours from the time the booth and studios were disassembled for relocation. The older studio did not come together quite as quickly. The physical location of the equipment in this studio in relation to the voice booth and the main studio meant that it was necessary to change the racked placement of some equipment and remake some cables. In fact, the sheer number of cables in the older studio was the primary reason it took almost 16 hours to rewire and test it after reassembly of the racks.

Even though the newer studio was much quicker to reassemble and test, it could be made even simpler. The audio mixer could be replaced with a slightly larger, more capable audio interface that could also handle record feeds, talkback, phone patch, Source-Connect routing, and headphone feeds. The telephone hybrid could be replaced by a VoIP box or software. With Audio-over-IP interfaces now widely available, duplicate recording rigs could be built easily and relatively inexpensively, and could be tied together and extended far more easily, with far fewer cables than traditional recording studios and even some newer computer-based systems.

When we ponder the prospect of dematerialized facilities, it is important to remember that that future is one we’re already living in. Computers have been transitioning to virtual machines for years. Portable recording booths and moveable acoustic panels not only mean quicker recording space setup, but also less physical modification of rooms. Using off-the-shelf computers, drives, switches, and other IT-centric components means easy replacement and upgrades, and using cloud-based delivery platforms like WeTransfer, DropBox, and Gobbler eliminates the transfer of physical media. Internal routing within interfaces and other devices eliminates unnecessary external wiring. Replacing outboard gear with plugin emulations not only saves rack space and power, but also eliminates conversion delays in and out of the system.

Live event production will always need some sort of fast user interface for mixing and routing, but it may be some time before mix engineers are willing to give up tactile faders. Future broadcast facilities may be some hybrid of an outsourced, virtualized, IT-based infrastructure, but it certainly feels that, to some degree, audio technology has paved the way.

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

Working With Fixed Monitor Levels

Jay Yeary — Wed, 20 Sep 2017 11:10:00 +0000

ATSC Recommended Practice (RP) A/85 is usually thought of as a loudness standardization document full of guidelines we must follow to ensure content loudness complies with the CALM Act. In fact, the document goes beyond loudness to try to standardize the process of creating and delivering audio content for television, similar to standards used in audio for cinema. The process of mixing audio for TV has always been a bit freeform, partly because the lead time on most television productions is short and partly because the immediacy of live television makes the entire process chaotic, so successful mix engineers end up developing their own practices and processes.

One of the key areas A/85 seeks to standardize is control room speaker monitoring levels, to help engineers inherently understand their mix levels without staring at meters. To understand how well the process works I’ve been experimenting with, and attempting to hold to, fixed monitoring levels on everything listened to in the studio, whether on speakers or through headphones. This requires volume controls to be fixed in place and not adjusted when audio is too soft or too loud, a habit which has turned out to be hard to break. When the level of any audio source isn’t correct—whether audio from a studio project, music I’m listening to while working on something else, or content from the internet—it gets adjusted at the source.

After doing this for just a couple of weeks it feels like I now have an immediate grasp of whether levels are correct without looking at the meters, so it appears that fixed monitor settings do help mix engineers gain a better understanding of the audio levels within their system.

CALIBRATION
Properly setting up fixed monitor levels means calibrating your speakers and headphones, a process detailed in A/85, though it first covers the importance of correct speaker locations and recommends room equalization to help control modes.

Reference-level calibration begins in section 10.4, with a quick reference guide for the process in Annex D (Fig. 1). Recommended sound pressure levels (SPL) vary across five control room categories, from Category One for large main audio mix rooms all the way down to Category Five for headphones. When the document was written, headphone settings were targeted at noisy environments like machine rooms and QC stations, but with the explosion of personal audio devices the headphone mix may become as important as the mix coming from the speakers. Items needed to calibrate the system include an SPL meter with C weighting, audio system metering, and audio files that can be downloaded from the links provided in section 10.4.

Fig. 1: ATSC A/85 Rooms and Reference SPL

Once meters and audio files are ready, calibration is done by following the steps outlined in the document. It is important to adhere to the SPL recommendations for each size room since the same SPL will be perceived differently in rooms of differing sizes, and to ensure that mix engineers feel comfortable with the calibrated levels in each room. Once level calibration is complete make note of and mark gain settings so the room can be returned to its calibrated settings if they are inadvertently changed.

Fig. 2: The G.R.A.S. “artificial ear” is designed to accurately test headphones, microphones, telephones and loudspeakers.

USING HEADPHONES
Those who want to perform a quick calibration should look to A/85 Annex D.2, which is titled, “Really Quick Reference Guide for Monitor Setup.” My experiment began in an even simpler, less scientific way—by playing audio files through the system, ensuring that they measured –24 LKFS on a loudness meter, then adjusting monitors and headphones until the audio level was comfortable and not loud, since audio with any dynamics will contain louder segments. Measuring the SPL from speakers is relatively straightforward but determining how to accurately measure the SPL of headphones is a bit more problematic. A/85 doesn’t really clear it up by stating that we should, “use 2 cc coupler and set 440 Hz level to 74 dB.” It turns out that these audio couplers are designed for measuring hearing aids, but getting them to work with headphones looks a bit awkward and they seem to work best when coupled with a device called an “ear simulator,” (Fig. 2). A homemade alternative might be to rig a dummy head and SPL meter. Unfortunately, none of these are devices we tend to have sitting around the control room.

Fig. 3: NIOSH Sound Exposure Limits

One thing is certain: it is especially important to make sure that headphone levels aren’t too loud, because the transducers sit close to the ears and anyone making their living in audio must protect their hearing. According to the National Institute for Occupational Safety and Health (NIOSH), we reach our maximum noise dose for the day after listening to audio at 85 dB(A) for just eight hours (Fig. 3). Every 3 dB increase in level above 85 dB(A) cuts the allowable listening time in half, so keeping headphone listening levels well under the NIOSH maximum recommendation seems like a prudent way of working.
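The NIOSH rule described above (85 dB(A) for eight hours, with allowable time halved for every 3 dB increase) reduces to a single formula. Exposures at different levels can then be combined into a daily dose. Below is a small Python sketch, assuming each level is already an A-weighted average for its listening period. The function names are illustrative.

```python
def permissible_hours(level_dba, criterion_dba=85.0, exchange_rate_db=3.0,
                      criterion_hours=8.0):
    """Allowed daily listening time: 8 h at 85 dB(A), halved every 3 dB."""
    return criterion_hours / 2 ** ((level_dba - criterion_dba) / exchange_rate_db)


def daily_dose_percent(exposures):
    """exposures: iterable of (level_dBA, hours); 100 means a full daily dose."""
    return 100.0 * sum(hours / permissible_hours(level) for level, hours in exposures)


print(permissible_hours(100) * 60)             # 15.0 (minutes at 100 dB(A))
print(daily_dose_percent([(88, 2), (85, 4)]))  # 100.0
```

Two hours at 88 dB(A) followed by four hours at 85 dB(A) already uses up the whole day's allowance. This is why the monitoring level matters as much as the listening time.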

A/85 recommended we start doing this with its release back in 2009, so it feels a bit like I’m late to the party on this one, yet most mixers I know still don’t work this way. This experiment has changed how I work in the studio and has converted me into a fixed-monitor-level believer. It doesn’t always go smoothly, especially when it’s necessary to grab an onscreen volume control to adjust a piece of loud content or when something is so soft it never reaches the correct level, but those tend to be the exceptions rather than the rule and they are easily dealt with in the studio.

The most surprising revelation of this experiment is that I’m experiencing and enjoying dynamics more because I have a better feel for the target. If you’re like I was, someone who reached for the monitor volume control whenever it was convenient, I encourage you to give fixed monitor levels a try. You’ll find yourself enjoying listening more and looking at meters only as an occasional reference, exactly as we’re supposed to be doing.

The Effects of New Formats on Loudness

Jay Yeary — Fri, 21 Jul 2017 10:40:00 +0000

The face of television audio has developed some interesting new wrinkles since we last looked at loudness: namely, immersive audio, object-based audio and audio for personal devices. We’re just beginning to unwrap the specifics and demands of these new delivery formats as well as learning how to produce for them, but loudness is an important factor in each.

The best news on the loudness front is that measurements are still based on the ITU-R BS.1770 standards we’re already using. Existing loudness meters remain valid, though some updates will likely be required. First up, we’ll take a fresh look at the foundations of loudness before we examine the impact the new formats will have on it.

LOUDNESS METERS
Loudness meters are neither volume unit (VU) nor peak meters. They are K-weighted meters designed to measure loudness in a way that more closely approximates how humans hear. They display Loudness, K-weighted, relative to Full Scale (LKFS), and one unit of LKFS is equivalent to one decibel. The same is true of one Loudness Unit (LU), the unit used for relative loudness measurements.

Fig. 1: Loudness meter

Measurement is done by first applying a pre-filter that models the acoustic effect of the human head, treated as a rigid sphere; then a second-order high-pass filter is applied. Next, the mean square of each filtered channel is calculated and the channels are summed, with each surround channel weighted by 1.41 (about a 1.5 dB bump) and the LFE dropped from the calculation.
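Those stages can be sketched in Python for 48 kHz audio. The filter coefficients, the 1.41 surround weight and the –0.691 constant are the 48 kHz values published in ITU-R BS.1770. Everything else is an assumption made for illustration: the function names, the channel labels and the plain-list signal format. The sketch measures a single ungated block, not a whole program. As a sanity check, a full-scale 997 Hz sine in one front channel should read about –3.01 LKFS, which is BS.1770's own reference point.

```python
import math

# ITU-R BS.1770 K-weighting coefficients for fs = 48 kHz only.
# Stage 1: high-frequency shelf modelling the acoustic effect of the head.
STAGE1_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
STAGE1_A = (1.0, -1.69065929318241, 0.73248077421585)
# Stage 2: second-order high-pass (the RLB curve).
STAGE2_B = (1.0, -2.0, 1.0)
STAGE2_A = (1.0, -1.99004745483398, 0.99007225036621)

# Front channels 1.0, surrounds 1.41 (about +1.5 dB). The LFE has no entry,
# so it is left out of the sum.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def biquad(b, a, samples):
    """Direct Form I biquad with a[0] == 1."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def k_weight(samples):
    return biquad(STAGE2_B, STAGE2_A, biquad(STAGE1_B, STAGE1_A, samples))


def mean_square(samples):
    return sum(s * s for s in samples) / len(samples)


def block_loudness(channels):
    """Ungated loudness (LKFS) of one block.

    channels maps a label ("L", "R", "C", "Ls", "Rs", "LFE", ...) to a list
    of 48 kHz samples; labels without a weight are ignored.
    """
    total = sum(CHANNEL_WEIGHTS[name] * mean_square(k_weight(x))
                for name, x in channels.items() if name in CHANNEL_WEIGHTS)
    if total <= 0:
        return float("-inf")
    return -0.691 + 10 * math.log10(total)


fs = 48000
sine = [math.sin(2 * math.pi * 997 * n / fs) for n in range(fs)]
print(round(block_loudness({"L": sine}), 2))  # about -3.01
```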

Gating was added in BS.1770-2 to ensure that quiet passages and silence don’t unduly influence the reading, so the gates make up the final stage of measurement. Loudness is computed over 400 ms blocks that overlap by 75 percent. The first (absolute) gate sits at –70 LKFS, and the second (relative) gate sits 10 LU below the loudness of the blocks that pass the first gate.
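The two-stage gate operates on block loudness values rather than raw samples. This sketch assumes that the channel-weighted mean-square power of each block has already been computed, for example after the K-weighting described above. It then shows how the absolute and relative gates combine into Integrated loudness. The function names are hypothetical, while the gate values and the 400 ms / 100 ms block timing follow BS.1770.

```python
import math

ABSOLUTE_GATE_LKFS = -70.0
RELATIVE_GATE_LU = -10.0


def loudness_of_power(z):
    """Loudness (LKFS) of a channel-weighted mean-square power value."""
    return -0.691 + 10 * math.log10(z) if z > 0 else float("-inf")


def block_powers(k_weighted, fs=48000, block_s=0.4, step_s=0.1):
    """Mean-square power of 400 ms blocks taken every 100 ms (75 % overlap)
    from a single, already K-weighted channel."""
    n, step = round(block_s * fs), round(step_s * fs)
    return [sum(s * s for s in k_weighted[i:i + n]) / n
            for i in range(0, len(k_weighted) - n + 1, step)]


def integrated_loudness(powers):
    """Gated integrated loudness from per-block powers."""
    # Stage 1: drop blocks quieter than the absolute gate.
    kept = [z for z in powers if loudness_of_power(z) > ABSOLUTE_GATE_LKFS]
    if not kept:
        return float("-inf")
    # Stage 2: relative gate 10 LU below the loudness of the surviving blocks.
    relative_gate = loudness_of_power(sum(kept) / len(kept)) + RELATIVE_GATE_LU
    kept = [z for z in kept if loudness_of_power(z) > relative_gate]
    return loudness_of_power(sum(kept) / len(kept))


# The 0.001 block (about -30.7 LKFS) passes the absolute gate but not the
# relative one, so only the two loud blocks count.
print(round(integrated_loudness([0.5, 0.5, 0.001]), 2))  # -3.7
```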

THE MEASUREMENTS
As we look at the measurements audio engineers will actually encounter, it’s important to remember that all loudness measurements are made over time. Therefore, each measurement will fluctuate throughout the program and displayed values will rarely, if ever, remain static.

The Momentary (M) reading is an ungated measurement of loudness over a sliding 400 ms window and is a good representation of loudness at a given moment. Short-term (S) is also ungated and displays loudness over a three-second continuously sliding window. Integrated (I) is the gated overall loudness of content from beginning to end and is the measurement used to ensure CALM Act compliance.
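Momentary and Short-term readings use the same calculation with different window lengths and no gating. Below is a sketch that updates both every 100 ms over a single, already K-weighted channel. That input is an assumption that keeps the example short, and the function names are illustrative.

```python
import math


def sliding_loudness(k_weighted, fs, window_s, step_s=0.1):
    """Ungated loudness of a window ending every step_s seconds."""
    n, step = round(window_s * fs), round(step_s * fs)
    out = []
    for end in range(n, len(k_weighted) + 1, step):
        z = sum(s * s for s in k_weighted[end - n:end]) / n
        out.append(-0.691 + 10 * math.log10(z) if z > 0 else float("-inf"))
    return out


def momentary(k_weighted, fs=48000):
    return sliding_loudness(k_weighted, fs, 0.4)  # M: 400 ms window


def short_term(k_weighted, fs=48000):
    return sliding_loudness(k_weighted, fs, 3.0)  # S: 3 s window


# Three quiet seconds followed by three loud ones (a low fs keeps it quick).
fs = 1000
x = [0.01] * 3000 + [0.1] * 3000
m, s = momentary(x, fs), short_term(x, fs)
# 400 ms after the jump, M has fully caught up while S is still lagging.
print(round(m[30], 2), round(s[4], 2))
```

The longer window explains why the Short-term reading moves more slowly than the Momentary one when the program suddenly gets louder.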

NEWER MEASUREMENTS
A few less familiar measurements include Loudness Range (LRA), Peak to Loudness Ratio (PLR) and Maximum True Peak Level (TPL).

Loudness Range is the overall variation between quiet and loud sounds within a given piece of content, but it’s really a little more complicated than that. According to EBU Technical Document 3342, Loudness Range is “Based on the statistical distribution of measured loudness.” So the measurement looks at the entirety of the content and uses a combination of absolute and signal-dependent gates to ensure that individual loud or quiet moments don’t skew readings.
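Below is a hedged sketch of the Loudness Range calculation as EBU Tech 3342 describes it. It starts from Short-term loudness values (3 s windows, updated at least 10 times per second) and discards any below an absolute gate of –70 LUFS. It then applies a relative gate 20 LU below the energy mean of what remains and reports the spread between the 10th and 95th percentiles. The linear-interpolation percentile used here is an assumption, since implementations may rank values slightly differently, and the function names are hypothetical.

```python
import math


def _percentile(sorted_vals, p):
    """Linear interpolation between closest ranks (an assumed convention)."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def loudness_range(short_term_lufs):
    """LRA in LU from a series of Short-term loudness values (LUFS)."""
    above_abs = [v for v in short_term_lufs if v > -70.0]  # absolute gate
    if not above_abs:
        return 0.0
    # Relative gate: 20 LU below the energy mean of the absolute-gated values.
    mean_power = sum(10 ** (v / 10) for v in above_abs) / len(above_abs)
    relative_gate = 10 * math.log10(mean_power) - 20.0
    gated = sorted(v for v in above_abs if v > relative_gate)
    return _percentile(gated, 0.95) - _percentile(gated, 0.10)


values = [-30 + 0.1 * k for k in range(101)]  # -30 ... -20 LUFS
noisy = values + [-80.0] * 10 + [-50.0] * 5   # silence and very quiet moments
print(round(loudness_range(values), 2), round(loudness_range(noisy), 2))  # 8.5 8.5
```

The gates are the reason the silence and the isolated quiet moments in `noisy` leave the result unchanged.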

Peak to Loudness Ratio measures the dynamic range of a mix and is primarily used for music production, with some interesting potential as a broadcast tool for mixers who want a measurement of the dynamic range of their mix. PLR is a real-time measurement, usually applied to the entire piece of content, though it could also be used to determine the dynamic range of individual sections of audio. Higher PLR numbers indicate a mix with a wider dynamic range and values closer to zero indicate less dynamic range.

Maximum True Peak Level is the highest true peak level measured anywhere within the content, including peaks that fall between samples. It should not be allowed to exceed either the –2 dB TP specification in ATSC A/85 or the True Peak value of the content delivery specifications.
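True peak is estimated by oversampling so that peaks falling between samples become visible, and BS.1770 describes 4x oversampling for 48 kHz material. The sketch below substitutes a simple Hann-windowed sinc interpolator for the Recommendation's own filter and skips the edges of the signal where the kernel does not fit. It also computes PLR using one common definition: maximum true peak minus integrated loudness. Treat it as an illustration rather than a compliant meter. The function names and kernel length are assumptions.

```python
import math


def _hann_sinc(t, half_width):
    """Hann-windowed sinc kernel (assumed design, not the BS.1770 filter)."""
    if abs(t) >= half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    return sinc * 0.5 * (1.0 + math.cos(math.pi * t / half_width))


def true_peak_dbtp(samples, oversample=4, half_width=16):
    """Estimate true peak by interpolating oversample-1 points between samples.

    Positions within half_width samples of either end are not interpolated,
    so the estimate there falls back to the sample values themselves.
    """
    peak = max(abs(s) for s in samples)
    for m in range(half_width, len(samples) - half_width):
        for phase in range(1, oversample):
            frac = phase / oversample
            y = sum(samples[m + k] * _hann_sinc(frac - k, half_width)
                    for k in range(-half_width + 1, half_width + 1))
            peak = max(peak, abs(y))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def plr_db(max_true_peak_dbtp, integrated_lufs):
    """Peak to Loudness Ratio as max true peak minus integrated loudness."""
    return max_true_peak_dbtp - integrated_lufs


# A tone at fs/4 sampled 45 degrees off its crests: every sample is +/-0.707,
# but the waveform between samples reaches 1.0.
tone = [math.sin(math.pi * n / 2 + math.pi / 4) for n in range(200)]
print(round(20 * math.log10(max(abs(s) for s in tone)), 2))  # -3.01
print(true_peak_dbtp(tone))  # close to 0 dBTP
```

A sample-peak meter would report this tone about 3 dB lower than it really is. That inter-sample headroom is what a true peak limit such as A/85's –2 dB TP is meant to protect.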

NEW FORMAT LOUDNESS
When it comes to measuring loudness for immersive and object-based audio formats, testing seems to indicate that the loudness of immersive mixes closely tracks the loudness of rendered 5.1 or 7.1 full mixes.

Fig. 2: MasterCheck Pro meter showing PLR measurement

In e-Brief 352, presented at the AES Berlin 2017 convention in May, the authors showed data on the loudness variation of an immersive movie mix rendered into a Dolby E-AC-3 5.1 mix. When decoded into multiple Speech Gated Loudness configurations, the variation was a maximum of 0.3 LU. When decoded and replayed into multiple Relative-gated Loudness configurations, the maximum variation was 0.7 LU.

Additional evidence that current loudness tools will work for Next Generation Audio (NGA) formats can be found in the Dolby Atmos Production Suite manual, which states that, “You should use the 7.1 full mix re-render for loudness measurement during post production to ensure that content meets delivery specifications.”

Suggestions have surfaced for specific multichannel loudness tools for NGA formats, but if measuring surround mixes is equivalent to measuring NGA mixes, then these new loudness tools may not be necessary.

PERSONAL DEVICE LOUDNESS
The outlook is a little murkier for personal devices such as mobile phones and tablets because there’s so much variation between them. With differing acoustic and electronic outputs, the potential listening options seem almost limitless. Content services seem to recognize the need for loudness management because most have instituted normalization, though not at the same loudness level.

Fortunately, work is already in progress to address the technical needs of these devices. The AES has one study group working on loudness standards for streaming and the AGOTTVS subcommittee is working on loudness standards for OTT and video streaming. Both AES groups have published preliminary guidelines and have broad support from the broadcast industry and content services. However, the sheer variety of available devices makes the labor of both AES groups extremely involved and technically challenging.

A lot has changed in the 14 years since Dolby introduced the LM-100, the first broadcast loudness meter, yet ITU-R BS.1770 and its refinements are holding up well in the face of new audio delivery formats, proving that it was a wise choice for modern loudness management tools. We’ll likely see more refined measurement algorithms in the future, as well as additional specialized measurements, but for now it appears that the loudness tools we’re using will continue to serve us for the foreseeable future.

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology magazine or at transientaudiolabs.com.

Audio Education at the 2017 NAB Show

Jay Yeary — Fri, 30 Jun 2017 09:30:00 +0000

Every April, thousands of us make a pilgrimage to the desert to learn from gurus who pass along insight and direction, give us glimpses into the future, and help us make sense of it all. We learn to discern viable technology from the vapor versions, while balancing countless miles of walking with unhealthy amounts of sitting in uncomfortable chairs.

This year more than 100,000 of us journeyed to the NAB Show in Las Vegas to learn, catch up with colleagues and gawk at the newest, shiniest equipment from vendors.

In recent years, I’ve shifted focus from the gear toward the amazing number of learning opportunities that coincide with the NAB Show. Conferences, training classes and vendor-sponsored events kick into high gear the weekend before exhibits open. For me, that weekend is consumed by Avid Connect and the DTV Audio Group meeting.

AVID CONNECT
Avid Connect has become increasingly intriguing as the company works to implement the Avid Everywhere initiative announced at the inaugural event four years ago. Attendees get a far more in-depth look at new features than is possible at the NAB Show, as well as the opportunity to interact with product development teams.

Avid announced a number of new developments for Pro Tools at Avid Connect just before the NAB Show.

A number of positive announcements were made at this year’s event. First is that, sometime in Q2 of 2017, Pro Tools will gain multi-seat access to Avid’s Nexis storage, officially enabling multiple Pro Tools systems to share projects and stream audio from the same Nexis storage pool as editorial users.

Shared network storage is often an unsettling option for Pro Tools users because a server’s network connection can get congested with traffic from all the small audio files and waveforms the workstation uses.

During a session on centralized storage, it was mentioned that the Pro Tools and Nexis teams worked together to ensure that Pro Tools waveforms are now cached in a manner that keeps them from saturating the network.

Next up was the announcement of native Atmos support in Pro Tools HD along with the release of two Dolby Atmos software packages—welcome news to anyone concerned about the lack of available tools for immersive and object-based audio production.

The Atmos Production Suite can be downloaded directly from the Avid Store and contains the tools necessary to create Atmos content in Pro Tools. The suite includes three panners, the monitor application, a VR transcoder and the Atmos renderer, along with renderer send and return plug-ins.

The basic bed track format is 7.1.2, which provides two channels for overhead/upper speakers rather than the four specified in ATSC A/342, and the suite supports up to 118 audio objects. Object busses auto-fold down when there is no renderer, though busses must be configured first.

Some additional items of note came from the session on Atmos workflows. Demos were run in a pre-release version of Pro Tools 12.9, which included an Atmos peripherals tab, the ability to show Objects in the Clips menu, and a menu choice to “Duplicate Atmos Plug-in Automation to Pan Automation,” which converts existing Atmos automation into the type used in the Production Suite.

Other features include the ability to pan between mix and object busses and improved stem rerecording capabilities. These Atmos-specific items are not present in Pro Tools 12.7.1 with the Production Suite installed, so they’re likely coming in an updated version of Pro Tools, though it’s also probable the new version will differ from what was shown at Avid Connect.

The second software package, the Dolby Atmos Mastering Suite, is intended for DVD/Blu-ray mastering and consumer end-product creation and is available only from retailers. It requires either a Dolby RMU or the hardware equivalent of an RMU.

THE DTV AUDIO GROUP
The DTV Audio Group meeting is an afternoon spent gleaning important information from people who are driving change in the industry. Kicking things off this year was Thomas Edwards of Fox Networks, with an update on IP video. He showed how SMPTE IP standards are gelling into the SMPTE ST 2110 media-over-IP standard, with AES67 chosen as the linear PCM audio stream format for ST 2110-30.

The DTV Audio Group covered a range of next-gen audio developments.

Edwards noted that ST 2110 requires zero offset start times for AES67 streams and that the 2110-31 AES3 nonlinear transport is far from sorted out, so input from professional audio organizations is being sought. He concluded with a discussion of discovery protocols, noting that NMOS IS-04 would likely be the discovery and registration protocol used in ST 2110.

Next up was Kevin Gross of AVA Networks with an update on AES67, describing how it fits into the family tree of professional open standards. Responding to Edwards’ earlier comments, he pointed out that AES67 does not require zero offset streams because random offsets are more secure, but that, in fact, many AES67 devices already use zero offsets, so compliance with ST 2110 is certainly doable.
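Here is what the offset debate is about. In both AES67 and ST 2110, a stream's RTP timestamp counts media-clock samples derived from the shared PTP time, plus an offset. With a zero offset, any receiver that knows the PTP time can predict the timestamp. With a random offset, which RFC 3550 recommends for RTP in general, the offset has to be signaled; AES67 carries it in the stream's SDP description.

The simplified Python sketch below works under those assumptions. It ignores epoch and leap-second details, and the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000
RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around
NS_PER_SECOND = 1_000_000_000


def rtp_timestamp(ptp_time_ns: int, offset: int = 0, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """RTP timestamp for a stream whose media clock is locked to PTP time.

    offset=0 mirrors the zero-offset requirement described for ST 2110;
    AES67 allows any offset, which receivers must learn from the stream's
    description before they can align timestamps.
    """
    media_clock_samples = ptp_time_ns * sample_rate // NS_PER_SECOND
    return (media_clock_samples + offset) % RTP_TIMESTAMP_MODULUS


now_ns = 1_500_000_000  # hypothetical PTP time: 1.5 s after the PTP epoch
print(rtp_timestamp(now_ns))                  # 72000
print(rtp_timestamp(now_ns, offset=1234567))  # 1306567
```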

Kent Terry from Dolby Labs delved into “Audio Metadata in Live IP Streams,” specifically addressing ST 2110-31 and offering KLV metadata as a potential solution.

Results and implications of the wireless spectrum auction were covered by Jackie Green, president of Alteros, who stated that devices using the 600 MHz frequency band must go quiet soon. Theoretically, they may be used for 39 months after the auction close date, but in reality all usage must cease as soon as the auction winner begins testing devices.

Rob France from Dolby explained Atmos production techniques used for the National Premier Soccer League, including creating 5.1.4 beds using Soundfield microphones. A progress update on AES audio guidelines for OTT was led by Jim Starzynski of NBC, followed by a recap of audio production at the Rio Olympics and a preview of the AoIP-heavy audio design for the forthcoming Pyeongchang games from Karl Malone, also of NBC. The meeting wrapped with a lively panel discussion on Next Generation Audio production.

Between the Pro Tools Atmos software announcements and news of the upcoming SMPTE ST 2110 media-over-IP standard with its inclusion of AES67, the weekend was, as usual, both informative and time well spent.

ATSC 3.0 Audio: Structure and Metadata

Jay Yeary — Tue, 25 Apr 2017 14:30:00 +0000

Think about how we consume television now. In our house, cable is simply the pipe that delivers high-speed internet, and content arrives via a combination of over-the-air antenna and a handful of streaming services. We rely on smart devices to locate and play what we watch because we no longer know when it airs. Into this world of distributed, multisourced content, our next-generation television delivery system is being born, a world where broadcast runs the risk of becoming an afterthought, at least until the internet service goes down.

Both the promise and the challenge of ATSC 3.0 are to merge broadcasting into this internet-centric media landscape and, in the process, deliver high-quality content with properly delivered mixes to every device. Accomplishing this requires changing virtually every technology currently used to deliver television to the home, and it brings with it new structure and expanded metadata.

Information about the audio elements of ATSC 3.0 can be found in three documents that make up the finalized ATSC A/342 standard. Part one covers the elements common to all Next Generation Audio systems, part two specifies AC-4, and MPEG-H is detailed in part three. These documents contain virtually all the currently available information regarding what audio engineers will be dealing with in ATSC 3.0. In this column, we’ll look at the structure of the common elements as they relate to AC-4, some of the new metadata parameters, and briefly touch on a few things that remain unanswered.

Fig. 1: In the Next-Generation Audio world, the ATSC A/342 standard treats audio as data.

NGA STRUCTURE
Let’s start by looking at the audio data structure of NGA systems. Currently, when prepping a mix for television, audio is the source and audio is the final output, whether it’s a stem, submix, iso, or full mix. In the NGA world, audio is treated more like data, and in fact that’s how A/342 refers to it, as audio data (see Fig. 1). Source audio is now officially referred to as “Audio Signals” and each signal may or may not have metadata associated with it. The Audio Signals and their metadata are formatted into either Channel-Based, Object-Based, or Scene-Based sets called Audio Elements.

Channel-Based Audio Elements are traditional-style, fixed output mixes that can be anything from single-channel mono to immersive 7.1+4. Object-based elements consist of Audio Objects with positional metadata that allow them to be placed—statically or dynamically—almost anywhere in the sound field. Scene-based elements model an actual or simulated sound field.

Audio Elements also carry positional or personalization metadata as well as rendering information that helps the system format audio data for the final receiving device. Groups of Audio Elements make up Audio Program Components, which may consist of Complete Main mixes, Music and Effects submixes, Dialog-only submixes, Video Description Services and other audio feeds and mixes, delivered in one or more audio elementary streams.

Finally, single or combinations of Audio Program Components constitute an Audio Presentation, which is the audio portion of an ATSC 3.0 program. There can be more than one Audio Presentation per program but one must be designated as the default presentation. By design this layered structure provides the flexibility necessary to deliver immersive audio, personalized content, and emergency alert information to the listener.
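The layering described above maps naturally onto nested data types. The Python sketch below is a hypothetical model, not the A/342 bitstream syntax, and its class and field names are invented for illustration. It checks only the one rule stated here, that a program designates a default presentation, which the sketch assumes means exactly one.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ElementFormat(Enum):
    CHANNEL_BASED = "channel-based"  # fixed-output mixes, mono through 7.1+4
    OBJECT_BASED = "object-based"    # Audio Objects with positional metadata
    SCENE_BASED = "scene-based"      # a modeled sound field


@dataclass
class AudioElement:
    name: str
    element_format: ElementFormat
    signal_count: int  # number of Audio Signals in the element
    metadata: Dict[str, str] = field(default_factory=dict)  # positional, personalization, rendering


@dataclass
class AudioProgramComponent:
    name: str  # e.g. Complete Main, Music and Effects, Dialog
    elements: List[AudioElement]


@dataclass
class AudioPresentation:
    name: str
    components: List[AudioProgramComponent]
    is_default: bool = False


def default_presentation(presentations: List[AudioPresentation]) -> AudioPresentation:
    """Return the program's default presentation, assuming exactly one is flagged."""
    defaults = [p for p in presentations if p.is_default]
    if len(defaults) != 1:
        raise ValueError(f"expected one default presentation, found {len(defaults)}")
    return defaults[0]


# Hypothetical program: a shared M&E bed with English and Spanish dialog objects.
me_bed = AudioProgramComponent(
    "Music and Effects", [AudioElement("M&E bed", ElementFormat.CHANNEL_BASED, 12)]
)
english = AudioProgramComponent(
    "Dialog EN", [AudioElement("English", ElementFormat.OBJECT_BASED, 1, {"position": "center"})]
)
spanish = AudioProgramComponent(
    "Dialog ES", [AudioElement("Spanish", ElementFormat.OBJECT_BASED, 1, {"position": "center"})]
)
program = [
    AudioPresentation("English", [me_bed, english], is_default=True),
    AudioPresentation("Spanish", [me_bed, spanish]),
]
print(default_presentation(program).name)  # English
```

Both presentations reuse the same M&E component, which reflects the way components can be shared across presentations.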

NEW METADATA PARAMETERS
There are several new metadata parameters in addition to extensions of current ones. Control parameters for dialog now include Dialog Enhancement to give users more control of independent and premixed dialog to help improve intelligibility of content after it reaches the home.

Dynamic Range Control (DRC) keeps existing E-AC-3 elementary modes but adds modes that properly render audio to the seemingly limitless variety of target devices such as portable audio systems, flat screen televisions, and home theaters.

Loudness metadata has been expanded to include parameters for true peak and maximum true peak, relative gated loudness and speech gated loudness, dialog gating type, momentary and maximum momentary loudness, and short-term and maximum short-term loudness.

Intelligent Loudness Management in AC-4 means the system can now verify whether the associated loudness metadata matches the audio content and, if it does, signal to decoding devices that the metadata is correct so that no changes are made to the final audio. If the system is unable to validate the loudness metadata, a real-time loudness leveler can be enabled to ensure loudness standards are met.
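The verify-then-decide logic can be sketched in a few lines of Python. This is a simplified illustration of the behavior described, not Dolby's implementation. The comparison tolerance, the choice of which parameters to compare and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LoudnessValues:
    # Two of the loudness parameters listed above; field names are hypothetical.
    relative_gated_lkfs: float
    speech_gated_lkfs: float


def loudness_action(carried: LoudnessValues, measured: LoudnessValues,
                    tolerance_lu: float = 1.0) -> str:
    """Decide whether to trust the carried metadata or fall back to leveling.

    tolerance_lu is an assumed value for illustration, not an AC-4 parameter.
    """
    matches = (
        abs(carried.relative_gated_lkfs - measured.relative_gated_lkfs) <= tolerance_lu
        and abs(carried.speech_gated_lkfs - measured.speech_gated_lkfs) <= tolerance_lu
    )
    # Verified metadata is flagged as correct so decoders leave the audio alone;
    # otherwise the real-time leveler is enabled.
    return "verified: pass through" if matches else "unverified: enable leveler"


metadata = LoudnessValues(relative_gated_lkfs=-24.0, speech_gated_lkfs=-24.0)
print(loudness_action(metadata, LoudnessValues(-24.3, -23.8)))  # verified: pass through
print(loudness_action(metadata, LoudnessValues(-18.5, -17.0)))  # unverified: enable leveler
```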

Extensible metadata is supported in AC-4, allowing user data, third-party metadata and application data to be carried in the bit stream. Alternative metadata can also be associated with objects, in addition to their regular metadata, so that presentations can create different versions of an object for their own use.

CONTENT DELIVERY
There are two types of presentation streams and two types of decoding modes available in AC-4:

An Advanced Single-Stream presentation enables a single stream to carry multiple Audio Program Components inside it, whether those components are part of the same presentation or from multiple presentations. All mixes, submixes, and versions of a presentation may be carried in the single stream or the stream may be used to carry several different, possibly even unrelated, programs.

Multi-Stream Hybrid presentations send Audio Program Components over multiple paths, with the primary component delivered via broadcast and other components delivered via a secondary path of some sort, of which broadband is an option.

Core Decoding delivers a simple, complete audio presentation to target devices, such as phones and televisions, that have simple playback capabilities. Full Decoding mode delivers complete presentations to devices like home theaters that have more complex decoding capabilities.

Fig. 2: This graphic represents one possible option for a 7.1+4 control room speaker setup. Actual control room speaker layouts have not been specified by the ATSC yet.

The documents that make up A/342 give us an amazing amount of information, far beyond what is presented here, but there are still gaps. For instance, proper placement of speakers for 7.1+4 setups for television audio mix rooms remains unspecified, and the entire rendering process needs to be clarified (Fig. 2). Fortunately, there are training initiatives taking shape from manufacturers and professional organizations geared to help mixers and technical staff get a handle on NGA and prepare for rollout, which may make this the most challenging and exciting time to be a television mixer since the switch to digital.

ATSC 3.0 and Preparing Audio Infrastructure

Jay Yeary — Fri, 24 Feb 2017 13:50:00 +0000

Fig. 1: Simplified ATSC 3.0 layer model

IP transmission will begin making inroads into North American broadcast facilities starting this spring, even those with no plans to implement an IP-based infrastructure, thanks to the finalization of ATSC 3.0 specifications and the introduction of compatible consumer products at this year’s CES. ATSC 3.0 is an IP-based transmission standard built on a five-layer stack (see Fig. 1), akin to the seven-layer OSI network model, which allows individual technologies to be easily replaced or substituted. This month we’re looking at what needs to be done to prepare existing infrastructure to handle the Next Generation Audio (NGA) formats coming in ATSC 3.0.

The audio system chosen for North American implementations of ATSC 3.0 is Dolby AC-4, which has three Audio Element Formats: channel-based, object-based and scene-based audio. Channel-based audio is essentially what we have now with mono, stereo and surround formats, though with the addition of height channels it also serves as the base for immersive audio mixes. Audio objects consist of audio signals and positional metadata for use in immersive mixes or for audio program customization.

Scene-based audio is a sort of soundfield snapshot from a higher-order ambisonic source. All audio components get encoded, then encapsulated along with the HEVC (H.265) video for broadcast, with a synchronized MPEG-DASH stream for broadband. The system is designed so that delivered audio can be played back anywhere, from home theaters to handheld mobile devices to headphones, because decoders adjust playback parameters for the end user’s speaker configurations and devices. This end-device rendering may finally mean the end of downmixing.
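For a sense of scale, a full-sphere ambisonic representation of order N is carried in (N + 1)² signals. Each step up in order therefore buys spatial resolution at the cost of more signals to transport. The orders below are only examples; which orders ATSC 3.0 scene-based audio will use is not covered here.

```latex
K = (N + 1)^2 \qquad N = 1 \Rightarrow K = 4, \quad N = 2 \Rightarrow K = 9, \quad N = 3 \Rightarrow K = 16
```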

Controlling all of this and making it function properly requires lots of metadata, so those of us who designed our infrastructure around static metadata will have to rethink and likely rework it.

Some things about the new system are familiar, though with updates. The system sample rate is 48 kHz, with support added for 96 and 192 kHz. Dynamic Range Control remains in the system and loudness management is still LKFS-based. Welcome additions to loudness management include a feature that verifies whether metadata parameters and content measurements match, and there is now an optional leveler.

IMMERSIVE AUDIO
Immersive audio in AC-4 begins with 12 audio channels in a 7.1+4 configuration, with speaker locations designated as Left, Center, Right, Left Side and Left Rear, Right Side and Right Rear, LFE, Upper Left and Upper Left Rear Surround, and Upper Right and Upper Right Rear Surround. Upper channel speakers in this format are placed above their lower channel counterparts to provide height imaging.

Setting up a 5.1 mix room for this configuration requires the addition of six more speakers and will likely require replacement or modification of the current monitor controller. Outfitting a dedicated production control room for 7.1+4 is certainly possible, but doing so in a mobile unit raises a host of concerns including whether any ambience in the height speakers will be distinguishable from the ambience bleeding through the truck walls.

More interesting than immersion, and possibly more challenging, are the uses and potential configurations of object-based audio and customization. One of the most discussed uses for customization is modification of the dialog track to allow swapping of the primary dialog track with one or another language, listening to a secondary commentary track, or turning the dialog level up, down or off.

Other uses include providing assistive services such as descriptive video or audio versions of emergency notifications. Audio objects are also meant to be primary and secondary components in immersive audio mixes. Providing this array of options means being able to create enough mix minuses, submixes and stems, and have enough paths to move them through the facility.

Fig. 2: ATSC 3.0 example broadcast operating profiles

ASSESSING OUR FACILITIES
A look at some suggested ATSC example broadcast operating profiles gives us an idea of the feeds required to provide these services to consumers (see Fig. 2). A 15.1-channel M&E with two dialog tracks and one video descriptive service tosses 19 input elements at the audio encoder for just one stream, 11 more than a 5.1+2 mix.

ATSC 3.0 allows multiple simultaneous streams, so the element count could get quite high, depending on what profile is used for each stream.
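The element arithmetic is easy to script when comparing profiles. This Python sketch assumes layouts are counted by adding every number in the designation, so 15.1 is 16 channels and 7.1+4 is 12. That convention matches the figures above. The multi-stream combination in the usage lines is hypothetical.

```python
import re


def channel_count(layout: str) -> int:
    """Channels in a layout designation such as '5.1', '15.1', '7.1.2' or '7.1+4'.

    Assumes the usual convention of adding up every number in the name.
    """
    return sum(int(part) for part in re.split(r"[.+]", layout))


def encoder_inputs(layout: str, dialog_tracks: int = 0, descriptive_services: int = 0) -> int:
    """Input elements one stream presents to the encoder."""
    return channel_count(layout) + dialog_tracks + descriptive_services


one_stream = encoder_inputs("15.1", dialog_tracks=2, descriptive_services=1)
baseline = encoder_inputs("5.1+2")
print(one_stream, one_stream - baseline)  # 19 11

# Several simultaneous streams (hypothetical profiles) add up quickly.
streams = [("15.1", 2, 1), ("7.1+4", 1, 0), ("5.1+2", 0, 0)]
print(sum(encoder_inputs(*s) for s in streams))  # 40
```

The same counting shows why moving a 5.1 room to 7.1+4 means adding six speakers: `channel_count("7.1+4") - channel_count("5.1")` is 6.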

This information helps us assess whether current production facility technologies will handle the workload, and, in fact, most remain useful to some degree. All infrastructure paths still seem valid, with AES the most limited at two channels per link and embedded SDI good for feeds up to 16 channels wide. MADI, with 56–64 channels per link, should be sufficient for most current broadcast productions, but moving beyond 64 channels in one path means moving to some form of AoIP.
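To run the same assessment for any element count, compare it with each path's per-link capacity. The Python sketch below uses the capacities given above plus the standard two channels per AES3 link. It deliberately ignores sample-rate-dependent MADI modes, redundancy and tie-line counts, and the names are illustrative.

```python
# Single-link channel capacities as described in the text; AES3 carries two
# channels per link, and MADI runs 56 or 64 channels depending on mode.
PATH_CAPACITY = {
    "AES3": 2,
    "embedded SDI": 16,
    "MADI (56-channel mode)": 56,
    "MADI (64-channel mode)": 64,
}


def single_link_options(channels: int) -> list:
    """Paths that can carry `channels` on one link; past 64, point to AoIP."""
    fits = [name for name, capacity in PATH_CAPACITY.items() if capacity >= channels]
    return fits or ["AoIP"]


def links_needed(channels: int, path: str) -> int:
    """Number of links of one type needed to carry `channels` (ceiling division)."""
    return -(-channels // PATH_CAPACITY[path])


print(single_link_options(19))           # ['MADI (56-channel mode)', 'MADI (64-channel mode)']
print(links_needed(19, "embedded SDI"))  # 2
print(single_link_options(96))           # ['AoIP']
```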

Speakers may need to be added and metering will need to be updated for immersive mixing and monitoring, not just in the audio mix rooms, but also in editorial rooms and QC stations. QC audio monitoring may be awkward given the potential output options available, so it may be the time to consider building a separate audio QC room.

Monitor controllers will need to be assessed to see if they can handle the mix and submix options, and room size and room acoustics should be reassessed once speakers are added for immersion. Digital audio consoles with internal routers may be the Swiss Army knife for solving signal flow and feed issues for production since they can be outfitted to directly interface with most types of facility I/O, including AoIP, though additional DSP processing and I/O may need to be added to the console system, and the monitor section will need to be updated to monitor objects and immersive mixes.

The most challenging part of making all of this work may be proper metadata authoring, control and QC. AC-4 is highly metadata-dependent, and it seems unlikely we’ll simply be able to provide a static metadata product to the consumer, certainly not if other broadcasters deliver the value-added product this new system makes possible.

Of course, there is currently no pressure for any facility to make drastic or wholesale changes right away or move to full immersive audio with multiple languages out of the gate, but now is certainly the time to start planning.

Lessons From the 2016 AES Convention

Jay Yeary — Mon, 26 Dec 2016 09:00:00 +0000

At the 2016 Audio Engineering Society Convention in Los Angeles, I ran into a friend who was manning the booth of a broadcast audio console manufacturer. I had finally managed to get to the show floor for a quick walkthrough of the exhibits before heading back to another meeting room. He expressed consternation about why they were there since it wasn’t a broadcast show and few attendees would be looking for a high-quality digital audio console.

This struck me as odd since there were many years when that was exactly why I went to the show. Manufacturers of larger audio products now tend to stick to broadcast-focused shows since recording and post engineers have gravitated to working inside the box with emulations of hardware, including features from audio consoles.

As I pondered the implication of my friend’s words—that the convention dedicated solely to audio is no longer the place for audio consoles with amazing sound and incredible technology—I realized that the dynamic of the convention has changed for me as well. It’s no longer primarily about getting to play with fantastic new audio products, but it has become an opportunity to dive into the technology of what we do and see how it is being used.

AUDIO FOR VR

Judging the show by exhibit space alone gives an incorrect impression of what was going on, because the technical program of workshops, papers, tutorials and technical tours was incredibly busy. In addition, the AES Audio for Virtual and Augmented Reality (AVAR) technical conference was held in conjunction with the convention this year, which meant some of us were standing around with boxy, modern-day View-Masters on our faces.

VR and immersive audio seem to be exciting people and offering opportunities in television, film and gaming that may not previously have been on their radar. VR, while an interesting technology, was not one I had seen much practical value in until this show, when I ran across a small company called Audio Fusion that deservedly won the AES Silver Award in the student design competition.

Audio Fusion has created a virtual studio training environment by modeling an analog recording studio to provide hands-on training for those who don’t have access to an actual studio. Trainees use headsets, headphones and custom controllers to manipulate audio consoles, patchbays and other studio equipment.

NEXT-GENERATION AUDIO

The status and production of immersive and object-based audio, now collectively referred to as next-generation audio (NGA), was discussed in several sessions.

Dolby AC-4 has been selected as the next-generation audio format for the United States, while MPEG-H will be used in the country with the most aggressive timeline for ATSC 3.0 rollout, South Korea. This is the first broadcast format designed to deliver broadcast content to devices of all types, not just to televisions, including new devices as they arrive on the scene.

Mobile devices in particular bring with them an array of level and dynamic range problems due to the nearly limitless number of viewing environments they could be used in. AC-4 will manage these mobile environments as well as static listening environments, by utilizing multiple metadata profiles and rendering audio at each device. MPEG-H proponents remain skeptical as to whether the metadata in AC-4 can survive the distribution process despite the benefits of audio control and interface customization.

AUDIO FOR OTT

At last year’s convention, the AES Technical Committee on Transmission and Broadcasting published their technical document for audio music streaming and this year their subcommittee, AGOTTVS, has released technical document AES TD1005.1.16-09, which covers loudness for OTT streaming. “Audio Guidelines for Over-the-Top Television and Video Streaming” provides more than the lengthy acronym for the subcommittee; it also provides initial loudness recommendations for a problem-fraught delivery medium.

There are four recommended practices in the document: the use of agile or static metadata when devices are full-range and distribution is able to support it; a list of how to handle content when the distribution system does not have metadata capabilities along with a recommended loudness setting of –16 LKFS for devices with limited dynamic range; the recommendation that all loudness implementations be tested for anomalies; and recommendations for versioning of the same material with metadata encoded versions left at full range and reducing the dynamic range of versions without metadata.
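Simple arithmetic shows why the versioning recommendation matters: raising a full-range broadcast mix to –16 LKFS without metadata pushes its peaks up by the same amount. This Python sketch assumes a plain static gain. It also uses an illustrative –1 dBTP peak ceiling, which is an assumption and not a figure from TD1005. The names are hypothetical.

```python
LIMITED_RANGE_TARGET_LKFS = -16.0  # TD1005 figure for limited-range devices, per the text


def normalize(measured_lkfs: float, max_true_peak_dbtp: float,
              target_lkfs: float = LIMITED_RANGE_TARGET_LKFS,
              peak_ceiling_dbtp: float = -1.0):
    """Return (gain dB, resulting true peak dBTP, needs dynamic range reduction).

    peak_ceiling_dbtp is an assumed value for illustration, not a TD1005 figure.
    """
    gain_db = target_lkfs - measured_lkfs
    new_peak_dbtp = max_true_peak_dbtp + gain_db
    return gain_db, new_peak_dbtp, new_peak_dbtp > peak_ceiling_dbtp


# A mix delivered at the A/85 target of -24 LKFS with -2 dBTP peaks.
print(normalize(-24.0, -2.0))  # (8.0, 6.0, True)
```

An 8 dB boost lands those peaks well above full scale. This is why versions delivered without metadata need their dynamic range reduced rather than simply being turned up.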

AGOTTVS is made up of broadcasters, manufacturers and streaming companies, and is headed by NBC’s Jim Starzynski, who helped forge the A/85 loudness recommendations that form the core of the CALM Act. It is a testament to this group that they have been successful in engaging some of the streaming providers in the standards process. Hopefully all of them will get involved as the group’s work develops into a standard.

AUDIO-OVER-IP

Sessions on AES67 drilled down into this maturing technology, covering some large deployments of it in real-world live events. One important point that was brought up, and one I’m not sure I’ve stressed enough, is that AES67 is not a competitor to other AoIP technologies, but is meant to help them all work together.

Video is certainly a big part of the broadcast world, but it was not originally mentioned in AES67 despite the possibility that it could be included later. AES67 has now been adopted by the Joint Taskforce on Networked Media (JTNM) in its Video Services Forum (VSF) technical recommendations TR-03 and TR-04 for inclusion into the upcoming SMPTE ST 2110 standard. This means that manufacturers adhering to AES67 technical recommendations now have access to the television market as long as they also adhere to the Networked Media Open Specifications (NMOS) from the AMWA.

There were many other interesting sessions this year, and we’ve only scratched the surface of the few covered here, but we’ve run out of space. December is traditionally a time for gift giving and making resolutions, so I encourage you to consider giving the gift of education in some form this year because it truly is the gift that keeps on giving. Keep on learning!

The Difference Between Audio for Film, TV

Jay Yeary — Mon, 28 Nov 2016 10:40:00 +0000

The entire process of sound for film has always fascinated me, partly because the working environment seems so extravagant to someone who has spent the majority of their professional life working in the trenches of broadcast audio for television.

The mystique perpetuated by cinema audio is that the budgets are enormous, audio is recorded on gigantic sound stages, mixed in huge mix rooms and staffed by large specialized crews; and for studio-backed wide-release films, these generalizations are fairly accurate.

Broadcast audio engineers have a totally different working experience. In broadcast audio post, a single person may end up handling the music and dialog editing, sound design and final mix; whereas in film each of these is handled by a different person. Some broadcast facilities have relatively large studios, but the budgets and crews are much smaller than those in cinema, and audio control rooms are sometimes no larger than the closet we’re able to wrest away from the video guys.

A Sony mix stage

FRESH LOOK

In September of this year TV Technology sister publication Mix magazine held the third annual “Mix Presents Sound for Film and Television” event on the Sony Entertainment lot in Culver City, Calif., and it seemed like an excellent opportunity to get a fresh look at the world of cinema audio. I was particularly interested in seeing how much immersive audio work was being done and what impact the proliferation of OTT shows is having.

As soon as I arrived on the studio lot it was apparent that production for streaming has made its way to Hollywood because billboards for OTT shows are prominently displayed alongside those for broadcast networks and feature films.

The event itself made for a very full day, beginning with the keynote from Gary Bourgeois at 9:30 a.m. and finishing up 12 hours later after the screening of 13 eight-minute film segments, each introduced by the sound crew or another person involved with the film. There was a mini-trade show, sound cart demonstrations and technology master classes, but the most compelling parts of the event were the panel discussions: “Sound Inspiration Within the Storytelling Process,” “Sound Mixing Panel: Workflow for Musicals” and “Building an Immersive Room: Small, Medium and Large.” The panelists for these sessions were excellent and the audio podcasts are now available online at http://soundworkscollection.com/.

Of particular interest was a session titled “Remastering Cinema Audio for ATSC HDTV,” led by Lon Neumann, which was, at its heart, a review of ATSC A/85. It seems the film community is still coming to grips with the CALM Act and translating cinematic-style mixes into ones that play well in the home. Neumann introduced a couple of terms I had not encountered in reference to sound for picture, “allocentric” and “egocentric,” but they perfectly sum up the differences between cinema and television.

The Sony Entertainment lot

Seeing a film in the theater means participating in an allocentric event specifically designed to be shared with others, with speakers distributed around the room for wide dispersion, where everyone in the theater shares a similar experience.

Television, on the other hand, especially with the increasing consumption of content on personal media devices, is a highly egocentric medium where everything, including speakers, is focused on one spot in the room, creating an experience best enjoyed by one or a few people.

Creating content for these two opposing methodologies necessarily dictates mix rooms with vastly different designs and speaker layouts in order to emulate the end-user experience, which additionally complicates the translation of material between the film and television mediums.

It turns out there is now so much television, streaming and game work being done in L.A. that mix rooms are in short supply, so new ones are being built both by studios and by mixers who have opened their own facilities. Most of these new rooms are built primarily for television and game work, so they tend to be smaller. This presents challenges for mixers who move between large and small mix rooms: not only do they have to adjust to completely different spatial environments, but the smaller rooms often have less-than-ideal dimensions, so low-frequency build-up must be dealt with, and they require mixing at lower monitor levels.

Even these smaller rooms are larger than most broadcast audio mix rooms, but film sound mixers are starting to get a taste of the issues the rest of us deal with on a daily basis.

ADDITIONAL COMPLICATIONS

The Sound Inspiration Panel, from left: Carolyn Giardina (moderator), Will Files, Mark Stoeckinger, Ben Cook, Paula Fairfield, Harry Coen, Paul Menichini

One enormous additional complication of these smaller mix rooms, compared with broadcast mix rooms, is that the majority are built for immersive audio production. That means not just more speakers, but additional acoustic design complications. The panel on building immersive rooms stressed how important it is to make these rooms deader than a standard film mix room, including deadening the ceiling surfaces, to help with localization of sound sources. Every mixer who spoke about immersive audio spoke effusively about how much they enjoy working in it and how it is becoming a regular part of their workflow for film, episodic television and gaming. The general consensus among the mixers is that it’s here to stay.

Another aspect of film sound that has changed is the jobs themselves. Thanks to the smaller budgets of today’s television shows, the sheer amount of work to be done and the workflow efficiencies gained by being able to work completely in the box, the lines are starting to blur and individuals on sound crews are more often filling multiple roles.

It was a pleasure to see how OTT and gaming have reinvigorated the film sound community, but the most encouraging thing was the large number of young people attending this event who are working in the industry, something the broadcast industry seems unable to replicate on a large scale. However, if we’re lucky, the abundance of interesting content, gaming, immersive audio and fresh faces will trickle down from studio lots to the wider broadcast community.

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE and TAB. He can be contacted through TV Technology or at transientaudiolabs.com.

Considering Video for Audio Engineers

Jay Yeary — Thu, 03 Nov 2016 11:51:00 +0000

Anyone who has remained an audio engineer for a substantial period of time has done so because they love what they do, not because they can’t move on to something else. Being an audio engineer in the world of broadcast means that we also have to deal with all that pesky video equipment and the signals that accompany it.

A depth of video knowledge may actually be the broadcast audio engineer’s least-known skill since, just to get our jobs done, we often end up learning more about video than anyone realizes. Still, it can be tempting to stick to the basics of what we need to know and end up puzzled by some of the things we run across.

This month we’re going to take a look at some video terminology that we see, but may not be familiar with. Let’s start out with a general overview of garden variety video connections: plain old analog video, as composite and component, standard-definition video (SD), and high-definition video (HD).

Video starts out with individual red (R), green (G) and blue (B) channels, which, depending on the application, either remain individual full-bandwidth signals or get matrixed into a full-bandwidth luminance channel (Y) and two lower-bandwidth color difference channels (B-Y, R-Y), which reduces the load on video processors.

An array of video connectors with composite as CVBS

COMPOSITE AND COMPONENT
Composite video combines the three channels onto one cable, while component leaves them separate. Composite connections are usually just called video, though I recently ran across a device that labeled it CVBS (color, video, blanking and sync). Component connections are most often designated Y/G, Pb/B, Pr/R and may or may not be labeled component. The simplest way to think of SD and HD video is as 8- or 10-bit digital versions of component video that, like composite, each travel down a single Serial Digital Interface (SDI) cable, typically terminated in BNCs, so even the cabling is similar to composite analog.

The superpower of SDI video is that it can carry 16 channels of embedded audio along with it. Native SD video is traditionally 4:3, with an interlaced frame size of 720x480 (720x576 in 625-line regions) at 270 Mbps. Native HD is in the 16:9 widescreen format and is most often seen as 1280x720 progressive (720p) or 1920x1080, interlaced (1080i) or progressive (1080p), at 1.485 Gbps; 1080p at 50 or 60 frames per second needs twice that rate, carried on 3 Gbps (3G-SDI) links.
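The 270 Mbps and 1.485 Gbps figures fall out of simple arithmetic on the total raster (active picture plus blanking). The sketch below assumes 10-bit 4:2:2 sampling, with each luma sample period carrying two words (the luma sample plus one multiplexed Cb or Cr sample); the total-raster sizes used (858x525, 1650x750, 2200x1125) are the standard ones for those formats. The 59.94 and 29.97 variants run 1/1.001 slower.

```python
from fractions import Fraction

def sdi_bit_rate(samples_per_line: int, lines_per_frame: int,
                 frame_rate: Fraction, bits_per_word: int = 10) -> Fraction:
    """Serial bit rate of a 10-bit 4:2:2 SDI signal.

    samples_per_line and lines_per_frame describe the *total* raster,
    active picture plus blanking. Each luma sample period carries two
    words: the luma sample and one multiplexed Cb or Cr sample.
    """
    luma_sample_rate = samples_per_line * lines_per_frame * frame_rate
    return luma_sample_rate * 2 * bits_per_word

# Total raster sizes for some common formats.
SD_525 = sdi_bit_rate(858, 525, Fraction(30000, 1001))  # 720x480 active, 29.97 fps
HD_720P = sdi_bit_rate(1650, 750, Fraction(60))         # 1280x720 active, 60 fps
HD_1080I = sdi_bit_rate(2200, 1125, Fraction(30))       # 1920x1080 active, 30 frames (60 fields)
HD_1080P60 = sdi_bit_rate(2200, 1125, Fraction(60))     # 1080p60: too fast for a 1.5G link

for name, rate in [("SD 525", SD_525), ("720p", HD_720P),
                   ("1080i", HD_1080I), ("1080p60", HD_1080P60)]:
    print(f"{name}: {float(rate) / 1e9:.3f} Gbps")
```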

Interlaced video draws the odd-numbered lines of a frame as one field, followed immediately by the even-numbered lines as a second field, while progressive-scan video draws all lines in order. As with audio, connecting digital and analog video signals together requires conversion.
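A toy model of interlacing, treating a frame as a list of scan lines: the two fields are simply the odd and even lines, and a simple "weave" puts them back together. This is an illustration only; in real interlaced video the two fields are captured at different instants, which is why weaving fields that contain motion produces combing artifacts.

```python
from typing import List, Tuple

def split_fields(frame: List[str]) -> Tuple[List[str], List[str]]:
    """Split a frame's scan lines (top to bottom) into two interlaced fields.

    Lines are numbered from 1 here, so the odd field holds lines 1, 3, 5...
    and the even field holds lines 2, 4, 6...
    """
    return frame[0::2], frame[1::2]

def weave(odd_field: List[str], even_field: List[str]) -> List[str]:
    """Re-interleave two fields into one frame (a simple 'weave')."""
    frame: List[str] = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.extend([odd_line, even_line])
    frame.extend(odd_field[len(even_field):])  # leftover line if the count is odd
    return frame

frame = [f"line {n}" for n in range(1, 7)]
odd, even = split_fields(frame)
print(odd)   # drawn first
print(even)  # drawn second
```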

Another video interface that needs addressing here is the High-Definition Multimedia Interface (HDMI), as its ubiquity has allowed it to creep out of the consumer space into professional facilities. HDMI passes high-definition video and audio between devices on one cable, but quality can be hampered by poor-quality cables, and transmission can be blocked by High-bandwidth Digital Content Protection (HDCP).

Content encrypted with HDCP will not display on devices without an HDCP license, or on devices the source considers unlicensed. Unfortunately, seemingly inexplicable failures are regular occurrences due to this embedded copy protection, so HDMI should be used sparingly and always be tested.

Video has, of course, moved past HD to 4K and higher resolutions, with high dynamic range the current flavor of the day. 4K, which is being marketed as “ultra-high definition” (UHD), offers at least four times the pixels of 1080-line HD. UHD displays actually reproduce 3840x2160 pixels to maintain a 16:9 aspect ratio, while video capture and image creation will likely be done in the cinema 4K resolution of 4096x2160 pixels. Higher resolutions and bit depths require infrastructure upgrades to beyond 3 Gbps and, depending on the codec, potentially more data storage space.

Uncompressed 1080p HD at 59.94 fps consumes disk space at approximately 1.53 TB per hour, whereas uncompressed 2160p 4K video would take up approximately 5 TB per hour. These storage requirements almost certainly mean that most 4K content will be shot and stored in some compressed format.
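Uncompressed storage is just pixels times samples per pixel times bit depth times frame rate. The sketch below assumes 10-bit 4:2:2 sampling and counts only the active picture with no packing overhead, so its results (about 1.1 TB per hour for 1080p and four times that for 2160p) will not match the figures quoted above, which depend on bit depth, chroma sampling and file packing choices the column does not specify. The point is the scaling: at the same frame rate and sampling, UHD needs exactly four times the storage of 1080p.

```python
# Average samples per pixel for common chroma sampling schemes.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}

def bytes_per_hour(width: int, height: int, fps: float,
                   bit_depth: int, sampling: str) -> float:
    """Uncompressed active-picture storage, ignoring padding and headers."""
    bits_per_frame = width * height * SAMPLES_PER_PIXEL[sampling] * bit_depth
    return bits_per_frame * fps * 3600 / 8

FPS_5994 = 60000 / 1001  # the "59.94" frame rate

hd = bytes_per_hour(1920, 1080, FPS_5994, 10, "4:2:2")   # assumed 10-bit 4:2:2
uhd = bytes_per_hour(3840, 2160, FPS_5994, 10, "4:2:2")
print(f"1080p: {hd / 1e12:.2f} TB/h, 2160p: {uhd / 1e12:.2f} TB/h")
```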

THE PROBLEM IS...
A bigger problem is that video carrying this much data cannot be moved around within most infrastructure as it currently exists. There was talk of a new multi-cable standard for 4K early on, but, as with most other technologies in the plant, IP distribution seems to be the way forward. HDR technology, which increases contrast and expands colors available in images, is being touted as a way to improve picture quality without increasing resolution. I’ve seen HDR in cinema demonstrations and it is very impressive, with incendiary whites and seemingly bottomless blacks and tons of detail where it was previously missing.

Every once in a while I run across a device with an asynchronous serial interface (ASI) connection on the back, also called a digital video broadcasting-asynchronous serial interface (DVB-ASI). This is a one-way data link for streaming compressed video and audio between digital devices. An ASI output on one device connects to the ASI input on another and, since it is a compressed data stream, it is not compatible with SDI connections.
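What an ASI link carries is an MPEG transport stream: a sequence of fixed 188-byte packets, each starting with the sync byte 0x47 and a header identifying the stream (PID) the packet belongs to. A minimal sketch of reading that 4-byte header follows; it assumes an already byte-aligned packet, since the 8b/10b line coding ASI uses on the coax is handled in hardware, and the example packet bytes are made up.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of one MPEG transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("expected an aligned 188-byte packet starting with 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],  # 13-bit stream identifier
        "continuity_counter": packet[3] & 0x0F,        # steps with each payload packet on a PID
    }

# Made-up example packet: PID 0x100, start of a payload unit, counter 10.
example = bytes([0x47, 0x41, 0x00, 0x1A]) + bytes(184)
print(parse_ts_header(example))
```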

Finally, there are the video sampling formats 4:4:4, 4:2:2, 4:1:1 and their seemingly ceaseless variations. This is actually related to our earlier coverage of luminance and RGB. The first number in each of these trios refers to luminance, while the other two are chrominance.

For instance, a 4:4:4 signal has the luminance (Y), red minus luminance (R–Y) and blue minus luminance (B–Y) channels each sampled four times, while 4:2:2 samples luminance four times and drops the sampling of each chrominance channel to two times. Following this pattern, it is easy to figure out the other sampling formats, though if you encounter one with a fourth digit (e.g., 4:4:4:4), a key (alpha) channel has been added.
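The bandwidth savings are easiest to see by counting samples. This sketch uses the common J:a:b reading of the notation over a block 4 pixels wide and 2 rows tall (J luma samples per row, a chroma samples per color difference channel in the first row, b in the second), which is a slightly more formal version of the description above and also covers variants such as 4:2:0.

```python
def samples_per_block(scheme: str) -> int:
    """Total samples in a 4-pixel-wide, 2-row block for a J:a:b[:alpha] scheme."""
    parts = [int(p) for p in scheme.split(":")]
    j, a, b = parts[:3]
    alpha = parts[3] if len(parts) == 4 else 0
    luma = 2 * j            # J luma samples in each of the two rows
    chroma = 2 * (a + b)    # B-Y and R-Y each contribute a + b samples
    key = 2 * alpha         # optional key/alpha channel, counted like luma
    return luma + chroma + key

full = samples_per_block("4:4:4")
for s in ["4:4:4", "4:2:2", "4:1:1", "4:2:0", "4:4:4:4"]:
    n = samples_per_block(s)
    print(f"{s}: {n} samples, {n / full:.0%} of 4:4:4")
```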

Luminance sampling must remain high because we are far more sensitive to changes in brightness than to changes in color, so color sampling can be reduced with little concern. You may have noticed that matrixed video lacks a green color difference channel; none is needed, since green can be derived from the luminance channel and the other two color difference channels.
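The matrixing and the recovery of green can be sketched in a few lines. This assumes the ITU-R BT.601 luma weights (HD systems use the BT.709 weights instead) and leaves out the scaling and offsets that turn B-Y and R-Y into real-world Pb/Pr or Cb/Cr values.

```python
# ITU-R BT.601 luma weights (assumed); BT.709 HD uses 0.2126, 0.7152, 0.0722.
KR, KG, KB = 0.299, 0.587, 0.114

def encode(r: float, g: float, b: float):
    """Matrix RGB into luminance and two color difference signals."""
    y = KR * r + KG * g + KB * b
    return y, b - y, r - y  # Y, B-Y, R-Y

def decode(y: float, b_minus_y: float, r_minus_y: float):
    """Recover R and B directly, then solve the luma equation for G."""
    r = r_minus_y + y
    b = b_minus_y + y
    g = (y - KR * r - KB * b) / KG
    return r, g, b

print(decode(*encode(0.2, 0.5, 0.9)))  # approximately (0.2, 0.5, 0.9)
```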

Jay Yeary is a broadcast engineer and consultant who specializes in audio and now wonders if video did indeed kill the radio star. He is an AES Fellow and a member of SBE, SMPTE and TAB. He can be contacted through TV Technology or at transientaudiolabs.com.

Transitioning to IP Audio Codecs

Jay Yeary — Fri, 16 Sep 2016 09:29:00 +0000

Comrex BRIC-Link II IP audio codec

In my August column, “The Audio World Without ISDN,” we looked at the impending demise of ISDN and the options available for those who can no longer get service. With telcos pushing everything into the packet-switched world, it certainly appears that any solution to this dilemma will be IP-centered.

Manufacturers of ISDN hardware and IP software obviously want us to move in that direction since they already have solutions available, and, in many instances, the hardware is designed to drop into the studio as a direct replacement for an ISDN codec.

The move from ISDN to IP is no longer a question of whether but of when, so we need to examine how to make the transition smoothly.

YOU WANT IT WHEN?

Reliability seems to be the greatest concern when it comes to using the internet to provide audio services. ISDN provides nearly rock-solid reliability with uptime the norm, not exactly how most of us would describe our experience with internet connections.

There is little we can do if the network goes offline, but choosing the most reliable provider who can deliver sufficient speed and bandwidth for our needs is the first step, though anyone in a large organization will likely have no choice in the matter.

When shopping for bandwidth, just about any modern urban internet plan will suffice because audio data is relatively small in terms of network traffic. A peek at some of the larger internet providers shows that the current slowest business-plan download speed is 10 Mbps, which is more than sufficient for receive data. However, the upload speed for that plan is just 1 Mbps, which gives no headroom for outgoing audio data if the network is performing subpar that day, so a plan with more upload speed, say 5 Mbps, makes more sense.
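To put those upload numbers in perspective, here is a rough estimate of one outgoing stream's bit rate. The codec bit rate, packet rate and the "one redundant copy" stand-in for FEC are hypothetical values chosen for illustration; only the 40 bytes of IPv4, UDP and RTP headers per packet are fixed, and Ethernet framing is ignored.

```python
# IPv4 (20) + UDP (8) + RTP (12) header bytes per packet, no options.
HEADER_BYTES = 20 + 8 + 12

def stream_kbps(codec_kbps: float, packets_per_second: float,
                fec_copies: int = 0) -> float:
    """Rough on-the-wire rate of one outgoing audio stream in kbps.

    fec_copies is a crude stand-in for FEC: each copy re-sends the
    payload once more. Ethernet framing is not counted.
    """
    payload = codec_kbps * (1 + fec_copies)
    headers = HEADER_BYTES * 8 * packets_per_second / 1000
    return payload + headers

def headroom(upload_kbps: float, streams: list) -> float:
    """Fraction of the upload link left over after the given streams."""
    return 1 - sum(streams) / upload_kbps

# Hypothetical: a 128 kbps codec sending 50 packets/s with one redundant copy.
one_stream = stream_kbps(128, 50, fec_copies=1)
for plan_kbps in (1000, 5000):  # the 1 Mbps and 5 Mbps upload plans mentioned above
    print(plan_kbps, f"{headroom(plan_kbps, [one_stream]):.0%} headroom")
```

Note that the headroom figure assumes the plan delivers its advertised upload rate; on a "subpar" day the real margin is smaller.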

Anyone who wants a really robust connection has the option of leasing a dedicated network line between the provider and the studio. It is extremely important to make sure the provider doesn’t have data caps that will shut down your service if you’ve done too many sessions and exceeded your cap.

Fortunately, now that Google is expanding its fiber network, subsequently inspiring previously hesitant telcos to do the same, faster network upload and download speeds should be on the horizon for most everyone.

PACKET LOSS

A Source-Connect Now IP audio session in progress

There are a number of technologies built into IP audio codecs to ensure they are robust enough for professional use. Remember that our audio data is broken into packets, then sent across the internet to a specific destination where the packets are reassembled in order, despite the fact they may have taken vastly different paths to get there. Because of this fragmented routing, packets arrive at uneven intervals (jitter) and sometimes out of order, but both problems are dealt with by holding packets in a buffer for a short delay prior to reassembly.
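A fixed-depth jitter buffer can be simulated with a min-heap keyed on sequence number: packets are held until the buffer is deep enough, then released in order, with a gap marked wherever a packet never showed up. This is a simplified sketch (real codecs size the buffer adaptively and work in time rather than packet counts), and the arrival sequence in the usage line is invented.

```python
import heapq
from typing import Iterable, List, Optional

def playout(arrivals: Iterable[int], depth: int) -> List[Optional[int]]:
    """Simulate a fixed-depth jitter buffer.

    arrivals: sequence numbers in the order packets come off the network.
    depth: packets held back before the oldest one is released.
    Returns the playout order; None marks a gap the decoder must conceal.
    """
    heap: List[int] = []
    out: List[Optional[int]] = []
    next_seq: Optional[int] = None

    def release() -> None:
        nonlocal next_seq
        if next_seq is None:
            next_seq = heap[0]           # start playout at the lowest number seen
        if heap[0] == next_seq:
            out.append(heapq.heappop(heap))
        else:
            out.append(None)             # expected packet is lost or too late
        next_seq += 1

    for seq in arrivals:
        if next_seq is not None and seq < next_seq:
            continue                     # arrived after its slot was played: discard
        if seq in heap:
            continue                     # duplicate packet
        heapq.heappush(heap, seq)
        if len(heap) > depth:
            release()
    while heap:                          # end of stream: drain what is left
        release()
    return out

print(playout([1, 3, 2, 5, 6, 7, 4], depth=2))
```

A deeper buffer rides out more reordering but adds latency, which is exactly the trade-off described in the next paragraph.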

There is also the possibility, dependent on a host of factors, that some packets won’t arrive at all, resulting in packet loss. Buffering introduces latency in the path, which can be a problem for live audio sessions, and packet loss is what everyone dreads because it means some of your audio data has gone missing. In reality, neither of these is a significant problem in IP audio streams, because all the codecs use some form of data compression to minimize demands placed on the network while actively managing buffers to minimize delay.

Virtually all IP codecs use Forward Error Correction (FEC), which puts duplicates of packets inside adjacent packets so the data can be reproduced even if a packet, or a string of packets, is lost.
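One simple redundancy scheme along the lines described above has every packet carry copies of the previous `depth` payloads, so a run of up to `depth` consecutive lost packets can be rebuilt from the next packet that does arrive. The sketch below is a hypothetical illustration of that idea, not any particular codec's FEC; production schemes often use parity (XOR) packets instead, which cost less bandwidth than full copies.

```python
from typing import List, Optional, Tuple

# (sequence number, payload, copies of the previous payloads, oldest first)
Packet = Tuple[int, bytes, List[bytes]]

def send(payloads: List[bytes], depth: int) -> List[Packet]:
    """Attach copies of up to `depth` earlier payloads to every packet."""
    return [(seq, p, payloads[max(0, seq - depth):seq])
            for seq, p in enumerate(payloads)]

def receive(packets: List[Packet], count: int) -> List[Optional[bytes]]:
    """Rebuild the stream; None marks audio that could not be recovered."""
    out: List[Optional[bytes]] = [None] * count
    for seq, payload, history in packets:
        out[seq] = payload
        start = seq - len(history)
        for offset, copy in enumerate(history):
            if out[start + offset] is None:
                out[start + offset] = copy  # fill a gap from a redundant copy
    return out

payloads = [b"a", b"b", b"c", b"d", b"e"]
arrived = [p for p in send(payloads, depth=2) if p[0] not in (2, 3)]  # lose 2 and 3
print(receive(arrived, len(payloads)))
```

With `depth=1` the same two-packet loss leaves packet 2 unrecoverable, which is why longer strings of losses need more redundancy and therefore more bandwidth.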

For larger organizations there can be problems using the business LAN to send data, though the problems are not nearly as prevalent with audio as with video. It is possible to have the IT department prioritize traffic using Quality of Service (QoS) tools within the router. The bigger issue when passing media data on business LANs tends to be firewall rules that limit traffic or close ports.

These issues can usually be handled by working with the internal IT department, but be aware that IT may push out updates or refresh devices without carrying over the changes requested for audio traffic, so it’s important that those changes become part of their base firewall and router configurations.

PRACTICAL SOLUTIONS

At the practical end of the spectrum is how these things work in everyday use. While I rarely cross paths with ISDN these days, I do spend a lot of time using IP devices for television production. I hope the following example gives some idea of how solid IP audio is.

For a weekly news show, we use two Comrex BRIC-Link II bidirectional point-to-point IP audio codecs using AAC/HE-AAC coding for IFB and backup audio, as well as two Haivision Makito video and audio units passing 1920x1080i video and MPEG-4 48 kHz audio at an average bandwidth of 6,400 kbps for all traffic.

The BRIC-Link IP audio codecs have been rock solid with no loss of audio even though data passes through two networks and myriad firewalls that hide the networks from the outside world. The Makito makes good-looking video, despite the low bit-rate. We occasionally experience blocking in the image, but never have any loss of audio.

The final choice here is whether to use a hardware or software codec, and this decision likely comes down to what the destination connection will be. Either way, it is best to have the same or a similar codec at each end, though there is some flexibility and bridging is always available. For studios with control over both ends of the chain, a hardware codec may make more sense. Radio and remote television broadcasts are perfect places to use hardware codecs due to their plug-in simplicity. For voice actors and smaller studios, the choice is not so clear and software codecs may offer more flexibility. Regardless, the world without ISDN is here and everything will be fine.


The Audio World Without ISDN

Jay Yeary — Mon, 22 Aug 2016 09:00:00 +0000

I was working on a string of projects back in the mid-1990s that required us to fly in talent to handle Spanish and Portuguese translation. We searched for ways to make the work more efficient and discovered that a technology called Integrated Services Digital Network (ISDN) was being used by producers to record musicians from all over the world and by movie studios for remote ADR. It offered high-quality audio and latency low enough that conversations between the studio and remote artist felt completely natural.

We hopped on board with one of the early black-front Telos Zephyrs, which was compatible with practically everything, along with a massive APT WorldNet unit that only talked to other APT units. The codecs went into service doing remote voiceover sessions and live event announcing.

Voiceover artists (now voice actors) saw the potential of the technology right away, so many of them built their own ISDN-equipped studios and offered their services globally, and in the process, became the original “work-from-home” pioneers of the broadcast industry.

RUMOR MILL

Software IP codec comparisons
ISDN technology has served broadcast professionals faithfully for nearly 30 years, but rumors have been circulating for some time that the telcos plan to stop providing it because the service is not profitable enough. Up to this point, it has remained just a rumor, but it now appears to be very real.

A voice actor friend is in the process of relocating his studio within the same New York City suburb. He has been told by Verizon, his long-time ISDN provider, that they will not reinstall his service once he leaves his present location, despite the fact that he has paid them thousands of dollars annually for decades for the privilege of having those lines. Unfortunately, no other provider in his area offers the service, so we have been looking at alternatives.

It makes sense that the telcos want to kill off ISDN just as they want to kill off the analog home telephone line. Both are remnants of the circuit-switched public telephone network, which makes inefficient use of the telco network compared to packet switching.

Circuit switching establishes individual point-to-point connections, switching both ends at the same time, and creating a dedicated connection between those two points. Packet switching isn’t interested in making dedicated point-to-point connections; it breaks data up into packets that are sent off on what could be completely different paths, to get reassembled at the far end. Packet switching can also send those packets to multiple destinations at the same time.

The ISDN lines themselves are copper twisted pairs that carry digital data instead of analog, and there are two variants. The type normally used with audio ISDN codecs is the Basic Rate Interface (BRI), which consists of two 64-kbps bearer channels and a 16-kbps channel for data signaling.

Some codecs gang BRI lines together to improve quality, reduce latency and pass bidirectional timecode and control information. Each BRI line is connected to the telco through a network termination device (NT-1), and each bearer channel is assigned a Service Profile Identifier (SPID), which tells the network who it is. The Primary Rate Interface (PRI) version is essentially a full T-1 line carrying one 64-kbps signaling channel and 23 64-kbps bearer channels.
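The channel arithmetic shows both why ganging BRI lines helps (each extra line adds two more 64-kbps bearer channels for the codec to use) and where the PRI's capacity comes from. The 8 kbps of framing that brings a T-1 to its 1.544 Mbps line rate is included for reference; the function names are illustrative only.

```python
B_CHANNEL_KBPS = 64

def bri_kbps() -> int:
    """Basic Rate Interface: 2 bearer channels plus a 16-kbps D channel."""
    return 2 * B_CHANNEL_KBPS + 16

def pri_t1_kbps() -> int:
    """North American PRI: 23 bearer channels plus a 64-kbps D channel."""
    return 23 * B_CHANNEL_KBPS + 64

def bonded_bearer_kbps(bri_lines: int) -> int:
    """Bearer bandwidth available to a codec that gangs several BRI lines."""
    return bri_lines * 2 * B_CHANNEL_KBPS

T1_FRAMING_KBPS = 8
print(bri_kbps(), pri_t1_kbps(), pri_t1_kbps() + T1_FRAMING_KBPS)
print([bonded_bearer_kbps(n) for n in (1, 2, 3)])
```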

The monthly fee for a PRI is significantly more than a BRI, which is why it’s normally only installed to provide Private Branch Exchanges (PBX) to businesses. PRI, unlike BRI, doesn’t appear to be on the chopping block so far.

AVAILABLE OPTIONS

For those facing the loss of ISDN service, there are a number of options to consider, with IP audio codecs seeming the obvious choice. Telos makes an intriguing device called a Zephyr Xport that connects to a traditional phone line, yet it can also connect to their Zephyr XStream ISDN codec. Unfortunately, since it only connects to that particular codec, it’s only useful if the studio at the far end has an XStream with the proper version of software installed.

Source-Connect was the first software alternative that directly competed with ISDN, though it has since been joined by SoundStreak, ipDTL and Source-Connect Now. Each requires a computer, network connection and the same software at each end. They are differentiated primarily by their pricing models:

• Source-Connect offers three purchase options and a subscription;
• SoundStreak doesn’t charge talent, but does charge production;
• ipDTL has an array of pricing options, and
• Source-Connect Now is free with premium options available.

Most ISDN users are familiar with bridging services, even if they’ve never used one. These services started as a way for users with one type of ISDN codec to connect to an incompatible ISDN codec at the far end. The bridging site establishes connections to the origination and destination codecs using compatible in-house units, and passes audio between the units. These services now offer bridging from software codecs to ISDN codecs, which is especially appealing to anyone without ISDN service. Bridging services include industry stalwarts Digifon and EDNet, ISDN To-Go, ISDNBridge and Out of Hear.

There is one way to continue ISDN service without telco support, but it’s only feasible for those who work at larger businesses: generate the ISDN from the business PBX. ISDN lines created in this way work just like any other ISDN line, except that they need to dial out of the business network to place calls.

Choosing a technology replacement for ISDN is not easy, with client requirements always the key factor. Studios currently using ISDN should be able to continue using it for the foreseeable future, which means they will continue to require ISDN sessions from voice actors. However, once a studio decides to relocate, they may face this same dilemma. Next time we’ll take a look at making the transition from ISDN to IP codecs.

The State of AoIP Compatibility

Jay Yeary — Thu, 28 Jul 2016 15:30:00 +0000

One of my missions at this year’s NAB Show was to make a deeper assessment of the state of interoperability of audio-over-IP devices in preparation for an upcoming project. I was specifically interested to see whether a complete system could be built using disparate components from different manufacturers, utilizing whatever AoIP technology had been chosen for each product.

The project is for a broadcast facility audio control room with no significant connectivity to the video router, so the key pieces are the digital audio console, intercom system, digital snakes and peripheral equipment. This system needs to be configurable in minimal time with minimal hassle. All components must be guaranteed to work together and need to be as interoperable as possible, which means complying closely with AES67.

As a longtime proponent of AoIP, I saw this as my opportunity to take a closer look at what could be assembled into a functional AoIP system.

KEEP IN MIND
Before proceeding, there are a number of items to keep in mind. First, AES67 could be considered the lowest common denominator for successfully sending audio across a network using IP. Many pieces required to provide a complete AoIP solution—most notably discovery, management and control—are not included in AES67 and were never intended to be.

Fig. 1: OSI layers

It is also important to remember that the internet protocol (IP) can be carried over a variety of transmission paths, including Ethernet, ISDN, ATM, FireWire and DSL, making it quite versatile.

The scope of the network itself should be kept in mind, whether it will be a small internal network or one that expands outside the boundaries of a given subnet, building, campus or continent. AoIP equipment working at OSI Layer 2 will suffice for small internal networks, but any AoIP network that will roam farther afield will need to operate at Layer 3.
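A quick way to reason about whether Layer 2 is enough is to check whether every device sits on the same IP subnet. The sketch below uses Python's standard `ipaddress` module; it assumes each subnet corresponds to a single Layer 2 broadcast domain (one VLAN), which is common but not guaranteed, and the device addresses are hypothetical.

```python
import ipaddress

def same_subnet(a: str, b: str) -> bool:
    """True if two interfaces, given as 'address/prefix', share a subnet."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network

# Hypothetical devices: console and stagebox in one room, intercom elsewhere.
console = "192.168.10.21/24"
stagebox = "192.168.10.40/24"
intercom = "192.168.20.5/24"

print(same_subnet(console, stagebox))  # same domain: Layer 2 AoIP can reach it
print(same_subnet(console, intercom))  # different subnet: traffic must be routed (Layer 3)
```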

Finally, and most important for us, the audio settings of the technologies, such as sample rate, bit depth, packet time and channel count, must match.
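AES67 sets a common baseline of 48 kHz sampling, 16- or 24-bit linear PCM (L16/L24) and a 1 ms packet time. The sketch below models those settings, computes the resulting packet size, and flags any setting that differs between a sender and a receiver; the eight-channel stream and the `StreamFormat` type are hypothetical examples, not part of any vendor's API.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class StreamFormat:
    sample_rate: int     # Hz
    bit_depth: int       # 16 (L16) or 24 (L24)
    channels: int
    packet_time_us: int  # microseconds of audio per packet

    def samples_per_packet(self) -> int:
        return self.sample_rate * self.packet_time_us // 1_000_000

    def payload_bytes(self) -> int:
        return self.samples_per_packet() * self.channels * self.bit_depth // 8

    def packets_per_second(self) -> int:
        return 1_000_000 // self.packet_time_us

def mismatches(tx: StreamFormat, rx: StreamFormat) -> List[str]:
    """Names of the settings that differ between sender and receiver."""
    fields = ("sample_rate", "bit_depth", "channels", "packet_time_us")
    return [f for f in fields if getattr(tx, f) != getattr(rx, f)]

sender = StreamFormat(48000, 24, 8, 1000)     # 48 kHz, L24, 8 ch, 1 ms
receiver = StreamFormat(96000, 24, 8, 1000)   # misconfigured sample rate
print(sender.samples_per_packet(), sender.payload_bytes(), sender.packets_per_second())
print(mismatches(sender, receiver))
```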

My mission didn’t get off to a particularly auspicious start, even before the NAB Show was underway. At the DTV Audio Group meeting the day prior to the show, Thomas Edwards, vice president of engineering at Fox, presented findings from his experiments with AoIP interoperability, concluding that interoperability wasn’t working, primarily due to transport differences.

Fortunately, he didn’t leave things there, as he went on to elaborate that this situation was just temporary, since video-over-IP standards were beginning to gel around SMPTE 2022, with AES67 as the likely audio transport.

Later in the day, Tobias Kronnenwett of Lawo talked about how they were having continual success with AES67-based AoIP technologies on a large scale with MotoGP races, showing that the technology itself works in real-world use.

As I started my own examination of each AoIP technology I made the disheartening discovery that simply claiming compatibility with AES67 does not necessarily indicate that the technology is interoperable.

Table 1: AoIP compatibility matrix

In the case of Dante, upgrades made to their devices mean that Dante and AES67 streams can coexist on the same network, which is about the same as having roommates—you share the same space, but that doesn’t mean you get along. In Dante’s case, they legitimately want to integrate AES67 streams, but may be hesitant to modify their robust and very successful core technology just to play nice. The players who do claim full compliance with AES67 include Ravenna and Telos, but each has its own methods for discovery, control and management to fill in the gaps.

TECHNOLOGY CHOICES
When we get down to the AoIP equipment itself we find most manufacturers making definitive technology choices, with Dante appearing to be most popular. Among large intercom systems, Delec and RTS/Omneo have chosen Dante, while Riedel has announced support for Ravenna with their Tango intercom to complement their longstanding support of AVB.

Taking a different tack, intercom vendor Clear-Com has announced their own technology called Instant Voice Core (I.V.Core) which claims AES67 compliance and which, perhaps providing the “Rosetta Stone” of the intercom world, purports to have also been tested for compatibility with AVB, Dante and Ravenna.

The world of audio consoles doesn’t provide any real surprises, with all the major players getting into the game, though in decidedly different ways. Digico consoles and racks can be outfitted with modules that provide Dante AoIP connectivity.

Calrec has very diplomatically decided the best way to accommodate customers is by letting them have the AoIP technology of their choosing, whether it be AES67/Ravenna, AVB, Dante or SMPTE 2022-6.

Ravenna’s parent company Lawo offers AoIP options across a large portion of their products including consoles, routers, racks and interfaces.

Dante solutions have been arriving from SSL for years and, at the 2016 NAB Show, they were showing their System T high-end, large-format broadcast console.

Wheatstone offers AoIP in five of their audio consoles and other products through Wheatnet, their AES67-compatible solution.

Finally, audio industry stalwart Yamaha offers Dante console cards and remote rack units.

Unsurprisingly, no single manufacturer makes all the equipment necessary to build an entire broadcast audio system. Very few have anything close to an entire AoIP ecosystem within their own product lines, something I find more than a little disappointing since it leaves us using an awful lot of glue to stick onto non-AoIP equipment.

The bright spot is that AoIP works and works well if all components use the same base technology. It can also be successful to a lesser degree with equipment from different manufacturers, but complete compatibility can only be guaranteed when all equipment is using the same AoIP technology.

An all-Dante system will obviously work, as will an all-Ravenna system, and any technology operating in AES67 mode will pass audio to other AES67 devices, but the method that manufacturers utilize for discovery, control and device management may mean that the components won’t know about, recognize or pass control to each other.

The challenge for my project at this stage is to choose a base technology that allows everything to work seamlessly while remaining open enough to accept other technologies as AoIP matures.

The Changing Deliverables Of Broadcast Audio

Jay Yeary — Tue, 19 Jan 2016 09:13:00 +0000

NEW YORK—This column is the wrap-up to a workshop I moderated at the AES Convention last October with panelists Jeff Brugger of Turner Studios, Michael Cardillo from Creative Waves, television mixer extraordinaire Ed Greene, and Sean Richardson, principal audio engineer at Starz. The premise of the workshop was that delivering mixes for television has become substantially more complicated with the large number of outlets content is delivered to and the deliverables required by each outlet.

Where we once simply sent a mix to the primary air chain, mixes can now end up in places we never dreamed of, after going through file manipulation processes over which we have little information or control. In addition, wide dynamic range surround mixes designed for broadcast are impractical in just about every situation outside the living room, and preparing content for the mobile world and low-bandwidth connections may require modifications to channel configuration, bit rate, codec and loudness settings.

Jay Yeary (L) moderated “The Changing Audio Deliverables for Broadcast and Media” workshop at the 2015 AES Convention, which included panelists (L to R) Jeff Brugger, Sean Richardson, Ed Greene, Michael Cardillo.

With loudness specifications for streaming just beginning to be sorted out, it’s still a little like the Wild West for content once it leaves the traditional broadcast path. With ATSC 3.0 bringing immersive 3D-like soundfields to the home, including more than 100 channels to manipulate, even broadcast deliverables are becoming more complicated. The following is a distillation of my key takeaways from the workshop.

Mixes Will End Up in Places We Never Dreamed—Longform content generally has the longest lifespan so it should end up in more places than promos and short-form content, but all content will be heard in more places than we think. At minimum, content will be heard on broadcast, cable, network websites, OTT, theatrical advertising and international outlets.

Due to limited time and budgets, (and our inability to foretell the future), it is impossible for a mixer to create a different mix for each outlet, so mixers should provide the best mix possible for the known outlet(s) and not sweat the others.

If the Mix Sounds Bad, Make Some Noise About It—Taking the time to listen to a mix on the primary outlet may unearth problems that a broadcaster or content outlet is unaware of and will help ensure that what’s on the air sounds as close as possible to the way the mixer intended. It can be a bit of a hassle, and no broadcaster wants to get problem calls, but if the content is not airing properly, it will also bother viewers, not just the mix engineer.

Delivery Specifications Are Critical—Broadcasters spend a lot of time developing content delivery specifications with the intention of airing content without modification of the original. Ensuring content meets the broadcaster’s specifications will save everyone a lot of grief.

Particular attention must be paid to channel configuration to guarantee the proper audio shows up on the correct channel in the air chain. Not all broadcasters use the same 5.1 channel layout and some still require delivery of content with Dolby E. Adherence to loudness specifications will keep content from being rejected for incorrect loudness levels or for exceeding True Peak values.

OTT Deliverables Are Different— Over-the-top and streaming deliverables are different because many devices can’t handle multichannel, don’t have the headroom for wide dynamic range content, or are used in noisy environments where the only possible way the content can translate is when the gain is raised and headroom is reduced.

In October 2015, the AES released technical document TD1004.1.15-10, “Recommendation for Loudness of Audio Streaming and Network File Playback,” which recommends that streaming loudness be no higher than –16 LUFS and no lower than –20 LUFS, with peaks not exceeding –1.0 dBTP. In essence, the AES appears to be advocating –18 LUFS as the overall target for streaming loudness, 6 dB higher than the target for broadcast loudness, yet the True Peak limit gives back only one of those decibels as headroom. Sessions requiring additional OTT deliverables may not have additional time booked, so the mix engineer may need to output multiple mixes at the same time.
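The headroom arithmetic can be sketched in a few lines. The streaming figures come from the AES recommendation described above; the broadcast figures (–24 LKFS with a –2 dBTP ceiling) are an assumption based on typical U.S. ATSC A/85 practice, so check the actual broadcaster specification. The helper names are hypothetical.

```python
from dataclasses import dataclass

# Streaming values from AES TD1004.1.15-10 as summarized in the column.
STREAM_MAX_LUFS = -16.0
STREAM_MIN_LUFS = -20.0
STREAM_TARGET_LUFS = -18.0            # midpoint of the recommended window
STREAM_PEAK_CEILING_DBTP = -1.0

# Assumption: typical U.S. broadcast delivery (ATSC A/85 style).
BROADCAST_TARGET_LKFS = -24.0
BROADCAST_PEAK_CEILING_DBTP = -2.0


@dataclass
class Measurement:
    integrated_lufs: float            # program (integrated) loudness
    true_peak_dbtp: float             # maximum true-peak level


def within_streaming_window(m: Measurement) -> bool:
    """True if loudness sits in the -20..-16 LUFS window and peaks obey the ceiling."""
    return (STREAM_MIN_LUFS <= m.integrated_lufs <= STREAM_MAX_LUFS
            and m.true_peak_dbtp <= STREAM_PEAK_CEILING_DBTP)


def to_streaming(m: Measurement, target: float = STREAM_TARGET_LUFS):
    """Return (gain_db, peak_after_gain_dbtp, limiting_needed_db).

    Simple gain followed by peak limiting; a real workflow would
    re-measure loudness after limiting.
    """
    gain_db = target - m.integrated_lufs
    peak_after_gain = m.true_peak_dbtp + gain_db
    limiting_needed = max(0.0, peak_after_gain - STREAM_PEAK_CEILING_DBTP)
    return gain_db, peak_after_gain, limiting_needed


# Headroom between program loudness and the peak ceiling, in dB.
broadcast_headroom = BROADCAST_PEAK_CEILING_DBTP - BROADCAST_TARGET_LKFS   # 22 dB
streaming_headroom = STREAM_PEAK_CEILING_DBTP - STREAM_TARGET_LUFS         # 17 dB
```

Under these assumptions, a broadcast master measuring –24 LKFS with –2 dBTP peaks needs 6 dB of gain to reach –18 LUFS, which pushes its peaks to +4 dBTP and calls for 5 dB of limiting. That 5 dB is the headroom lost when loudness rises by six decibels but the ceiling rises by only one.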

The Lt/Rt Downmix

Most People Only Hear Two Channels, So Always Check the Downmix—Despite 5.1 systems being in homes and immersive audio on the way, most people still listen to television through two speakers, either because they don’t have multichannel setups or because their location dictates a minimal listening environment.

This means that the two-channel version of a mix is significantly more likely to be heard than the 5.1 version. If 5.1 is the only specified deliverable, mixers should pay careful attention to how it will sound when downmixed.

It is always best to deliver a separate two-channel mix for airing everywhere that isn’t 5.1, but some broadcast air chains can’t deliver separate mixes to the home, so it has become critical to listen to the downmix of every multichannel mix and adjust the original if the downmix suffers.
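One thing a downmix check can catch is overload. The sketch below folds 5.1 to two channels using assumed –3 dB coefficients for Center and the surrounds, in the style of the ITU-R BS.775 downmix; consumer decoders may apply different values set by metadata, and the LFE is left out because two-channel downmixes commonly drop it. Function names are hypothetical.

```python
# Assumed downmix coefficients: -3 dB for Center and for each surround.
CENTER_MIX = 10 ** (-3 / 20)      # about 0.708
SURROUND_MIX = 10 ** (-3 / 20)


def loro_downmix(L, R, C, Ls, Rs):
    """Fold 5.1 sample lists (LFE omitted) to a two-channel Lo/Ro pair."""
    lo = [l + CENTER_MIX * c + SURROUND_MIX * ls for l, c, ls in zip(L, C, Ls)]
    ro = [r + CENTER_MIX * c + SURROUND_MIX * rs for r, c, rs in zip(R, C, Rs)]
    return lo, ro


def downmix_peak(lo, ro):
    """Largest absolute sample value in the downmix (1.0 = full scale)."""
    return max(abs(x) for x in lo + ro)
```

With every main channel peaking at 0.5 (about –6 dBFS), each downmixed channel reaches roughly 1.21, well over full scale, even though no individual 5.1 channel came close to clipping.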

Lt/Rt and Lo/Ro, There Is a Difference—Lt/Rt stands for left total/right total and is a method of matrixing 5.1 mixes into a two-channel version. Lo/Ro means left only/right only; in other words, a straight two-channel (stereo or mono) mix.

In order to provide a full-time surround image to viewers, some broadcasters have specified Lt/Rt as their two-channel deliverable in the belief that it will upmix better to 5.1 than pure stereo mixes, which is generally true as Lt/Rt starts with 5.1 and tries to provide an approximate reproduction of that mix when decoded.

Accomplishing this first means throwing away the audio in the LFE channel, so any important sound in that channel should be mixed into the Left and Right channels. Left and Right are each separately summed with Center, which has been lowered by 3 dB. The Left and Right surround channels are summed together, then added to the Left/Center sum 180 degrees out of phase and to the Right/Center sum in phase. Needless to say, all of this channel combining, gain adjusting and phase shifting is not what mixers generally want for their mixes.
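The matrix described above can be sketched as plain arithmetic on sample lists. This is a simplified model under stated assumptions: the surround sum is also lowered by 3 dB, and the 90-degree phase shifters, surround band-limiting and other processing used in real encoders are omitted. The crude passive decoder recovers Center from the sum and surround from the difference; its center output is also what a mono listener hears. Function names are hypothetical.

```python
CENTER_GAIN = 10 ** (-3 / 20)     # Center lowered by 3 dB
SURROUND_GAIN = 10 ** (-3 / 20)   # assumption: surround sum also lowered by 3 dB


def ltrt_encode(L, R, C, LFE, Ls, Rs):
    """Simplified Lt/Rt encode of 5.1 sample lists. LFE is accepted but discarded."""
    lt, rt = [], []
    for l, r, c, ls, rs in zip(L, R, C, Ls, Rs):
        s = SURROUND_GAIN * (ls + rs)        # mono surround sum
        lt.append(l + CENTER_GAIN * c - s)   # surround 180 degrees out of phase on left
        rt.append(r + CENTER_GAIN * c + s)   # surround in phase on right
    return lt, rt


def passive_decode(lt, rt):
    """Crude passive decode: center (and mono) from the sum, surround from the difference."""
    center = [0.5 * (a + b) for a, b in zip(lt, rt)]
    surround = [0.5 * (b - a) for a, b in zip(lt, rt)]
    return center, surround
```

Feeding this sketch a surround-only signal shows why the downmix deserves a listen: the surround content appears in Lt and Rt with opposite polarity, so it cancels completely in the mono sum, and any LFE content is simply gone.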

Unfortunately, this process is typically done automatically in the broadcast air chain rather than being part of the mixing process, so many mixers would prefer it go away and have outlets broadcast Lo/Ro instead, forgoing upmixing entirely.

Jay Yeary would like to thank Jeff Brugger, Michael Cardillo, Ed Greene, and Sean Richardson for their willingness to share their experience and insight at the workshop that was the basis for this column. Jay can be contacted through TV Technology magazine or at transientaudiolabs.com.

Ambisonics B-Format for Immersive Audio

Jay Yeary — Mon, 28 Dec 2015 00:10:00 +0000

SAN ANTONIO, TEXAS—With immersive audio knocking on the door of broadcast television, it seems like a good time to investigate some ways to create content for the format. According to some in the industry, a look at the future of immersive audio requires us to charge up the flux capacitor, hop in the DeLorean, and travel back to 1975.

At the 2015 NAB Show, Pro Sound Effects demonstrated an Ambisonics sound effects library of London ambiences. Initially I was skeptical about the utility of these sounds, but, after previewing them and realizing the amount of control available, I walked away intrigued by the prospect of creating and using sound effects that were immersive all the way from capture to consumer.

Testing was done with the SurroundZone 2 software provided with the sound effects library to decode the B-format audio and recreate the captured sound field.

Considered by some to be a dead format, Ambisonics has resurfaced as a possible immersive format, not least because it was designed from the outset to capture and reproduce 3D audio. While it’s important to look at how Ambisonics works as an immersive format, we also need to see how it holds up when listening is non-immersive. Though the London sounds previewed at the NAB Show are not yet available, a library of New York City Ambisonic sounds has been released and was used as test material for this column, along with some B-format recordings I made several years ago.

While we will look specifically at Ambisonics technology in this column, there are other methods of capturing multichannel sound.

COINCIDENT PAIRS
All microphones designed for capturing sound fields and ambiences are composed of some array of capsules, either spaced at a calculated distance or based on some form of coincident pair, where two microphones are placed nearly touching, facing the sound source, with their capsules typically angled 90 degrees to each other. X/Y, Mid-Side (M-S), and Blumlein Stereo are all coincident-pair techniques, and it was, in fact, Alan Blumlein, the famed English electronics engineer, who developed this technique in the 1930s.

The benefit of coincident pairs is that timing and phase errors are essentially eliminated: because the capsules are nearly touching, the source sound arrives at both at virtually the same time. A source directly in front produces equal amplitude in both capsules, while a source to either side produces differing amplitudes. The elimination of timing and phase errors also means that sound captured with coincident pairs does an excellent job of folding down to a single channel.
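To see why removing timing differences matters for a mono fold-down, the sketch below compares a spaced pair with a coincident one. It assumes a distant source, equal capsule levels and a speed of sound of 343 m/s; summing a signal with a delayed copy of itself produces a comb filter whose first null sits at 1/(2 × delay). The function names and the 0.3 m spacing in the usage note are hypothetical examples.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumption)


def arrival_difference_s(spacing_m: float, angle_deg: float) -> float:
    """Arrival-time difference between two capsules `spacing_m` apart for a
    distant source `angle_deg` off the pair's center line (0 = straight ahead)."""
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


def mono_sum_gain(freq_hz: float, delay_s: float) -> float:
    """Magnitude of (signal + delayed copy) / 2 at one frequency; 1.0 = no loss."""
    return abs(math.cos(math.pi * freq_hz * delay_s))


def first_null_hz(delay_s: float):
    """Lowest frequency cancelled in the mono sum, or None when there is no delay."""
    return None if delay_s == 0 else 1.0 / (2.0 * abs(delay_s))
```

For a hypothetical 0.3 m spaced pair with the source fully to one side, the first cancellation falls near 572 Hz, with further notches above it. With zero spacing there is no delay, `mono_sum_gain` stays at 1.0 at every frequency, and there is no comb filtering to fold down.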

SoundField microphones, designed specifically for Ambisonics, are based on the coincident pair technique, with four capsules close together as opposed to two, in order to add height to front/back and side-to-side imaging. Microphones from other manufacturers are available with different coincident pair implementations, essentially some expansion of M-S, including the Schoeps Double M-S and the Sanken WMS-5.
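Since several of these designs build on M-S, a basic M-S decode is a useful reference point. This minimal sketch assumes the common sum/difference convention (Left = Mid + Side, Right = Mid – Side, with the side signal’s positive lobe facing left) and unity scaling; other implementations scale by one-half or apply other gains. The `width` parameter is a hypothetical control that scales the side signal.

```python
def ms_to_lr(mid, side, width=1.0):
    """Decode Mid-Side sample lists to Left/Right; `width` scales the side signal."""
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right
```

Averaging the decoded Left and Right returns the mid signal exactly, because the side component enters the two channels with opposite polarity and cancels. That is the same property that makes coincident recordings fold down so cleanly, and setting `width` to zero collapses the image to mono.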

Researchers are experimenting with other microphones and placement options for capturing 3D sound fields, but all recordings tested here were made with the SoundField ST450 microphone system.

SoundField microphones capture raw audio (A-format) from the four capsules in their array (Lb, Lf, Rf, Rb), and these signals then go into a processor that converts them to B-format for recording. B-format matrixes the raw audio into a more or less 360-degree sound field by recombining channel content to create one omnidirectional channel and three bidirectional channels.

The omni channel, W, combines audio from the four capsules (Lb + Lf + Rf + Rb) and acts as a sound pressure channel. Of the bidirectional channels, X supplies front/rear imaging (–Lb + Lf + Rf – Rb); Y provides side-to-side imaging (Lb + Lf – Rf – Rb); and Z gives us up/down imaging for the all-important height dimension (–Lb + Lf – Rf + Rb).
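The sums and differences above form a 4 × 4 matrix, sketched here for one set of capsule samples. The Z equation only makes sense as a height signal because, in the standard tetrahedral layout, the left-front and right-back capsules point upward while the right-front and left-back capsules point downward. The matrix is orthogonal up to a factor of four, so the A-format signals can be recovered exactly. Real converters also apply correction filtering for capsule spacing, which this sketch omits, and the function names are hypothetical.

```python
def a_to_b(lf, rf, lb, rb):
    """Convert one set of A-format capsule samples to unscaled B-format (W, X, Y, Z).

    Assumes Lf and Rb point up and Rf and Lb point down (tetrahedral layout).
    """
    w = lf + rf + lb + rb      # omni / pressure
    x = lf + rf - lb - rb      # front minus back
    y = lf - rf + lb - rb      # left minus right
    z = lf - rf - lb + rb      # up minus down
    return w, x, y, z


def b_to_a(w, x, y, z):
    """Inverse of a_to_b: the same sums and differences, divided by four."""
    lf = (w + x + y + z) / 4
    rf = (w + x - y - z) / 4
    lb = (w - x + y - z) / 4
    rb = (w - x - y + z) / 4
    return lf, rf, lb, rb
```

As a quick check, `a_to_b(1, 2, 3, 4)` gives `(10, -4, -2, 0)`, and `b_to_a(10, -4, -2, 0)` returns the original capsule values `(1.0, 2.0, 3.0, 4.0)`.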

TESTING SOFTWARE
Hardware processors or software can be used to decode the B-format audio and recreate the captured sound field. My testing was all done with the SurroundZone 2 software provided with the sound effects library. Since B-format recordings consist of four channels of audio (BWF), it seems natural to place them on a four-channel track, but they actually need to be placed on a 5.1 track to get a 5.1 output (the software currently supports up to 7.1).

Once the decoding software is inserted onto the multichannel track, you have the ability to steer the decoded channels around in the mix as well as change microphone orientation, tilt, width and other parameters. The amount of control you have as the mixer is substantial, giving you the ability to create some very dramatic shifts in the image. Most significant of all for mixers may be that, despite working with a captured 3D sound field, it collapses perfectly to stereo and mono.
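Controls like orientation, tilt and width can be understood through the generic first-order "virtual microphone," sketched below. This is not a description of how SurroundZone 2 works internally; it is a textbook-style model that assumes the B-format signals have been normalized so W and X/Y/Z have equal gain on axis, with azimuth measured counterclockwise (positive toward the left, matching Y as left minus right). Function names are hypothetical.

```python
import math


def virtual_mic(w, x, y, z, azimuth_deg=0.0, elevation_deg=0.0, pattern=0.5):
    """First-order virtual microphone from one B-format sample set.

    pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight.
    Assumes W and X/Y/Z are normalized to equal on-axis gain.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    directional = (x * math.cos(az) * math.cos(el)    # front/back component
                   + y * math.sin(az) * math.cos(el)  # left/right component
                   + z * math.sin(el))                # up/down component
    return pattern * w + (1.0 - pattern) * directional


def stereo_pair(w, x, y, z, half_angle_deg=45.0, pattern=0.5):
    """Two virtual mics splayed left and right of the front axis."""
    left = virtual_mic(w, x, y, z, +half_angle_deg, 0.0, pattern)
    right = virtual_mic(w, x, y, z, -half_angle_deg, 0.0, pattern)
    return left, right
```

Rotating `azimuth_deg` or tilting `elevation_deg` re-aims the pickup after the fact, and widening `half_angle_deg` spreads the stereo image. Setting `pattern=1.0` returns W alone, a single pressure signal with no inter-channel timing differences, which helps explain why the fold-down to mono is so clean.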

There are certainly some issues with the SoundField system if the goal is to use it for truly immersive projects. First, the decoding hardware and software support only up to 7.1, and there is currently no provision for audio object placement. For post work, creating immersive audio mixes already involves plenty of overhead, and few workstations are equipped to handle the workflow out of the box.

Even though three dimensions of audio are captured in the B-format recording, I was unable to locate any currently available tools that give us the ability to recreate those dimensions in an immersive production environment.

If there was ever a technology begging to be used for immersive audio, this is it. In a 1990 AES paper on Ambisonics, author Roger Furness stated that, “The original sound field, whether it was a live sound or one created at the mixing console, has an infinite number of possible sound directions.” That sounds more like the immersive technology coming with ATSC 3.0 than one already 40 years old.

More testing and development of Ambisonics are certainly in order, especially when incorporating the technology into immersive workflows, but downmix compatibility is already there. Expanded decoding tools are needed, and although 3D audio can be captured with other microphone configurations, many of those setups are not portable. That limits them to controlled indoor settings and makes them unsuitable for live events, which are increasingly important as broadcasters seek to differentiate themselves from OTT services.

Jay Yeary is a broadcast engineer and consultant specializing in audio. He is an AES Fellow and a member of SBE and SMPTE. He can be contacted through TV Technology magazine or at transientaudiolabs.com.