A Sound Approach to Delivering Sports Audio

We all enjoy audio books, but have you ever tried watching your favorite television program muted?

Audio is an essential ingredient of any audiovisual media. It is the ingredient that brings content to life; without it, video is little more than a sequence of silent moving images. Yet audio often doesn’t get the attention it deserves in the media workflow because of the misconception that a superior consumer experience is driven by video alone. Visuals are captivating, but if audio quality is poor, even the best video will lead to a negative experience. Poor audio shifts consumer attention to noise and other distractions, degrading the overall experience.


In sports specifically, content is generously interspersed with commentary, which gives viewers otherwise inaccessible information in a timely manner and enhances the viewing experience. Viewers generally want to know important details not covered by the live camera angle, such as how an injured player is recovering or whether the crowd is friendly. Beyond sharing interesting facts and trivia, commentators can see and relay many things that make the experience more memorable.

Further, the ability to hear clear game sounds makes fans feel as if they are watching the game right in the stadium. Just imagine hearing skates grinding into the ice in hockey or figure skating, the crack of a bat hitting a home run, or a ping-pong ball hitting a paddle. Audio quality in these situations is paramount. Sports is also big business, worth tens of billions of dollars worldwide. Digitization is further driving revenue from global broadcast rights, but today’s digital consumers are spoiled for choice and can click away at a moment’s notice. Any lapse in audio quality can lead to a loss of viewership and revenue, so it is more important than ever to understand how to uphold audio quality and create an immersive sports experience.


Unspoken guidelines exist for good audio quality. Audio should be in sync with video and captions. It should be free of artifacts such as noise, dropouts, jitter and echo. It should follow the loudness guideline that applies in the region. Some broadcasters also require profanity to be removed from the audio. For multilingual global delivery, the correct audio language order must be ensured. These guidelines are fairly easy to maintain for offline content, but sports, with all its frenzy and excitement, is broadcast live. The content is produced and broadcast simultaneously, leaving little room for quality control and correction. As a result, many different issues can affect sports audio.


  • Sports broadcasts, by their very nature, feature high-intensity events, such as a goal in a soccer match, that momentarily change the loudness level. As the next generation of sports broadcasting is defined, it is important to ensure that audio conforms to loudness regulations:
    • Regional guidelines: Broadcasters should ensure content loudness complies with regional regulatory standards such as the CALM Act in the U.S., EBU R 128 in Europe and ARIB TR-B32 in Japan.
    • Ad switching: When broadcasters insert advertisements for monetization, the context of the content changes and the loudness level likely changes with it. The broadcaster should make the switch seamless to keep the listening experience smooth.
  • Any noise in audio can distract viewers and degrade the overall experience. Such noise can come from multiple sources:
    • To enhance the viewer experience, broadcasters want to simulate a sports-arena-like environment whose predominant sound components are crowd noise, reverberation and stadium music. To capture all of the important components, broadcasters typically place multiple microphones strategically around the stadium and mix them with the running commentary. During mixing, it is very important that the channel layout is followed properly. For example, if the center channel carries dialog, dialog should still be in the center channel after mixing. Any glitch here can severely impact audio quality.
    • The biggest challenge during commentary is keeping audio clear. This is the stage where noises such as background sound and echo can be introduced. The scene of action is generally outdoors, where weather, wind, cheers from the audience and many other subtle factors can let noise creep into the content. This noise needs to be suppressed before content is sent upstream. Transmission over a slow network also causes issues such as audio dropouts and jitter. OTT delivery mitigates most of these issues, because the player automatically switches to a lower-bitrate stream when the network is slow.

Broadcasters should ensure that sports workflows are efficient enough to avoid the kinds of noise described above.
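
The loudness checks described in the first bullet above can be sketched in a few lines of Python. This is a minimal illustration, not a production meter: it assumes integrated loudness has already been measured upstream, and the `TARGETS` table, its tolerances, the `Segment` type and the 2 LU jump threshold are all assumptions made for the example. The targets are commonly cited values and should be verified against the current version of each specification.

```python
from dataclasses import dataclass

# Assumed (target, tolerance) pairs in LKFS/LUFS; verify against the
# current specifications before relying on them.
TARGETS = {
    "US": (-24.0, 2.0),  # ATSC A/85, the practice referenced by the CALM Act
    "EU": (-23.0, 1.0),  # EBU R 128, using the looser live-programme tolerance
    "JP": (-24.0, 1.0),  # ARIB TR-B32
}


@dataclass
class Segment:
    name: str
    loudness: float  # integrated loudness measured upstream, in LKFS/LUFS


def is_compliant(segment, region):
    """True if the segment's loudness is within tolerance of the regional target."""
    target, tolerance = TARGETS[region]
    return abs(segment.loudness - target) <= tolerance


def loudness_jumps(segments, max_jump=2.0):
    """List (previous, next, delta) for adjacent segments that differ by more than max_jump."""
    jumps = []
    for prev, nxt in zip(segments, segments[1:]):
        delta = nxt.loudness - prev.loudness
        if abs(delta) > max_jump:
            jumps.append((prev.name, nxt.name, round(delta, 1)))
    return jumps


# Hypothetical playout: a loud ad break between two halves of a match.
playout = [
    Segment("first_half", -23.4),
    Segment("ad_break", -19.0),
    Segment("second_half", -23.1),
]
```

Calling `loudness_jumps(playout)` flags both transitions around the ad break, which is the kind of switch a broadcaster would want to smooth out before it reaches viewers.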

  • During live broadcasts, some audio segments may contain strong language, which can offend viewers and is especially unsuitable for children. Depending on the regulations, these portions should be bleeped out before broadcast. Broadcasters need to integrate solutions based on automatic speech recognition to handle this in real time and keep broadcast delay to a minimum.
  • As content goes global, overcoming the language barrier has become very important. To meet this requirement, OTT companies now provide content in multiple languages. Each language version is stored as a separate audio track. Broadcasters should ensure that the correct language order is maintained during encoding and transmission.
  • The latest smart TVs and digital media players support next-gen audio technologies such as Dolby Atmos, raising consumer expectations. Streaming platform companies are also beginning to appreciate the need to invest in resources that improve audio quality and provide near-real-world experiences to consumers by leveraging the latest technologies.
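
As a rough sketch of the bleeping step mentioned above, the following Python replaces flagged words with a tone. It assumes a speech recognizer has already produced word-level timestamps as `(text, start_sec, end_sec)` tuples; that format, the `BLOCKLIST` contents and the tone parameters are hypothetical choices for illustration, and the real-time buffering a live workflow needs is not shown.

```python
import math

# Placeholder words standing in for a broadcaster's real blocklist.
BLOCKLIST = {"darn", "heck"}


def bleep(samples, sample_rate, words, blocklist=BLOCKLIST, tone_hz=1000.0, level=0.3):
    """Return a copy of `samples` with blocklisted words replaced by a sine tone.

    `words` is a list of (text, start_sec, end_sec) tuples, assumed to come
    from an automatic speech recognition step.
    """
    out = list(samples)
    for text, start, end in words:
        if text.lower().strip(".,!?") not in blocklist:
            continue
        first = max(0, int(start * sample_rate))
        last = min(len(out), int(end * sample_rate))
        for n in range(first, last):
            out[n] = level * math.sin(2 * math.pi * tone_hz * n / sample_rate)
    return out
```

Only the samples inside a flagged word's time span are overwritten, so the surrounding commentary and crowd sound pass through unchanged.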


Synchronization is one of the most important aspects of the viewer experience. Imagine watching a basketball game in which the sound of the ball hitting the court arrives before the ball actually makes contact. This would be extremely annoying for the viewer and would detract from the overall entertainment experience. Synchronization errors can affect the video as well as each of the multilingual audio tracks. Sync loss can be caused by the mixing and editing process, by mic position or by signal processing delays between the mic and camera. To provide a seamless viewing experience, broadcasters should prevent these types of synchronization issues.
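
One common way to measure a delay between two audio signals that share content, for example a camera-mounted mic and the final mix, is cross-correlation. The Python below is a minimal brute-force sketch under that assumption; the function names are illustrative, and full audio-video sync measurement also needs video analysis, which is not shown here.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the shift in samples that best aligns `delayed` with `reference`.

    A positive result means `delayed` arrives later than `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += r * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_ms(lag_samples, sample_rate):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate


# Synthetic example: the same short click, shifted by 7 samples.
reference = [0.0] * 100
reference[20:23] = [1.0, -0.5, 0.25]
delayed = [0.0] * 100
delayed[27:30] = [1.0, -0.5, 0.25]
lag = estimate_lag(reference, delayed, max_lag=20)
```

The brute-force loop is quadratic in signal length, so a real system would typically use an FFT-based correlation over short windows instead.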


Captions are required by all major broadcasters and are regulated by law in many parts of the world. They need to be in sync with audio on all fronts. In the U.S., the Federal Communications Commission sets four quality requirements for captions to assure a good consumer experience:

  • Captions should be in perfect sync with audio (i.e., they should be displayed when the corresponding audio event occurs). Some lag is allowed in real-time broadcasts, but captions that arrive too late lose their value and often confuse viewers.
  • Captions should be free of spelling mistakes and other inaccuracies such as grammatical and syntactical mistakes. This requires captioners to be well versed in sports jargon and player names and to have exceptionally good language and keyboard skills.
  • Captions should be complete (i.e., all of the audio needs to be captioned). This is very difficult in fast-paced sports, where events sometimes unfold faster than captioners can keep up.
  • When deciding screen placement for captions, care should be taken not to hide important statistics displayed on screen.
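
Two of the points above, synchronization and placement, lend themselves to simple automated checks. The Python sketch below assumes each caption cue has already been matched to the time its words were spoken, and that the on-screen graphics region is known; the 3-second lag limit, the `Box` type and the pixel coordinates are hypothetical values chosen for the example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Screen rectangle in pixels, with the origin at the top-left corner."""
    left: int
    top: int
    right: int
    bottom: int


def overlaps(a, b):
    """True if rectangles a and b share any screen area."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def late_captions(cues, max_lag=3.0):
    """Return (text, lag_sec) for cues shown more than max_lag seconds after being spoken.

    `cues` is a list of (text, spoken_at_sec, shown_at_sec) tuples.
    """
    return [
        (text, round(shown - spoken, 2))
        for text, spoken, shown in cues
        if shown - spoken > max_lag
    ]
```

For instance, a caption box along the bottom of a 1920x1080 frame, `Box(0, 900, 1920, 1080)`, overlaps a score graphic at `Box(1500, 950, 1900, 1050)`, so one of the two would need to move.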

Further, profanity might require some censorship. Some checks are required during encoding as well, such as verifying that each audio track carries the correct language.
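
The language-to-track check can be illustrated with a short Python sketch. It assumes the expected order comes from a delivery specification and the detected order from some language-identification step run on each track; both lists and the three-letter codes are hypothetical inputs, and tracks beyond the expected count are not examined.

```python
def language_order_errors(expected, detected):
    """Return (track_index, expected_code, detected_code) for every mismatched track.

    A track missing from `detected` is reported with a detected code of None.
    """
    errors = []
    for index, want in enumerate(expected):
        got = detected[index] if index < len(detected) else None
        if got != want:
            errors.append((index, want, got))
    return errors


# Hypothetical delivery: Spanish and French tracks swapped during encoding.
expected_order = ["eng", "spa", "fra"]
detected_order = ["eng", "fra", "spa"]
problems = language_order_errors(expected_order, detected_order)
```

An empty result means every track carries the language the specification expects for that position.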

In a real-time sports broadcast there is a very small window between words being spoken and being relayed as captions. Audio needs to be transcribed, time-stamped and positioned on screen within this window. Doing this purely by hand may not deliver high quality. It calls for the latest speech recognition and burnt-in text recognition technology to aid captioners.


With huge global sporting events scheduled for 2020, in Tokyo and, in the shape of the soccer championships, across Europe, viewers will have more choice than ever in how and where they watch sports. The cost of broadcast rights is soaring and competition among OTT services is intense, so the pressure to retain viewers is immense.

Much like the competition on the football pitch, the stakes are high for content producers and broadcasters to deliver exceptional quality, both visually and in terms of audio. A quality control system designed to address the challenges of real-time sports broadcasts can be a game-changer: one that enables service providers to perform comprehensive video and audio checks, including loudness, closed caption and subtitle checks, audio language detection and AV sync, and that features a flexible and scalable architecture to improve efficiency and deliver exceptional audio-video quality on every device.

Manik Gupta is associate director of engineering at Interra Systems.
