Captioning for Sports Broadcasting and Venues

(Image credit: ENCO Systems)

The spoken word in sports is essential to how plays unfold, how athletes observe and interact with one another, and how fans respond to the action. Accessing all this dialog has never been easier or more entertaining, yet there’s more value to be gained.

Technologies are now maturing to inject deeper, live intelligence into these conversations. The allure of having alerts and live searchable access to dramatic vocal interactions at precise moments through instant, automated insights is strong for the sports producer and director. Gaining faster access to game-changing moments on and off the field can only enrich the content available to live production teams and viewers, as well as in post-production. 

Embracing the ‘Third Rail’ of Television
These additional contact points can be enabled by embracing what we call the “third rail” of television content. Beyond video and audio, the raw text generated by automatically captioning every microphone feed in a venue can produce a treasure trove of speech-generated data in real time.

That data can then be monitored for any special keywords spoken by people near (or wearing) specific microphones to generate logic flags, metadata, and alerts, and to build a stored dataset of “bread crumbs” that can be hopped through live or at a later time.
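As a rough illustration of that monitoring step, here is a minimal Python sketch. It assumes each caption line arrives with a microphone label and a start time in seconds. The names `Caption`, `Flag`, and `flag_keywords` are hypothetical and do not come from any particular captioning product. It matches single words only, so multi-word phrases would need a different approach.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    mic_id: str      # hypothetical microphone label, e.g. "ref"
    start_s: float   # seconds from the start of the recording
    text: str        # caption text produced by ASR for this moment


@dataclass
class Flag:
    mic_id: str
    start_s: float
    keyword: str


def flag_keywords(captions, watchlist):
    """Return one Flag for each watched keyword found in each caption."""
    flags = []
    for cap in captions:
        # Normalize: split on whitespace, strip punctuation, lowercase.
        words = {w.strip(".,!?;:\"'").lower() for w in cap.text.split()}
        for kw in watchlist:
            if kw.lower() in words:
                flags.append(Flag(cap.mic_id, cap.start_s, kw))
    return flags
```

The returned list doubles as the stored trail of “bread crumbs”: each entry records which microphone heard which keyword, and when.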

Taking the steps to create text transcripts from dozens of microphone feeds requires a philosophical change. Using traditional stenographic methods to generate these captions is simply cost-prohibitive. 

However, thanks to considerable advances in Artificial Intelligence-based Automated Speech Recognition (ASR), it’s now possible to embrace a fast, reliable, and highly accurate technology platform to generate this raw text across many audio channels at the same time. By generating text records of live content, each ASR stream has its own path and can be indexed to its own microphone source (and any associated cameras).
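One way to picture that per-microphone indexing is sketched below in Python. The routing table, the microphone and camera labels, and the `TranscriptIndex` class are all assumptions made for illustration. A real venue would define its own mapping.

```python
from collections import defaultdict

# Hypothetical routing table: microphone label -> cameras that cover it.
MIC_TO_CAMERAS = {
    "ref_lav": ["iso_3"],
    "bench_shotgun": ["iso_5", "iso_6"],
}


class TranscriptIndex:
    """Keeps a separate, time-stamped text stream for every microphone."""

    def __init__(self, mic_to_cameras):
        self.mic_to_cameras = mic_to_cameras
        # mic_id -> list of (start_s, text) tuples in arrival order
        self.streams = defaultdict(list)

    def add(self, mic_id, start_s, text):
        """Append one caption line to the stream for its microphone."""
        self.streams[mic_id].append((start_s, text))

    def cameras_for(self, mic_id):
        """Return the cameras associated with a microphone, if any."""
        return self.mic_to_cameras.get(mic_id, [])
```

Keeping the streams separate means a keyword hit on one microphone can point straight to the camera angles most likely to show the speaker.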

Captioning for Better Live Content
Imagine running ASR on a shotgun microphone or lavalier feed from a specific player known to be particularly chatty at times, and who might say certain words that a producer has determined would be key to that game’s coverage (perhaps related to game-play, other players, weather, or other factors). 

Now imagine you have an ISO camera trained on them, just waiting for a visual moment. Once you start captioning that feed, a back-end automated process can generate an alert telling a producer to look back in the video where keywords were spoken. The result is instant access to that desired live nugget you are seeking. 

Auto racing introduces another interesting use case by leveraging ASR to log and generate transcripts from the race car. Whenever certain drivers mention prescribed keywords, that audio (and video) can be flagged and sent to a producer to consider using for replay in the live broadcast or via social media.

In-Game Replay Applications
In-game replay applications offer another compelling approach. Imagine focusing the microphone and camera on a referee to capture key moments when they mention certain words of importance that the user pre-saved before the show. While you can have an operator log their dialog, not everything can be logged accurately. 


ENCO's Encaption soccer flowchart (Image credit: ENCO Systems)

If you have live ASR running on that referee’s audio feed, with the right software you can perform a text word search on their replay video to navigate to the moments where they stated the words you wanted, enabling faster access during the production. Keyword searches can be done in post-production too, as long as the user saves ASR text transcripts for each microphone feed (which would ideally also have a camera view of the subject to help pinpoint and visualize something of interest).
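A word search like this can be sketched in a few lines of Python. The sketch assumes a saved transcript is a list of (start time in seconds, text) pairs for one microphone feed. The function name `find_cue_points` and the five-second pre-roll are hypothetical choices, not features of any specific replay system.

```python
def find_cue_points(transcript, keyword, pre_roll_s=5.0):
    """Return replay cue points (in seconds) for each caption containing keyword.

    transcript: list of (start_s, text) tuples for one microphone feed.
    pre_roll_s: how far before the spoken word the replay should start.
    """
    needle = keyword.lower()
    cues = []
    for start_s, text in transcript:
        # Simple case-insensitive substring match; this can over-match
        # (searching "ball" would also hit "handball").
        if needle in text.lower():
            cues.append(max(0.0, start_s - pre_roll_s))
    return cues
```

Backing up by a short pre-roll gives the operator the lead-up to the call rather than starting on the word itself.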

This treasure trove of speech data can open a whole new world of creative possibilities. No more is a production team limited to what their crew can hear or see and manually decide upon or annotate. With ASR it’s possible to have automated processes that help the user keep an “eye” out for those special moments when certain players or others say the magic words. 

ASR effectively alerts the user to that instant via logic flags, metadata, or other notifications. From there, simply conduct a search backward to the time point indicated by the transcripts to access the audio.

Broadcasters can apply this same technology to tag whenever a play-by-play or PA announcer mentions specific words or names. Did a specific player just do (or say) something interesting? When did the announcer mention that player’s name last? Now the user can search to that point and determine if there’s compelling footage.
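The “when did the announcer last mention that player” question can be answered with a backward scan of the announcer’s transcript. The Python below is a minimal sketch under the assumption that the transcript is a time-ordered list of (start time in seconds, text) pairs. The helper names and the player names in the usage example are invented for illustration.

```python
def last_mention(transcript, name, now_s):
    """Return the start time of the most recent mention of name at or
    before now_s, or None if the name has not been spoken yet.

    transcript: time-ordered list of (start_s, text) tuples.
    """
    needle = name.lower()
    for start_s, text in reversed(transcript):
        if start_s <= now_s and needle in text.lower():
            return start_s
    return None


def as_clock(seconds):
    """Format a time in seconds as HH:MM:SS for display to an operator."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
```

For example, with an announcer transcript of `[(30.0, "Rivera with the ball"), (3725.0, "Rivera shoots")]`, calling `last_mention(transcript, "Rivera", 4000.0)` returns `3725.0`, which `as_clock` displays as `01:02:05`.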

While crucial to helping the deaf and hard-of-hearing communities, captions aren’t just about FCC-mandated accessibility anymore. AI-based speech-to-text technology has evolved dramatically over the last few years. Harnessing these improvements in the world of sports opens up endless possibilities to enrich your content and ignite the fan experience to keep them coming back for more. 

Bill Bennett
Media Solutions & Accounts Manager, ENCO Systems

A Media & Entertainment industry veteran, Bill serves ENCO's customers in a myriad of ways, from application engineering to business development, to driving new product innovation and trial POCs, to managing key ENCO accounts and partner development. He also serves as a thought-leadership writer, presenter, and product evangelist, produces ENCO's webinars, and launched their video production studios. He joined ENCO after a sales engineering role with German broadcast manufacturer LAWO AG. 

Previously, he oversaw Olympic venue technical planning, build-out, and operations across five Olympic Games, served as an executive producer and new media business development executive at QVC (inventing groundbreaking ITV & OTT streaming e-commerce products), was a long-time freelance broadcast engineer, consultant and project manager for customers spanning the NBA, NHL, NFL, NCAA and countless US broadcasts, and owned a laser display production company producing & performing large-scale laser displays at major events & venues.