The Importance of Automating Live Captioning in Broadcasting


Above and beyond complying with FCC regulations, providing accurate and effective captioning for live content is becoming essential for broadcasters who want to stay relevant, competitive and ultimately successful in a changing media landscape. 

Providing closed-captioned content has been an important broadcaster requirement for years. And while hearing impairment remains the leading reason for captions, with around 48 million hearing-impaired viewers in the United States, new drivers are emerging in the market.

An increasing body of evidence suggests that audiences with unimpaired hearing also enjoy watching content with captions. A quick online search turns up dozens of articles noting, often with surprise, that many people who don't strictly need captions now prefer them on all their content. The need is reinforced as more people watch content in loud environments or with the sound off, particularly when trying to keep up with a rapidly evolving news story.

These trends are given further weight by research from social media platforms showing that captioning increases both content reach and engagement. With reach and engagement increasingly crucial to broadcasters, captioning content has become not just a mandate but a “no-brainer,” a value-add and a differentiator, especially in competitive local environments.

What makes things more complicated is the increasing demand for captions on live content. Traditionally, live captions are created by human operators to meet the quality standards that allow major national broadcast networks, and their affiliates in the top 25 market areas, to comply with FCC guidelines. Outside those markets, broadcasters have been allowed to use the cheaper electronic newsroom technique (ENT), which converts a teleprompter script into live captions, but this method fails for unscripted material such as breaking news, sports commentary and other truly live content.

The circle broadcasters need to square is making automatic caption generation as reliable as human captioning by a stenographer. In practice, that means reaching around 98% accuracy, with an acceptable latency between the spoken word and the matching caption appearing on screen, but at ENT costs.
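To make the accuracy figure concrete, the short sketch below computes word-level accuracy as one minus the word error rate (WER) of a caption against a reference transcript. The sample sentences and the plain edit-distance approach are illustrative assumptions, not part of any particular captioning product.

```python
# Illustrative sketch: word-level accuracy as 1 - WER (word error rate),
# computed with a standard edit distance over words. The sample transcripts
# are made up for demonstration.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Levenshtein distance over words (substitutions, insertions, deletions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

reference = "the city council voted to approve the new transit budget tonight"
hypothesis = "the city council voted to prove the new transit budget tonight"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")  # one error in 11 words, roughly 91% accuracy
```

Even a single wrong word in an 11-word sentence drops accuracy to about 91%, which is why the gap between 95% and 98% matters far more than it sounds.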

OVERCOMING ISSUES WITH AUTOMATIC SPEECH RECOGNITION

AI-driven Automatic Speech Recognition (ASR) solutions are beginning to make headway in live automatic captioning, but they still fall short on key metrics. They are easily hampered by background noise and multiple voices and, more importantly, they have yet to provide the combination of quality, economy and reliability that live broadcasts really need, especially on sensitive topics such as politics, where every word counts. All broadcasters will be familiar with the potential consequences of an embarrassingly miscaptioned crucial word.

This is where ARC comes in. ARC stands for "Automatic Real-time Captioning" and is a new service-based method of optimizing ASR to bring its accuracy up to the level of human captioners.

The best ASR engines currently achieve around 95% accuracy at roughly half the cost of human live captioning, bringing live captioning within reach of not just the top 25 markets but broadcasters all over the country. With ARC, these engines are brought up to genuine human-level performance by adding a couple of powerful stages to the process, each further refining captioning accuracy.

The first of these stages is an optimization layer that uses a seed vocabulary manually curated by expert captioners, typically names and places specific to a broadcaster's individual market. The captioners' expertise is key to achieving high accuracy; there is a definite art to adding exactly the right vocabulary while avoiding false positives. This layer alone can reduce incorrect words in a live broadcast by up to 25% on average, taking automation from risky to consistently reliable on the most important terms in a broadcaster's output.
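As a rough illustration of how a curated seed vocabulary can work, the sketch below snaps near-miss recognitions of market-specific names to their curated spellings using simple fuzzy matching. The vocabulary entries, the matching threshold and the function name are assumptions for the example, not ARC's actual implementation.

```python
# Illustrative sketch of a seed-vocabulary correction pass: near-miss
# recognitions of curated, market-specific names are snapped to the curated
# spelling. The vocabulary, threshold and example text are assumptions.
from difflib import SequenceMatcher

# Hypothetical list curated by expert captioners for one broadcaster's market.
SEED_VOCABULARY = ["Schenectady", "Kosciuszko Bridge", "Mayor Okafor"]

def apply_seed_vocabulary(caption: str, vocabulary=SEED_VOCABULARY,
                          threshold: float = 0.85) -> str:
    """Replace phrases that closely match a curated term with that term."""
    corrected = caption
    for term in vocabulary:
        n = len(term.split())
        words = corrected.split()
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            score = SequenceMatcher(None, candidate.lower(), term.lower()).ratio()
            # A conservative threshold helps avoid false positives.
            if candidate != term and score >= threshold:
                words[i:i + n] = term.split()
                corrected = " ".join(words)
                break
    return corrected

print(apply_seed_vocabulary("traffic on the kosciusko bridge near skenectady"))
# -> "traffic on the Kosciuszko Bridge near Schenectady"
```

The threshold is the art referred to above: set it too loose and ordinary words get "corrected" into local names, set it too tight and the layer misses the very terms it exists to protect.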

The next stage of refinement comes when the ASR output is run through a dedicated captioning platform. The platform adds all the features needed to put captions on screen properly, governing subtitle processing and presentation and applying house rules and language formats for extra readability. This automated process uses existing data to ensure consistent branding and channel identity in the captions, and it prevents offensive language and other undesirable terms from reaching the audience.
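The kind of processing such a platform performs can be pictured with a small sketch: apply house-style substitutions, mask blocked terms and break the text into caption-safe line lengths. The specific rules, block list and 32-character line limit here are assumptions chosen for the example rather than any broadcaster's actual house style.

```python
# Illustrative sketch of a caption-platform pass: apply house-style
# substitutions, mask blocked terms and wrap text to caption-safe line
# lengths. Rules, block list and the 32-character limit are example
# assumptions (32 characters per line is a common broadcast caption limit).
import textwrap

HOUSE_RULES = {"per cent": "percent", "OK": "okay"}   # hypothetical house style
BLOCKED_TERMS = {"damn"}                              # hypothetical block list
MAX_LINE_CHARS = 32

def format_caption(text: str) -> list[str]:
    # Apply house-style substitutions.
    for raw, styled in HOUSE_RULES.items():
        text = text.replace(raw, styled)
    # Mask blocked terms before they reach the audience.
    words = []
    for word in text.split():
        core = word.strip(".,!?").lower()
        words.append("[...]" if core in BLOCKED_TERMS else word)
    # Break into caption-safe lines.
    return textwrap.wrap(" ".join(words), MAX_LINE_CHARS)

for line in format_caption("the council said it was OK to raise the damn levy by two per cent"):
    print(line)
```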

The result is an automatic captioning system that meets all accessibility requirements, with greater accuracy than any competing system on the market and at a lower cost than traditional human live captioning.

ARC is not tied to a specific ASR engine, allowing users to maintain their current technology investment or choose the best solution for their needs while piggybacking on rapid developments in the field. It is also a flexible service solution: rather than paying upfront for a box that runs 24/7 regardless of content, broadcasters buy ARC on a "pay-as-you-use" pricing model, significantly reducing per-station costs for TV networks.
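Engine independence of this kind is usually achieved with a thin adapter layer: the downstream pipeline depends on a small transcription interface, and each vendor's ASR sits behind its own adapter. The class and method names below are hypothetical, intended only to illustrate the design, not to describe Red Bee Media's or any vendor's actual API.

```python
# Illustrative sketch of engine independence: downstream caption processing
# depends only on a small transcription interface, so any ASR vendor can be
# plugged in behind it. Class and method names are hypothetical.
from abc import ABC, abstractmethod
from typing import Iterator

class ASREngine(ABC):
    """Minimal interface the captioning pipeline depends on."""

    @abstractmethod
    def transcribe(self, audio_chunks: Iterator[bytes]) -> Iterator[str]:
        """Yield partial transcripts for a stream of audio chunks."""

class VendorAEngine(ASREngine):
    def transcribe(self, audio_chunks):
        for chunk in audio_chunks:
            yield f"[vendor A transcript for {len(chunk)} bytes]"

def caption_stream(engine: ASREngine, audio_chunks: Iterator[bytes]) -> None:
    # The rest of the pipeline (vocabulary layer, house rules, output) only
    # sees the ASREngine interface, so swapping vendors is a one-line change.
    for transcript in engine.transcribe(audio_chunks):
        print(transcript)

caption_stream(VendorAEngine(), iter([b"\x00" * 3200, b"\x00" * 3200]))
```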

FROM PAIN TO GAIN

ARC can turn automated live captioning in the U.S. from regulatory pain into commercial gain. It effectively makes the best better, taking the already impressive performance of the leading ASR vendors and making their captioning even more accurate, more of the time, at a cost far below the investment needed for traditional live captioning by stenographers or voice writers.

Historically, captioning has been exclusively about providing services for hearing-impaired people, but the broadcast landscape is changing. More people are watching more content on silent screens or in environments where sound is not always available or audible. Providing accurate, high-quality captioning is now becoming an important differentiator, especially when it comes to live news and sports content. Automatic Real-time Captioning with ASR technology is the way forward for broadcasters who want to stay relevant, competitive and ultimately successful in this context.

Juliet Gauthier is strategic product manager, ARC, at Red Bee Media.