
Moving to Automated Closed Captioning

Foolproof dialogue text accuracy via voice recognition software still years away, if ever


As the number of digital channels continues to grow, along with new FCC rules now in effect for PSIP-TSID, it's a safe bet that captioning and subtitling will play an increasingly important role in the digital era. And although the FCC will require all new nonexempt programming to be closed-captioned by next year, teletexting is no longer only for the hearing-impaired.

Instant messaging... e-mail... DVD multiple-language subtitles... cell text messages... live captioning in sports bars, health clubs and offices... Suddenly, scrolling text on screens of all sizes is everywhere. Closed captioning (providing script, music, sound effects and dialogue in crawling text) and its textual cousin, subtitles (noncrawling text burned into the video to translate spoken dialogue into another language) are finally coming into their own.

Two of today's biggest challenges for hardware/software makers are handling live programming and making closed captioning more cost-effective for local broadcasters. There's a growing demand by television networks in Europe (where "captioning" and "subtitles" are terms often used synonymously) and North America for live captioning, which increasingly translates into the use of speech recognition software.

John Boulton, business manager for Subtitling at London-based SysMedia, said the traditional model for captioning involved stenographers such as court reporters. "These are highly trained people, very skilled, and quite expensive. But speech recognition software has now reached the level of quality and accuracy where you can use someone of a much more junior grade. Captioning increasingly involves less personal skills to produce it," Boulton told TV Technology.


Out-of-the-box speech recognition software today provides an accuracy level of about 80 percent or better, according to Boulton. "But if you think about it, that means one in every five words is wrong. That's still too high and you can't make sense out of it." When foreign words enter the primary language being captioned--such as the names of Iraqi officials, he said--speech recognition software still has a way to go, unless atypical words are added to the software's "vocabulary" beforehand, which is not always possible with live speech.

Bob Henson, president of Link Electronics in Cape Girardeau, Mo., thinks it will be a while before the human element of live captioning is eliminated completely. Link offers a rack-mounted CC digital/analog encoder-decoder (PDA-895) and a similar portable unit (PDA-896). "People have been working on voice recognition for many years," Henson said. "I think just for the novice, these [software] systems should meet the FCC requirements. For the purist, you need human help. It's a mistake to treat the hard-of-hearing like secondhand cousins by not translating or spelling all the words correctly. But the software keeps getting better."

Others say even a speech-recognition accuracy rate approaching 99 percent can still be distracting because it means an obvious text mistake or two is popping up at least once a minute. "We're not yet to the point where we can pump live audio in and get totally accurate text coming out, but we're getting there," said SysMedia's Boulton. For recorded programs, which still represent the majority of closed-caption projects, caption accuracy levels are virtually 100 percent because there's usually adequate time for post production, including scrutiny of confusing speech.
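The arithmetic behind these accuracy figures is easy to check. A minimal sketch follows; the 180-words-per-minute speaking rate is an assumption for illustration, not a figure from the article:

```python
def errors_per_minute(accuracy, words_per_minute=180):
    """Expected caption errors per minute at a given recognition accuracy.

    Assumes a typical broadcast speaking rate (the default of 180 wpm is
    an illustrative assumption, not a measured value).
    """
    return (1.0 - accuracy) * words_per_minute

# 80 percent accuracy: one word in five wrong, dozens of errors a minute
assert round(errors_per_minute(0.80)) == 36
# even 99 percent accuracy still leaves roughly two errors every minute
assert round(errors_per_minute(0.99), 1) == 1.8
```

This is why a rate that sounds impressive on paper (99 percent) still produces the "mistake or two popping up at least once a minute" the article describes.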

SysMedia deployed its first WinCAPS captioning system to TV2 in Denmark for last year's live coverage of a royal wedding in Copenhagen. The same system continues to be used for captioning the Danish broadcaster's evening newscasts. Another relatively new software module, the Windows-based RenderStation from TM Systems of Los Angeles, is geared to DVD captioning and subtitling. It permits multiple-screen resolutions and text options for NTSC, ATSC and PAL, as well as both 4:3 and 16:9 aspect ratios.

ATSC estimates there are nearly 29 million Americans with some form of hearing impairment, which is about 10 percent of the U.S. population. But as anyone can attest who has been to a sports bar or health club lately, closed captioning is not used solely by the hearing-impaired anymore. Boulton said research, too, suggests more and more consumers are using closed captioning out of convenience or good manners, rather than physical necessity.

In early February, new FCC rules governing PSIP-TSID (transmission signal ID) took effect that, in part, also affect NTSC and ATSC closed captioning. Now "dynamic PSIP" (incorporating new data to comply with changing schedules, etc.) must include both EIA-608 and EIA-708 captioning--with caption service descriptors carrying correct data to allow end-to-end carriage of captions, placed in both the EIT (event information table) and PMT (program map table).
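For engineers implementing these rules, the descriptor in question is a short binary structure. The sketch below builds one in Python following the ATSC A/65 bit layout as this author reads it; the function name is invented for illustration, and the field packing should be verified against the standard before any real use:

```python
def caption_service_descriptor(services):
    """Build an ATSC A/65 caption_service_descriptor (descriptor tag 0x86).

    services: list of (iso639_language, digital_cc, number) tuples, where
    number is the EIA-708 caption service number when digital_cc is True,
    or the Line 21 field bit for EIA-608 when it is False.

    Sketch only -- the bit layout here is the author's reading of A/65;
    check it against the standard itself before relying on it.
    """
    body = bytearray([0xE0 | (len(services) & 0x1F)])  # 3 reserved bits + count
    for lang, digital_cc, number in services:
        body += lang.encode("ascii")                   # 3-byte ISO 639 code
        if digital_cc:                                 # EIA-708 service
            body.append(0xC0 | (number & 0x3F))        # digital_cc=1 + service no.
        else:                                          # EIA-608 on Line 21
            body.append(0x7E | (number & 0x01))        # digital_cc=0 + field bit
        body += b"\x3f\xff"   # easy_reader=0, wide_aspect_ratio=0, reserved bits
    return bytes([0x86, len(body)]) + body

# one EIA-608 service and one EIA-708 service, both English
descriptor = caption_service_descriptor([("eng", False, 1), ("eng", True, 1)])
```

The same descriptor bytes would be placed in both the EIT and the PMT, which is what lets captions survive end to end through the transmission chain.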

Also, by Jan. 1, 2006, the commission will require all new nonexempt broadcast programs to contain the closed-captioning option. (So-called "static PSIP" does not comply with new FCC regulations.) Several companies, including Triveni Digital of Princeton Junction, N.J., are providing toll-free PSIP Help Lines for broadcasters in the months ahead. (Triveni prompts callers to leave voice-mail questions at 1-866-874-8364 and promises rapid responses.)


Larry Goldberg, director of WGBH Media Access Group, which holds several patents on captioning devices, said a lot of exciting caption features are starting to pop up on the Internet, too, especially in the fast-growing broadband sector, where video clips from TV and movies are prominently featured.

One of the group's new products is CaptionKeeper--a software program that automatically converts TV closed-captioning data into Web streaming formats. Goldberg said it takes Line 21 captioned data as input and creates simultaneous outputs for both live and archived presentations viewable in the readily accessible RealPlayer, Windows Media Player and QuickTime formats.
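CaptionKeeper's internals aren't public, but the general shape of such a conversion is straightforward to illustrate. The sketch below maps timed caption cues onto SAMI, the caption sidecar format Windows Media Player reads; the cue structure and function name are this author's assumptions, not CaptionKeeper's actual interface:

```python
def cues_to_sami(cues):
    """Render timed caption cues [(start_seconds, text), ...] as a SAMI
    document for Windows Media Player.

    Illustrative sketch only -- CaptionKeeper's real output formats and
    interfaces are not documented in the article.
    """
    lines = ['<SAMI><HEAD><STYLE TYPE="text/css"><!--',
             '.ENUSCC { Name: English; lang: en-US; }',
             '--></STYLE></HEAD><BODY>']
    for start, text in cues:
        ms = int(start * 1000)            # SAMI timestamps are in milliseconds
        lines.append(f'<SYNC Start={ms}><P Class=ENUSCC>{text}</P></SYNC>')
    lines.append('</BODY></SAMI>')
    return "\n".join(lines)

sami = cues_to_sami([(1.5, "Good evening."), (3.0, "Tonight's top story...")])
```

A real converter would also decode the Line 21 byte pairs upstream and emit parallel RealText and QuickTime text tracks, one per target player, from the same cue list.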

"The new hardware coming out now can also let the user choose the shape, style and appearance of captions on their TV screen," Goldberg said, similar to manipulating fonts, text sizes and layouts in Word and other popular computer programs.

For broadcasting, replacing caption hardware with cost-effective software solutions will be of primary importance for years to come, according to Dr. Dilip Som, president of CPC Co. of Rockville, Md. "Software caption encoders and subtitle character generators perform all the functions of the hardware, plus save users money since the software costs significantly less than the hardware used to," Som said, with typical savings of about $500-$1,000 on a system.

And hard disks are rapidly replacing tape for timecoding and formatting. "Nonlinear editors have been around for years, but many facilities still use tape-based captioning and subtitling systems. Nonlinear timecoding and formatting save time, and are simply more efficient," said Som.
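The timecode arithmetic such nonlinear systems perform can be sketched in a few lines. This handles non-drop-frame counting only; drop-frame 29.97 timecode needs an extra correction that is omitted here:

```python
def timecode_to_frames(tc, fps=30):
    """Convert a non-drop-frame SMPTE timecode 'HH:MM:SS:FF' to a frame count.

    Sketch for illustration; drop-frame 29.97 counting is deliberately
    not handled.
    """
    h, m, s, f = (int(part) for part in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def frames_to_timecode(frames, fps=30):
    """Inverse of timecode_to_frames for non-drop-frame material."""
    s, f = divmod(frames, fps)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{f:02d}"

# a caption cue at one minute, ten seconds, frame 12
n = timecode_to_frames("00:01:10:12")
assert n == 2112
assert frames_to_timecode(n) == "00:01:10:12"
```

Working in frame counts like this is what lets disk-based systems reposition and reformat captions instantly, rather than shuttling tape to a timecode.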