Data insertion and extraction — and closed captioning in particular — represent a significant component of broadcasting. The insertion of captioning into the broadcast signal may be considered just one small part of the transmission chain, but it is an important and often complex undertaking that presents four main challenges.
The first challenge is determining how to organize closed-captioning production, especially considering that prepared (or even prerecorded) and live-captioning services are structurally different and require entirely different core competencies.
Another challenge is figuring out how to store captions in broadcast facilities alongside media assets, and at which stage to encode captions, taking into account distribution, contribution, recording of live events for later re-playout and direct broadcast.
A third challenge is future-proofing and optimizing investments in closed-captioning management equipment, considering current migration to HD and DTV, along with the addition of new media vectors such as the web.
And, finally, it must be determined how to prepare integration with web services, whether streaming, VOD or others. Engineers have to handle different formats, different video and audio encoding processes, different asset management systems and, obviously, different ways of making captions available with these new media. In addition, the encoding process for media streaming can introduce delays that prove difficult to manage when keeping captioning in sync.
Although the term “closed-captioning” is used in this article, the scope is not limited to “North American-style” closed-caption systems; it also addresses all other equivalent caption transport methods, such as European/Australian teletext/OP-47 or the Japanese/South American “B37” (ARIB STD-B37) specification.
Mandates from regulatory authorities, such as the FCC, state that distributors of programming must provide a minimum number of hours of closed-captioned programming per day, week or calendar quarter, giving terrestrial and satellite broadcasters, as well as cable operators, no choice but to deliver captioning data with their broadcasts. In addressing these requirements regarding the delivery of closed-captioning within digital television services, broadcasters basically have two choices.
In an approach often adopted by larger broadcast networks, internal services are developed to take on complete captioning services. At dedicated workstations, staff members perform both offline and live captioning for news, sports and other live events. In-house services require trained staff and specific software and workstations designed to support live-captions creation. Because this process still cannot be completely automated with tools such as voice recognition systems, it essentially requires creation of an internal captioning department.
In a second approach, one more typical of smaller broadcast companies, the broadcaster will hire the services of a captioning production company that performs the service and delivers data into the facility's broadcast workflow. Lacking the human resources, space or bandwidth to train their staff to set up and maintain an in-house captioning service, such operations focus their efforts on implementing actual data insertion, as well as setting up and supporting workflows for externally generated prepared and live captions. Rather than build out internal captioning capability, these facilities must then address the IT and security issues related to opening the facility to an external service provider.
The question, then, is this: How do we keep, store and archive closed-captioning data and make it available for later (re)broadcast? In addition, broadcasters face the question of how to cope with the multiple formats in which closed-captioning data can be provided, even within a single facility, whether it is pre-encoded in video media, as an independent, timecoded closed-caption “script” file, or as a data stream received from live captioning services.
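To make the “script” file variant concrete, prepared captions are often delivered as a plain list of timecoded cues. The following is a minimal parsing sketch, assuming the common SubRip (SRT) cue layout for illustration rather than any particular broadcast caption format:

```python
import re

def parse_srt(text):
    """Parse a SubRip-style caption script into (start, end, text) cues.

    Timecodes are returned in seconds; the cue header assumed here is the
    common 'HH:MM:SS,mmm --> HH:MM:SS,mmm' form.
    """
    def to_seconds(tc):
        h, m, s_ms = tc.split(":")
        s, ms = s_ms.split(",")
        return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

    cues = []
    # Cues are separated by blank lines: index, timing line, then text lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, _, end = lines[1].partition(" --> ")
        cues.append((to_seconds(start), to_seconds(end.strip()),
                     "\n".join(lines[2:])))
    return cues

sample = """1
00:00:01,000 --> 00:00:03,500
Hello, world.

2
00:00:04,000 --> 00:00:06,000
Second caption."""

print(parse_srt(sample))
```

A management system would normalize each incoming format — pre-encoded, script file or live stream — into a common cue representation like this one before playout.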
Figure 1a illustrates a detailed view of a basic captioning workflow including both captioning encoding and management tools. In this example, a video and captions processing system connects with a caption management system.
The captions and video processing system — called an inserter — encodes actual closed captions and associated control data into the video signal, ensuring both compliance with required closed-captioning standards and integrity of the video signal. The captions management system ensures interconnection and interaction with the captioning data sources and handles the different ways of receiving caption data (either as files or streams). It also interfaces with the playout automation system, the media asset management (MAM) system and the playlist management system so that the required captions file is loaded and played, or the live-captioning source is connected, in time. Finally, it ensures accurate synchronization of file-based captions playout against the timecode of the currently played video media.
This captioning management system ensures on-the-fly encoding of closed-captioning data, whether the source of that data is a file or a live stream. It relies on integration with a broadcaster's MAM system in order to retrieve caption files automatically. It also needs the automation system to trigger and synchronize playout of those files, which appear as secondary events in the playlist. One benefit unique to this model is that because closed-captioning data is not encoded within the video file, the captions file can be delivered to the broadcaster or altered/corrected at any moment prior to the actual playout. (See Figure 1b.)
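As a rough illustration of the synchronization step, the management system must continually map the playout timecode to the cue that should be on air. The sketch below assumes non-drop-frame timecode and cue times in seconds; the function names are illustrative, not a real product API:

```python
def timecode_to_seconds(tc, fps=25):
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to seconds."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return h * 3600 + m * 60 + s + f / fps

def active_cue(cues, now_seconds):
    """Return the cue whose [start, end) interval covers the playout position.

    `cues` is a list of (start, end, text) tuples sorted by start time.
    """
    for start, end, text in cues:
        if start <= now_seconds < end:
            return text
    return None  # between cues: the inserter would send clear-caption control data

cues = [(1.0, 3.5, "Hello, world."), (4.0, 6.0, "Second caption.")]
print(active_cue(cues, timecode_to_seconds("00:00:02:00")))
```

A real system would also account for encoder pre-roll and caption pop-on/roll-up timing, but the core operation remains this timecode-to-cue lookup.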
Another option illustrates the conventional broadcast transmission chain with captioning processing implemented on the front end, at the ingest stage. (See Figure 2.) In this case, captioning is embedded into ingested files, keeping captioning data with the content itself as it eventually is transformed into a broadcast. One benefit of this model is that the MAM system handles audio, video and related closed-captioning data as a single asset, in turn simplifying management and movement throughout the rest of the workflow. Obviously, care should be taken to ensure that the chosen in-house video file format/wrapper is able to carry ancillary and closed-captioning data.
This latter approach does not address the requirements of setting up a structure for live captioning, for which at least a good part of the first model (described by Figure 1) is necessary. Further, regardless of the option primarily chosen to handle prepared captioning data, it also is necessary to record live programs along with synced captions for later rebroadcast. An ideal mix of the options presented in Figure 1 and Figure 2 is required in order to build a more complete and flexible solution. Obviously, the inserter should handle gracefully video signals that already contain closed captions and other related information (such as XDS in North America, or teletext pages and packet 31 data in Europe/Australia) and be able to insert live captions data seamlessly when necessary.
Closed-caption encoding in the baseband video signal allows captions data to be kept perfectly in sync and is also ideal not only for distribution in broadcasters' facilities, but also for contribution and recording of live events for later rebroadcast. There are, however, some existing workflows in which on-the-fly caption encoding is done at the latest stage, right before broadcast, at the encoder/multiplexer level. This approach is limited in that it lacks record capability and is incapable of distributing video with embedded captioning, especially for new vectors such as web streaming.
When it comes to choosing the type of closed-caption inserter needed, broadcasters should future-proof their equipment whenever possible. It also is important to keep the workflow as simple and coherent as possible. An auto-sensing inserter is thus a good choice: a captions inserter should be able not only to sense the incoming video format, but also to select the proper output captions standard.
Broadcasts today are typically delivered in both SD and HD. Therefore, the inserter should be dual-channel so that it can handle caption encoding on both channels at once. Typically, the broadcaster will use a single-format — SD or HD — video storage and playout server architecture, with an up- or downconverter situated before the captions inserter to feed the inserter's second channel. Pre-encoded caption data carried within the “native” video SDI stream is then copied and transcoded on the fly by the captions inserter into the up- or downconverted SDI stream that goes through its second channel. This type of solution permits the inserter to encode and insert live or external file-based captions in both SD and HD streams simultaneously.
In addition to offering seamless integration with the playout automation system, a dual-channel, auto-sensing closed-caption inserter can result in a hassle-free and future-proof solution for those broadcasters who aim to migrate smoothly toward a full HD workflow, and beyond.
To the web
Creating closed captions for VOD and web-based program delivery is another challenge that demands flexibility in handling data and workflows. As long as proper, interoperable file formats are chosen and used for both media and captioning files, encoding captions from prepared caption files for VOD is a relatively common process. Through offline batch processing, caption files are “merged” into VOD media files. Difficulties arise when it comes to streaming captioned media — especially if several concurrent streams following different standards are required.
In addition to selecting which caption transport standard to use, the broadcaster must decide how to get caption data encoded into web media streams and how to maintain synchronization between caption and video streams. Timed Text is becoming widely adopted and is being pushed by the FCC, among others. Microsoft's Synchronized Accessible Media Interchange (SAMI), another popular format, is used for streaming content in Windows Media format.
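To illustrate the Timed Text side, a minimal TTML document is an XML file in which each paragraph element carries begin/end timing attributes. The following sketch serializes simple cues with Python's standard library; it produces a bare-bones document for illustration, not a fully broadcast-compliant one:

```python
import xml.etree.ElementTree as ET

# The TTML namespace defined by the W3C Timed Text Markup Language spec.
TTML_NS = "http://www.w3.org/ns/ttml"

def cues_to_ttml(cues):
    """Serialize (start, end, text) cues into a minimal TTML document."""
    ET.register_namespace("", TTML_NS)
    tt = ET.Element(f"{{{TTML_NS}}}tt")
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for start, end, text in cues:
        p = ET.SubElement(div, f"{{{TTML_NS}}}p",
                          begin=f"{start:.3f}s", end=f"{end:.3f}s")
        p.text = text
    return ET.tostring(tt, encoding="unicode")

print(cues_to_ttml([(1.0, 3.5, "Hello, world.")]))
```

Generating SAMI or other web formats from the same cue list is a matter of swapping the serializer, which is why a common internal cue representation pays off.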
As for generating Timed Text captions from a coherent, common source according to the workflow examples above — whether the production origin is live, pre-encoded or provided as a caption file — there is currently no common solution. One approach is to extract caption data directly from the distributed program SDI video signal. In this case, the same kind of captions and video processing system as detailed above (the “inserter”) should be used, just before the streamed-video encoders.
The inserter would be capable of reading and analyzing on the fly the closed-caption data streams in the SDI signal. The extracted caption data stream could then be transcoded to Timed Text, for example, and sent to the IP multiplexer for web broadcast, with a delay matching the one introduced by the encoder.
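The delay compensation mentioned above reduces to shifting every caption cue later by the encoder's latency, so that captions line up with the delayed web video. A minimal sketch, assuming cue times in seconds and a known, fixed encoder delay:

```python
def delay_cues(cues, encoder_delay):
    """Offset caption cue times to match the latency the video encoder adds.

    The caption stream extracted from SDI is 'ahead' of the encoded web
    stream by the encoder's processing delay, so each (start, end, text)
    cue is shifted later by that amount.
    """
    return [(start + encoder_delay, end + encoder_delay, text)
            for start, end, text in cues]

cues = [(1.0, 3.5, "Hello, world."), (4.0, 6.0, "Second caption.")]
# Assume a hypothetical 2-second encoder latency for illustration.
print(delay_cues(cues, encoder_delay=2.0))
```

In practice the encoder delay may vary with content and settings, which is exactly why the article notes these delays can be difficult to manage.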
Using the increasingly sophisticated captioning solutions available today, broadcasters can build captioning workflows with a high degree of flexibility, streamline multiformat operations and minimize the costs associated with meeting increasingly stringent captioning requirements.
Renaud Desportes is the director of Wohler's ancillary data product line.