
SPECIAL REPORT: The many methods of closed captioning

As the broadcast industry transitions to digital services, broadcasters need to look carefully at how they handle closed captions. Photo design by Robin Morsbach, associate art director.

The FCC has adopted a timetable that will require 100 percent of all non-exempt programming to be closed captioned by Jan. 1, 2006. And, to borrow a phrase from a popular TV sci-fi series, resistance is futile.

Workflow scenarios

As the broadcast industry transitions to digital services, the complexity created by multiformat programming means that broadcasters need to look carefully at how they will handle closed captions (CC). What was once the relatively simple act of inserting CC data into line 21 of an NTSC signal has now — as has every other aspect of DTV infrastructure — expanded exponentially in complexity.

When implementing CC, there are many workflow issues to consider. Are the VTR feeds already closed captioned? If not, should this task be handled in-house or by a captioning service? Does the video signal need to be converted between NTSC and ATSC, or between SD and HD? The answers to these questions will dictate the production workflow and the equipment necessary to produce compliant CC programming. And broadcasters must adhere to many implementation details. Table 1 lists relevant CC features, FCC rules and technical standards.

Table 1. Closed-captioning features, rules and standards.

The obvious place to start is with an analysis of your production workflows. There are three general production scenarios to consider: live remote feeds, live studio shows (both of which require online or real-time captioning) and preproduced segments or shows (which can use offline captioning). For a remote feed, captioning at the point of origination or ingest allows you to record the program with captions on tape. If that show is ever to re-air, the captions will air with it. For a live show, the program audio is decoded from the signal, converted and fed over a telephone line to a CC service. An additional telephone line returns the CC text stream through a modem connection to the CC encoder. Figure 1 illustrates the online signal flow of a captioning service provider.

Myriad technical standards

A preproduced segment or show already on tape needs to be closed-captioned offline. This is a two-step process. First, a closed-captioning service provider produces a CC data file. Next, either the program production facility or a commercial encoding service completes the process by dubbing the captions to the program and producing media that is ready for air. Captioning offline allows synchronization of the displayed CC text with the dialog and eliminates any temporal incongruities between scenes and captions. For example, the video might show a soccer ball flying into the goal, but the CC showing the dialog describing the shot (“SCORE!”) wouldn't appear until a moment later, when the director cuts to a network tease.

SMPTE, SCTE, EIA and the ATSC have adopted individual technical standards regarding CC implementation. Two standards, originally developed by the Electronic Industries Alliance (EIA), define implementation specifications for compliant CC. Today, the Consumer Electronics Association (CEA) administers these standards. Their designations have become technical jargon. So-called “608 captions” refer to the CEA-608-B NTSC standard; “708 captions” refer to the CEA-708-B DTV Closed Captioning (DTVCC) standard.

Nuts and bolts

The NAB has published an excellent white paper entitled “Implementing Closed Captioning for DTV,” available at the NAB Web site. The recently approved SMPTE engineering guideline “EG 43-2004 Proposed SMPTE Engineering Guideline — System Implementation of CEA-708-B and CEA-608-B Closed Captioning” is available (for a fee) at the SMPTE Web site; search for engineering guidelines by number. Both papers describe system design methodologies used to create and distribute closed captions within a broadcast facility. They emphasize how to simultaneously caption NTSC and DTV. They also explain the relationship between the various CC standards and production and distribution workflows.

Figure 1. Live/online closed-captioning signal flow using a captioning service and plain old telephone service (POTS). Using this method, there is a two- to three-second delay between dialogue and the appearance of closed captions.

CEA-608 defines how line 21 carries the CC information in an NTSC broadcast. This standard encodes CC information as seven bits plus one parity bit at 120 characters per second and produces a data rate of 960 bits per second. CEA-708-B defines coding of DTVCC in an ATSC A/53 specified bit stream. This standard significantly enhances DTV display and formatting features. Document CEA-CEB-10-A discusses implementation details. CEA-708-B defines caption distribution packets (CDPs) that hold DTVCC data, 608 caption data, caption service information and (optionally) time code. This facilitates decoding digital cable signals and inserting 608 captions into line 21 of the NTSC output of an STB.
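The 608 coding rule is simple enough to sketch. In the hypothetical Python helper below (the function name is ours, not the standard's), the eighth bit is set so that every transmitted byte contains an odd number of 1 bits, and the quoted 960 bit/s figure falls out of two bytes per field at a nominal 60 fields per second:

```python
def add_odd_parity(ch: int) -> int:
    """Set the MSB so the 8-bit byte has odd parity, per CEA-608.

    CEA-608 characters are 7 bits; the eighth bit is an odd-parity
    bit, so every transmitted byte carries an odd number of 1 bits.
    """
    if not 0 <= ch < 0x80:
        raise ValueError("CEA-608 characters are 7 bits")
    ones = bin(ch).count("1")
    return ch | 0x80 if ones % 2 == 0 else ch

# Two bytes per field, a nominal 60 fields per second, 8 bits per
# byte: the 960 bit/s line-21 data rate quoted in the text.
BITS_PER_SECOND = 2 * 60 * 8  # = 960
```

For example, the 7-bit character 0x41 ('A') has an even number of 1 bits, so it is transmitted as 0xC1.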

The caption service descriptor (CSD) is part of the caption service information data field in the CDP; it signals the presence and format of captions. It is defined in the ATSC A/65 PSIP standard. In the ATSC transport stream, the event information table (EIT) and the program map table (PMT) contain the CSD. CC data packets have a unique packet ID (PID) and are allocated a data rate of 9600 bits per second.
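That fixed 9600 bit/s allocation translates into a per-frame budget of two-byte caption data pairs that varies with frame rate, which is why a DTVCC multiplexer sends more pairs per frame at film rate than at 30 frames per second. A back-of-the-envelope check (the function name is ours):

```python
def cc_pairs_per_frame(frame_rate: float) -> float:
    """Two-byte caption data pairs available per video frame at the
    fixed 9600 bit/s DTVCC channel rate (nominal frame rates)."""
    pairs_per_second = 9600 / 8 / 2   # 600 two-byte pairs each second
    return pairs_per_second / frame_rate

# At a nominal 30 frames/s the budget is 20 pairs per frame;
# at film-rate 24p it rises to 25 pairs per frame.
```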

Figure 2a. The dotted lines trace an SD-SDI feed through the upconverter; if the upconverter is removed from the signal flow, the closed-captioned SD-SDI signal is instead encoded as ATSC SD. Figure 2b. The signal flow for a VANC 708 closed-captioned HD-SDI feed and conversion to NTSC with 608 captions inserted into line 21.

For HD-SDI SMPTE 292M and SD-SDI SMPTE 259M, the CDP data is embedded in the vertical ancillary (VANC) data. VANC data can be switched, routed and stored and will remain associated with program content. This ensures persistence of CC data as long as material is in SDI format.
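SMPTE 334-1 registers a specific ancillary data ID pair (DID 0x61, SDID 0x01) for the CDP in VANC, which is how downstream gear finds caption data in an SDI stream. The sketch below is a simplification: real ANC words are 10 bits with parity protection and a trailing checksum, and here we match only the low-byte form of the ancillary data flag. The function name and byte-level shortcuts are ours:

```python
# SMPTE 334-1 assigns this DID/SDID pair to the CEA-708 caption
# distribution packet carried in vertical ancillary (VANC) data.
CDP_DID, CDP_SDID = 0x61, 0x01

def find_cdp(vanc_words):
    """Scan 8-bit VANC payload words (a list) for a CDP ANC packet.

    vanc_words: the low 8 bits of each 10-bit ANC word. Returns the
    CDP user data words, or None. The real ancillary data flag is the
    10-bit sequence 0x000, 0x3FF, 0x3FF; we match its low-byte form
    0x00, 0xFF, 0xFF as a simplification.
    """
    for i in range(len(vanc_words) - 5):
        if vanc_words[i:i + 3] == [0x00, 0xFF, 0xFF]:   # ancillary data flag
            did, sdid, dc = vanc_words[i + 3:i + 6]     # ID pair, data count
            if (did, sdid) == (CDP_DID, CDP_SDID):
                return vanc_words[i + 6:i + 6 + dc]     # user data words
    return None
```

Because the DID/SDID pair travels with the packet, any router or server that passes VANC untouched preserves the captions, which is the persistence property the text describes.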

An NTSC/DTV simulcast facility must adopt an efficient and reliable infrastructure to support both NTSC 608 and DTVCC 708 protocols and to be able to convert between CC formats. Facility engineers must determine where they will locate the CC encoder in the signal flow.

Infrastructure implementation

Figure 2a illustrates the generic signal flow for adding closed captions to an HD-SDI program stream. The HD is downconverted to SD, captions are supplied to a 608 encoder, and an SD-SDI captioned bit stream is fed to an NTSC encoder. Also, the 608 captions are converted to 708 captions and fed to the data input of an ATSC encoder. An alternate signal flow would be to feed CC data to a 708 caption encoder and eliminate the 608-to-708 conversion. The dotted lines represent the SD-to-HD upconversion signal flow.
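Part of what the 608-to-708 conversion step carries forward is the original line-21 byte pairs themselves: the ATSC A/53 cc_data() construct reserves cc_type codes for NTSC field 1 and field 2 pairs alongside native DTVCC packet data, so a receiver can regenerate line 21. A minimal sketch of that wrapping (the function name is ours):

```python
def wrap_608_pair(b1: int, b2: int, field: int = 1) -> bytes:
    """Wrap a line-21 CEA-608 byte pair as an A/53 cc_data() construct.

    cc_type 0 marks an NTSC field 1 pair, cc_type 1 an NTSC field 2
    pair; types 2 and 3 (not built here) carry native DTVCC packets.
    """
    cc_type = 0 if field == 1 else 1
    cc_valid = 1
    # Five '1' marker bits, then cc_valid, then the 2-bit cc_type.
    header = 0b11111000 | (cc_valid << 2) | cc_type
    return bytes([header, b1, b2])
```

For instance, the common 608 control pair 0x94 0x20 (resume caption loading, field 1) wraps as the three bytes 0xFC 0x94 0x20.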

As shown in Figure 2b, when the program is an HD-SDI closed-captioned stream, the 708 captions must be extracted from the VANC and converted to 608 captions, then inserted into the downconverted SD video and converted to NTSC.

If the source program is NTSC, CC information must be extracted from the vertical blanking interval (VBI) data and converted from 608 captions to 708 captions. The derived CDPs are reinserted in the VANC of the SMPTE 292M SDI bit stream.

Pay special attention to a signal flow where closed-captioned programming flows through graphics, processing or compression equipment. Such equipment might destroy the CC data. In such cases, you must extract the CC data from the program feed and then reinsert it after processing. This technique is known as “bridging.” Figure 3a depicts the generic bridging signal flow. The extracted CC data is fed to an appropriate encoder for the converted program feed.
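The bridging idea reduces to a simple pattern: pull the caption data off each frame before the destructive stage, run the video through, then re-marry the captions downstream. A generic sketch, with the processing and extraction stages supplied as callables (all names are ours):

```python
def bridge(frames, process_video, extract_cc, insert_cc):
    """Route caption data around a processor that would destroy it.

    frames:        iterable of input frames
    process_video: the CG/GFX/compression stage that strips CC data
    extract_cc:    pulls CC data off an input frame
    insert_cc:     re-marries CC data to a processed frame
    """
    out = []
    for frame in frames:
        cc = extract_cc(frame)            # extract before processing
        frame = process_video(frame)      # captions would be lost here
        out.append(insert_cc(frame, cc))  # reinsert downstream
    return out
```

In a real plant the "callables" are hardware: a CC decoder ahead of the graphics chain and a CC encoder behind it, or a single bridging unit that does both.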

It is important to consider how your ingest server will handle captions. Figure 3b is a conceptual diagram of a solution to caption persistence through the ingest/playout program stream. Once essence is in the compressed domain, the CC information may no longer be married to the source essence. Instead, the CC data must be extracted from the feed before ingestion, stored on a server with appropriate descriptive metadata, and then inserted at the proper time as the program leaves the facility. CC-related PSIP information is multiplexed into the ATSC transport stream.
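Conceptually, the store-and-reinsert scheme amounts to keying extracted caption data by program and timecode so playout can look it up frame-accurately. A toy sketch of that bookkeeping (class and method names are ours, not any vendor's API):

```python
class CaptionStore:
    """Hold CC data extracted at ingest, keyed by program ID and
    timecode, so it can be re-married to essence at playout."""

    def __init__(self):
        self._data = {}

    def ingest(self, program_id: str, timecode: str, cdp_bytes: bytes):
        """Record extracted CC data against its program and timecode."""
        self._data.setdefault(program_id, {})[timecode] = cdp_bytes

    def playout(self, program_id: str, timecode: str):
        """Return the stored CC data for reinsertion, or None."""
        return self._data.get(program_id, {}).get(timecode)
```

A production system would tie this to the facility's asset-management metadata rather than an in-memory dictionary, but the association it maintains is the same.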

Figure 3a. CC bridging bypasses CG and GFX processors. Figure 3b. This diagram illustrates extraction of CC VANC data before ingestion and reinsertion into an ATSC transport stream.

Inside the boxes

Fortunately, broadcast equipment manufacturers have navigated the maze of standards, addressed these subtleties and designed turnkey systems that handle every permutation of CC workflow. Devices are available to handle all HD and SD formats and frame rates, including the popular 1080 24p format. Dual-function CC encoders can insert 608 captions into NTSC programs and 708 captions into DTV simultaneously, greatly simplifying infrastructure design.

Encoding equipment interfaces with CC data in a number of ways. When a facility uses a CC service, a telephone line and a modem connection supply CC data to the encoder. Other systems strip the CC data from the VANC and send it through RS-232 or a LAN connection to be reinserted further downstream. Equipment that performs the entire bridging process is available. Such units can strip and reinsert CC data, and require only a closed-captioned program input, simplifying system design.

The Twilight Zone

Even if the NTSC “sunset” comes to pass on schedule and analog TV broadcasting ceases to exist on Jan. 1, 2007, there are still two full years during which broadcasters will have to deliver simulcast analog and digital TV with closed captions. And, after the NTSC sun has completely set, broadcasters will still need to convert legacy 608-captioned material to DTVCC 708 captions.

Philip J. Cianci has been active in the television industry for 20 years. His work has included algorithm verification of closed-caption decoding in the Grand Alliance prototype decoder.