A Primer on Advanced 24p

This column has discussed 24p video capture and recording in the past. We know that 24p, or 24-frame-per-second progressive-scan recording, is well suited to 24-frame film-to-video and video-to-film transfer because it establishes a one-to-one relationship between film frames and video frames. Further, the 1920 x 1080 x 24p high-definition scanning format has sufficient spatial resolution to serve as a mastering format.
Figure: An illustration of 2:3 pulldown.

The 24 fps film format has been used on television since television began. In North America and other 60 Hz television countries, it must be adapted to a frame rate of about 30 frames, or 60 fields, per second. This has traditionally been done with 2:3 pulldown, which repeats each film frame's image as either two or three video fields, generating a field sequence of 2, 3, 2, 3, 2, 3...; that is, the first film frame is represented in two fields, the second in three fields, the third in two fields, and the fourth frame of the sequence in three fields.
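
To make the cadence concrete, here is a minimal Python sketch that expands film frames into video fields. The function name `pulldown_fields` is hypothetical, and frames are represented by letter labels rather than images.

```python
def pulldown_fields(frames, cadence):
    """Expand film frames into video fields.

    Film frame i is repeated cadence[i % len(cadence)] times, so the
    cadence list cycles over the input.
    """
    fields = []
    for i, frame in enumerate(frames):
        fields.extend([frame] * cadence[i % len(cadence)])
    return fields


# Standard 2:3 pulldown: A -> 2 fields, B -> 3, C -> 2, D -> 3.
fields = pulldown_fields(["A", "B", "C", "D"], [2, 3])
print("".join(fields), len(fields))  # AABBBCCDDD 10
```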

This sequence of four film frames converted into 10 video fields is then repeated. It generates some "blur frames": video frames containing two fields derived from different film frames. The 2:3 repetition cadence also produces perceptual errors in spatial location, giving rise to 2:3 judder, which is either a feature or a problem depending on whose opinion is solicited.
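
A quick count over one second of material shows how the repeating cadence adds up. This sketch assumes an ideal 24 fps input and a nominal 60 fields per second (ignoring the 59.94 Hz offset), numbers the film frames 0 to 23, and treats a video frame as a pair of consecutive fields.

```python
# One second of 24 fps film, frames numbered 0..23.
film = range(24)

# 2:3 pulldown: even-numbered film frames get 2 fields, odd-numbered get 3.
fields = [f for f in film for _ in range(2 if f % 2 == 0 else 3)]

# Pair consecutive fields into interlaced video frames.
video_frames = [tuple(fields[i:i + 2]) for i in range(0, len(fields), 2)]

# A "blur frame" holds two fields that came from different film frames.
blur_frames = [vf for vf in video_frames if vf[0] != vf[1]]

print(len(fields), len(video_frames), len(blur_frames))  # 60 30 12
```

Six four-frame cycles per second, each with two blur frames, account for the 12.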

The desirable temporal "look" of 24 fps film may be replicated by 24p video, as discussed in last month's column, and this is one reason why it is not at all surprising that 24p capture equipment is becoming increasingly popular among television and cinema producers.

It must be noted here that 24 fps film frames and 24p video frames are functionally interchangeable with respect to their temporal characteristics, so when we say "film frame," we may equally say "24p frame." Some recent field-capture equipment takes an innovative approach called "advanced 24p capture," or 24p A. Let's take a look at how it works.


Some standard-definition DV camcorders acquire in 24p but record in 60i with pulldown added. Because that pulldown must later be removed and 24p regenerated for editing, video-to-film transfer (filmout), and television, these camcorders use a variation on 2:3 pulldown: the 2:3 cadence is reversed for every other pair of 24p frames, so that instead of 2, 3, 2, 3, 2, 3..., the sequence becomes 2, 3, 3, 2, 2, 3, 3, 2.... This is the advanced 24p sequence. It generates a subtly different kind of judder from 2:3 judder, but 24p A material is not intended to be viewed in that form; rather, it is meant to be reduced to 24p for editing and processing.
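
The advanced cadence can be generated from the ordinary one by reversing the 2:3 pair on alternate pairs of 24p frames. The sketch below is illustrative only; `advanced_cadence` and its `base` parameter are hypothetical names.

```python
def advanced_cadence(n_frames, base=(2, 3)):
    """Fields per 24p frame for 2:3:3:2 (advanced 24p) pulldown.

    Frames are taken in pairs; the base 2:3 cadence is used for even-numbered
    pairs and reversed (3:2) for odd-numbered pairs.
    """
    counts = []
    for i in range(n_frames):
        pair_index = i // 2
        pair = base if pair_index % 2 == 0 else base[::-1]
        counts.append(pair[i % 2])
    return counts


print(advanced_cadence(8))       # [2, 3, 3, 2, 2, 3, 3, 2]
print(sum(advanced_cadence(4)))  # 10
```

Four 24p frames still occupy 10 fields, so the overall 24-to-60 rate conversion is unchanged; only the order of the 2s and 3s differs.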

How does 24p A simplify editing? First, let's look at how the fields are constructed in both the 2:3 and the 2:3:3:2 sequences. In the normal 2:3 pulldown sequence applied to 60-field-per-second interlaced video, four film or 24p video frames (usually designated as frames A, B, C, and D) are distributed over 10 video fields (five video frames) as follows: film frame A forms fields 1 and 2, film frame B forms fields 3, 4, and 5, film frame C forms fields 6 and 7, and film frame D forms fields 8, 9, and 10. This generates the following sequence of video frames: AA BB BC CD DD.
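
The grouping of fields into video frames can be checked with a short sketch. Letters stand in for field content, and the helper name `pair_fields` is hypothetical.

```python
def pair_fields(fields):
    """Group consecutive fields (first field, second field) into interlaced frames."""
    return ["".join(fields[i:i + 2]) for i in range(0, len(fields) - 1, 2)]


# 2:3 pulldown of film frames A to D: A -> fields 1-2, B -> 3-5, C -> 6-7, D -> 8-10.
cadence = {"A": 2, "B": 3, "C": 2, "D": 3}
fields = [frame for frame, n in cadence.items() for _ in range(n)]

print(" ".join(pair_fields(fields)))  # AA BB BC CD DD
```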

We note that this results in three "clean" video frames, and two mixed or blurred video frames. The first, second, and fifth video frames are made up of two fields derived from the same film frame. The third video frame, however, is composed of fields derived from two different film frames (B and C), as is the fourth video frame (C and D).
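
Classifying the five video frames as clean or mixed is then a matter of comparing the source of each frame's two fields. In this sketch the comparison is on labels; real equipment would have to infer it from cadence flags or from differences between the fields' image content.

```python
video_frames = ["AA", "BB", "BC", "CD", "DD"]  # 2:3 sequence of five video frames

# A frame is "clean" when both of its fields come from the same film frame.
clean = [n for n, vf in enumerate(video_frames, start=1) if vf[0] == vf[1]]
mixed = [n for n, vf in enumerate(video_frames, start=1) if vf[0] != vf[1]]

print(clean, mixed)  # [1, 2, 5] [3, 4]
```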

Because there could be motion or a scene change between film frames, these mixed video frames are frequently blurred. If we wish to remove pulldown and restore the material to the 24p format, we further note that there are clean video frames composed of fields representing film frames A (video frame AA), B (video frame BB), and D (video frame DD). Each of these video frames is composed of two fields derived from a single film frame. Film frame C does not have such a video frame representing it; its two video fields are distributed into two separate video frames, BC and CD. This makes it impossible to remove pulldown by simply keeping track of the video field count, de-interlacing the nonmixed video frames, and discarding the mixed video frames. To regenerate 24p frame C, two video frames must be dismantled, and the proper field from each of them combined.
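
The problem with film frame C can be shown directly: it is the only film frame without a clean video frame, so it has to be assembled from fields taken out of two different video frames. Labels again stand in for image data.

```python
video_frames = ["AA", "BB", "BC", "CD", "DD"]  # 2:3 sequence
film_frames = "ABCD"

# Film frames that are fully represented by a single clean video frame.
clean_sources = {vf[0] for vf in video_frames if vf[0] == vf[1]}
orphans = [f for f in film_frames if f not in clean_sources]
print(sorted(clean_sources), orphans)  # ['A', 'B', 'D'] ['C']

# Frame C must be rebuilt from the second field of "BC" and the first field of "CD".
c_rebuilt = video_frames[2][1] + video_frames[3][0]
print(c_rebuilt)  # CC
```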

Finally, we note that the three-field film frames, B and D, each produce a redundant third field that is not needed to regenerate the 24-frame material. Two of every 10 fields are therefore redundant, a redundancy of 20 percent.
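
The 20 percent figure follows from counting fields beyond the two each film frame needs:

```python
fields_per_frame = {"A": 2, "B": 3, "C": 2, "D": 3}  # 2:3 pulldown

total = sum(fields_per_frame.values())
# Only two fields per film frame are needed to rebuild it; the rest are redundant.
redundant = sum(n - 2 for n in fields_per_frame.values())

print(total, redundant, f"{redundant / total:.0%}")  # 10 2 20%
```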

In 2:3:3:2 pulldown, as used in advanced 24p, four 24p frames are again distributed over 10 interlaced fields, but the cadence of every second pair of 24p frames is reversed, generating a 2:3:3:2 sequence rather than the 2:3:2:3 sequence of normal 2:3 pulldown. In this case, 24p frame A forms the first two video fields, F1 and F2, and 24p frame B forms the next three fields, F3, F4, and F5, as before. However, 24p frame C forms the next three fields, F6, F7, and F8, and 24p frame D forms the last two fields, F9 and F10. This generates interlaced video frames in the following sequence: AA BB BC CC DD.

We immediately notice that this sequence generates only a single blur frame, the third video frame in the sequence. We also note that all four 24p frames have associated video frames, each containing two fields from the same film frame: AA, BB, CC, and DD. In the 2:3 sequence, the two redundant fields in each sequence of 10 are located in the frames that also contain the fields required to reconstruct film frame C. In the 2:3:3:2 sequence, however, the two redundant fields are contained in a single video frame, the third frame of the sequence, BC. This is a more efficient way of packaging the video fields. To remove pulldown and restore 24p, we need only discard the third video frame, which contains the redundant fields, and combine the two fields of each of the four remaining video frames.
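
Running the same field-to-frame grouping on the 2:3:3:2 cadence confirms the single blur frame and a clean video frame for every 24p frame. As before, letter labels stand in for image content.

```python
# 2:3:3:2 (advanced 24p) pulldown: A -> F1-F2, B -> F3-F5, C -> F6-F8, D -> F9-F10.
fields_per_frame = {"A": 2, "B": 3, "C": 3, "D": 2}
fields = [frame for frame, n in fields_per_frame.items() for _ in range(n)]

video_frames = ["".join(fields[i:i + 2]) for i in range(0, len(fields), 2)]
mixed = [n for n, vf in enumerate(video_frames, start=1) if vf[0] != vf[1]]
clean_sources = sorted(vf[0] for vf in video_frames if vf[0] == vf[1])

print(" ".join(video_frames))  # AA BB BC CC DD
print(mixed)                   # [3]
print(clean_sources)           # ['A', 'B', 'C', 'D']
```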

When normal 2:3 pulldown material is de-interlaced and reduced to 24p, sophisticated processing is required to detect and keep track of the 2:3 sequence, dismantle the video frames that contain the fields representing film or 24p frame C, discard the appropriate redundant fields, and reconstruct the fields into the 24p sequence. Because this kind of processing must operate on a video stream as opposed to a compressed bitstream, compressed video must be decoded to baseband form before it can be done.
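
The field bookkeeping that 2:3 removal requires can be sketched as follows. This is a simplified illustration, not a production inverse-telecine algorithm: it assumes the input is already decoded into fields, starts exactly on the first field of an A frame, and contains whole 10-field cycles. It also ignores field parity (top/bottom line placement), which a real implementation must respect when weaving fields together. The function name `remove_23_pulldown` is hypothetical.

```python
def remove_23_pulldown(fields):
    """Rebuild 24p frames from decoded 2:3 fields (field-level inverse telecine).

    Assumes the field list starts on the first field of an A frame and contains
    whole 10-field cycles. Within each cycle the field offsets are
    A: 0-1, B: 2-4, C: 5-6, D: 7-9.
    """
    # Two fields kept per film frame. B's third field (offset 4) and one of D's
    # fields (offset 7, which sits in video frame CD) are discarded. C's fields
    # (offsets 5 and 6) come from two different video frames, BC and CD.
    keep = [(0, 1), (2, 3), (5, 6), (8, 9)]
    frames = []
    for start in range(0, len(fields) - 9, 10):
        cycle = fields[start:start + 10]
        for first, second in keep:
            frames.append(cycle[first] + cycle[second])
    return frames


print(remove_23_pulldown(list("AABBBCCDDDEEFFFGGHHH")))
# ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH']
```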

Recovering 24p from 24p A is a simpler process. As stated above, once the 2:3:3:2 sequence is identified, pulldown removal reduces to discarding one of every five video frames and combining the fields of the remaining four.
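
By contrast, 24p A removal works on whole video frames. In this sketch, "identifying the sequence" is idealized as finding the first frame whose two field labels differ; actual equipment would rely on cadence flags or field-difference measurements. The function name `remove_advanced_pulldown` is hypothetical.

```python
def remove_advanced_pulldown(video_frames):
    """Recover 24p from 2:3:3:2 video frames by dropping one frame in every five."""
    # Identify the cadence: locate the first mixed frame (its fields differ).
    phase = next((i for i, vf in enumerate(video_frames) if vf[0] != vf[1]), None)
    if phase is None:
        raise ValueError("no mixed frame found; cadence not identified")
    # Drop that frame and every fifth frame after it; keep the rest whole.
    return [vf for i, vf in enumerate(video_frames) if (i - phase) % 5 != 0]


frames = "AA BB BC CC DD EE FF FG GG HH".split()
print(remove_advanced_pulldown(frames))
# ['AA', 'BB', 'CC', 'DD', 'EE', 'FF', 'GG', 'HH']
```

Note that no frame is ever taken apart: each surviving video frame already holds both fields of one 24p frame.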

Further, because DV compression operates within each interlaced frame and not across frame boundaries, redundant frames can be removed while the video remains in the DV-compressed domain. Pulldown can therefore be removed from compressed 24p A material as it is transferred for editing, without the material ever leaving the compressed domain. This simplifies and speeds up the process and, in the bargain, reduces the storage required for editing by 20 percent.
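
Because each compressed frame stands alone, dropping the redundant frames amounts to skipping fixed-size blocks of bytes. The sketch below assumes a constant frame size of 120,000 bytes (the 525/60 DV25 frame size), a nominal 30 frames per second, and a hypothetical `phase` parameter giving the index of the first redundant frame. It treats frames as opaque bytes and ignores the audio and auxiliary data that DV carries inside each frame, which a real editing system would also have to handle.

```python
DV_FRAME_BYTES = 120_000  # assumption: fixed size of one 525/60 DV25 frame


def drop_redundant_frames(stream: bytes, phase: int) -> bytes:
    """Remove every fifth fixed-size compressed frame, starting at index `phase`.

    Each frame is copied as opaque bytes; nothing is decoded. This relies on
    each frame being compressed independently, as DV frames are.
    """
    count = len(stream) // DV_FRAME_BYTES
    kept = [
        stream[i * DV_FRAME_BYTES:(i + 1) * DV_FRAME_BYTES]
        for i in range(count)
        if (i - phase) % 5 != 0
    ]
    return b"".join(kept)


one_second = bytes(30 * DV_FRAME_BYTES)  # placeholder: 30 zero-filled "frames"
edited = drop_redundant_frames(one_second, phase=2)
saved_percent = 100 * (len(one_second) - len(edited)) // len(one_second)
print(len(edited) // DV_FRAME_BYTES, saved_percent)  # 24 20
```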

This approach is used only in the SD world, but it is an interesting technological innovation.

Randy Hoffner