Editing long-GOP video

When HDV was introduced, editors quickly discovered that when they exported a production back to HDV tape, the process often took hours. Many saw long export times as a reason to avoid long-GOP formats. At the same time, some marketing campaigns promoted the idea that long-GOP MPEG-2 was difficult to decode, making editing cumbersome. Also, so-called experts advised editors new to HD that MPEG-2 should be decoded in the VTR, sent via HD-SDI and stored as uncompressed video. Or, if that wasn't possible, it should be transcoded during capture to a “better” 4:2:2 codec.

Folks were also warned that if they edited long-GOP natively, the use of FX would result in poor quality because renders would be re-encoded to MPEG-2. And, worse, repeated re-encodings would further destroy quality.

The result of these warnings was a fear of native long-GOP editing. Thankfully, as NLEs were enhanced to support various types of MPEG-2 and more editors came to understand what really went on inside their NLE, this fear has lessened. (See “MPEG-2 editing myths.”)

There remains, however, the issue of long export times — even with a cuts-only MPEG-2 timeline. The reason for the delay is that when a clip is trimmed during editing, the beginning and/or end of each clip likely has its GOP structure broken.

To obtain a perfect GOP series during MPEG-2 output, starting at the first frame in a timeline, every GOP is decoded to a sequence of YCrCb frames. Then, each series of six, 12 or 15 YCrCb frames is re-encoded into a new GOP.
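The full re-encode described above can be sketched in a few lines of Python. This is only an illustration: it assumes decoded frames are held in a plain list and that each new GOP follows an I-B-B-P-B-B layout. The function names are hypothetical and do not come from any NLE.

```python
def frame_type(position_in_gop):
    """Frame type for a position inside a GOP (assumed I-B-B-P-B-B layout)."""
    if position_in_gop == 0:
        return "I"
    return "P" if position_in_gop % 3 == 0 else "B"


def regroup_into_gops(decoded_frames, gop_length=6):
    """Re-group every decoded frame on the timeline into fresh GOPs.

    Returns a list of GOPs; each GOP is a list of (frame_type, frame) pairs.
    The final GOP may be shorter than gop_length.
    """
    gops = []
    for start in range(0, len(decoded_frames), gop_length):
        chunk = decoded_frames[start:start + gop_length]
        gops.append([(frame_type(i), f) for i, f in enumerate(chunk)])
    return gops


timeline = list(range(14))  # 14 decoded frames, numbered 0..13
new_gops = regroup_into_gops(timeline)
layout = ["".join(t for t, _ in gop) for gop in new_gops]
print(layout)  # ['IBBPBB', 'IBBPBB', 'IB']
```

Because every frame on the timeline passes through the decoder and the encoder, export time scales with program length even when no effects were applied.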

Smart GOP splicing

When you watch ATSC HD MPEG-2, transport stream sources are real-time spliced with frame accuracy. MPEG-2 data streams are spliced by shortening the GOP of the last clip (called the outgoing GOP), shortening the GOP of the next clip (called the incoming GOP) or shortening both GOPs.

Because MPEG-2 allows shorter than six-, 12- or 15-frame GOPs, it would seem the task is a simple one. Figure 1 shows an example where four frames (in red) have been trimmed from a GOP. While this appears simple, look again at the generated series of three GOPs. You will note the I-frames (in orange) have moved closer together.

I-frames add — relative to P- and B-frames — a huge quantity of data to the data stream. Normally, these I-frame peaks are smoothed by the P- and B-frames that naturally occur before the next I-frame. However, if I-frames occur too frequently, the stream data rate can become too great.

One way to prevent this error, as described in a Sarnoff patent on a broadcast MPEG-2 splicer, is “to adjust the … levels between the from-stream and to-stream such that … the resulting spliced transport stream will not suffer overflow, underflow or other undesirable decoder buffer memory behavior.” The goal is to decode and encode only GOPs that lie on a splice boundary. The re-encode is controlled by a feedback loop that adjusts encoder compression based upon the current data rate.
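A feedback loop of this kind can be illustrated with a toy model. The sketch below is not the method in the Sarnoff patent; it assumes a deliberately simple encoder in which a frame's encoded size is its original size divided by a compression factor `q`, and a proportional controller that raises `q` when the running total exceeds the budget. The names `reencode_with_feedback`, `gain` and `q_max` are illustrative assumptions.

```python
def reencode_with_feedback(raw_sizes, budget_per_frame, gain=0.5, q_max=4.0):
    """Re-encode boundary frames while steering toward a per-frame data budget."""
    q = 1.0  # 1.0 means "same compression as the source"
    produced = 0.0
    encoded = []
    for count, raw in enumerate(raw_sizes, start=1):
        size = raw / q  # toy encoder model: more compression, less data
        encoded.append(size)
        produced += size
        allowed = count * budget_per_frame
        error = (produced - allowed) / allowed  # positive when over budget
        # Proportional correction, clamped so q never drops below the source level.
        q = min(q_max, max(1.0, q * (1.0 + gain * error)))
    return encoded


# Two five-frame GOPs (I B B P B) using the frame sizes from the simulation.
boundary_frames = [1.44, 0.36, 0.36, 0.72, 0.36] * 2
encoded = reencode_with_feedback(boundary_frames, budget_per_frame=0.60)
print(round(sum(boundary_frames), 2), round(sum(encoded), 2))
```

The first I-frame pushes the running total over budget, so the controller increases compression for the frames that follow and relaxes it again as the total falls back toward the target.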

Feedback control when splicing GOPs is important not only for HDV, which uses CBR encoding, but also for XDCAM HD, XDCAM EX and XDCAM HD 422, as well as AVCHD, all of which use VBR encoding. The feedback-controlled process that limits decoding and encoding to only those GOPs that lie on splice boundaries is often called smart GOP splicing.
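To make the idea of "GOPs that lie on splice boundaries" concrete, here is a minimal sketch that decides, for one trimmed clip, which source GOPs can be copied untouched and which must be decoded and re-encoded. It assumes closed, fixed-length GOPs that start at multiples of the GOP length in the source file; `classify_gops` is a hypothetical helper, not part of any NLE.

```python
def classify_gops(in_frame, out_frame, gop_length=6):
    """Classify the source GOPs touched by a trimmed clip.

    in_frame is the first source frame used (inclusive); out_frame is the
    frame after the last one used (exclusive). A GOP is copied only when
    every one of its frames lies inside the trimmed range.
    """
    plan = []
    first_gop = in_frame // gop_length
    last_gop = (out_frame - 1) // gop_length
    for g in range(first_gop, last_gop + 1):
        start, end = g * gop_length, (g + 1) * gop_length
        intact = in_frame <= start and end <= out_frame
        plan.append((g, "copy" if intact else "re-encode"))
    return plan


# A clip trimmed to source frames 4..26 touches five six-frame GOPs.
plan = classify_gops(in_frame=4, out_frame=27)
print(plan)
# [(0, 're-encode'), (1, 'copy'), (2, 'copy'), (3, 'copy'), (4, 're-encode')]
```

Only the first and last GOPs of the clip are re-encoded; the longer the clip, the larger the share of GOPs that can be copied.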

Simulation of smart GOP splicing

To learn more about smart GOP splicing, I created a simple simulation of HD-1 (720p30 HDV) GOP splicing. In Table 1, the dark and pale blue cells represent two untrimmed, six-frame, closed GOPs. The left-most yellow cells represent the last B-frame from the GOP preceding the outgoing GOP. The right-most yellow cells represent the first I-frame of the GOP following the incoming GOP.

Notation within cells indicates that each I-frame is 1.44MB; each P-frame is one-half (I/2) of an I-frame (0.72MB); and each B-frame is one-fourth (I/4) of an I-frame (0.36MB). Therefore, the total data in one GOP is 3.6MB. The bit rate for five GOPs (1 second) is 18Mb/s — the video data rate used by 720p30.
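The arithmetic above can be checked with a short script. It assumes the six-frame GOP holds one I-frame, one P-frame and four B-frames (the only mix that totals 3.6), and the I-B-B-P-B-B ordering is an assumption. The text quotes frame sizes in MB and the rate in Mb/s; the numbers only balance if both share one unit, so the sketch leaves the unit unspecified.

```python
I_FRAME = 1.44
SIZES = {"I": I_FRAME, "P": I_FRAME / 2, "B": I_FRAME / 4}

GOP_LAYOUT = "IBBPBB"   # assumed frame order within a six-frame GOP
FRAMES_PER_SECOND = 30  # 720p30

gop_total = sum(SIZES[t] for t in GOP_LAYOUT)
gops_per_second = FRAMES_PER_SECOND / len(GOP_LAYOUT)
rate_per_second = gop_total * gops_per_second
average_per_frame = gop_total / len(GOP_LAYOUT)

print(f"{gop_total:.2f}")          # 3.60 per GOP
print(f"{rate_per_second:.2f}")    # 18.00 per second (five GOPs)
print(f"{average_per_frame:.2f}")  # 0.60 per frame on average
```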

The upper half of Table 2 has the final B-frame of the outgoing GOP and the initial I-frame of the incoming GOP — as represented in Table 1 — trimmed away. The lower half of Table 2 displays the two new five-frame GOPs created by the re-encode process.

Notice the re-encoding process has handled the difficult situation created when the I-frame of a GOP is trimmed away. As shown in Figure 2, simply removing an I-frame would create an illegal GOP (shown in blue).

Table 2 and Table 3 have bit rate values (shown in red) that indicate the data rate relative to a nominal 18Mb/s data rate. In Table 2, this value is 104 percent and indicates the increased data rate created by trimming away two frames. Values above 100 percent indicate an overload.

Figure 3 shows data rate increase above 100 percent as a function of removing two, four, six, eight and 10 frames from two six-frame GOPs.

To prevent data rate overload, a smart GOP splicing system, after decoding all frames in a pair of GOPs, encodes them using higher compression. Therefore, the amount of data in each frame is reduced. For example, Table 2 shows I-frame data has been reduced from 1.44MB to 1.39MB.
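The tables are not reproduced here, so the following is only one reading of the numbers. It assumes the overload is measured across the four GOPs centered on the transition (the untouched GOP before, the two re-encoded five-frame GOPs, and the untouched GOP after) and that the re-encoded frames are then divided by that overload ratio. Under those assumptions, and an assumed I-B-B-P-B layout, the sketch arrives at the 104 percent and 1.39 figures quoted in the text.

```python
SIZES = {"I": 1.44, "P": 0.72, "B": 0.36}
NOMINAL_PER_FRAME = 0.60


def gop_layout(length):
    """Assumed layout: I first, P at every third position, B elsewhere."""
    return "".join(
        "I" if i == 0 else ("P" if i % 3 == 0 else "B") for i in range(length)
    )


def gop_size(length):
    """Total data in a GOP of the given length before any extra compression."""
    return sum(SIZES[t] for t in gop_layout(length))


def overload_ratio(outgoing_len, incoming_len, full_len=6):
    """Data in the four-GOP window relative to the nominal amount."""
    lengths = [full_len, outgoing_len, incoming_len, full_len]
    total = sum(gop_size(n) for n in lengths)
    nominal = NOMINAL_PER_FRAME * sum(lengths)
    return total / nominal


ratio = overload_ratio(outgoing_len=5, incoming_len=5)
scaled_i_frame = SIZES["I"] / ratio

print(round(ratio * 100))        # 104
print(round(scaled_i_frame, 2))  # 1.39
```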

Table 3 shows the progressive shortening of outgoing and incoming GOPs. In the second example, the final two B-frames of the outgoing GOP plus the initial I- and B-frame of the incoming GOP have been trimmed away.

Table 2 and Table 3 have a value (below the data rate value) that indicates the quality of the two GOPs after re-encoding. As you can see, increased compression causes a relatively significant loss of quality as the pair of GOPs is trimmed shorter. Thankfully, the lowest quality transitions occur at the shortest durations.

The Sarnoff patent addresses this issue with this statement: “With respect to rate control (which ultimately determines overall picture quality of the recoded portion of the transition clip), … due to masking in the human visual system, a small degradation in video quality at a scene change is often imperceptible to a viewer.”

Given that a six-frame GOP carries 3.6MB, the average amount of data for each frame is 0.60MB. Knowing this value, we can estimate the quality of the frames in the four GOPs centered on the transition. The values in blue cells in Table 3 present average post re-encoding data within the two transition GOPs as well as the preceding and following GOPs. Figure 4 shows these values from Table 3, in megabytes.

Although quality does vary depending on GOP length, average data size (shown in purple) is 0.622MB per frame. When compared to the nominal value of 0.60MB, the result indicates a smart GOP splicer keeps overall quality reasonably consistent for the four GOPs centered on the transition.

Smart GOP splicing restrictions

There are restrictions to using a smart GOP splicer when exporting to MPEG-2 or any other interframe format. First, even though Media Composer and Xpress Pro HD support smart GOP splicing, both NLEs typically render effects to DNxHD. Therefore, timeline segments where effects have been applied are no longer interframe video and will need to be encoded.

Second, some NLEs support a less powerful version of a smart GOP splicer. During export, these NLEs scan a timeline looking for those edited clips — with no applied effects — that begin with an I-frame. These clips are marked so they are not decoded and encoded during export. All other clips must be re-encoded. Obviously, these NLEs do not offer the potential of significantly reduced export times.
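The scan performed by this simpler kind of splicer can be sketched as follows. The `Clip` fields and the `plan_export` function are hypothetical names chosen for illustration; the sketch only assumes that the NLE knows each trimmed clip's first frame type and whether effects were applied.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str
    first_frame_type: str  # "I", "P" or "B" after trimming
    has_effects: bool


def plan_export(timeline):
    """Mark each clip as a straight copy or as needing a decode and re-encode."""
    plan = {}
    for clip in timeline:
        passthrough = clip.first_frame_type == "I" and not clip.has_effects
        plan[clip.name] = "copy" if passthrough else "re-encode"
    return plan


timeline = [
    Clip("interview", first_frame_type="I", has_effects=False),
    Clip("cutaway", first_frame_type="B", has_effects=False),
    Clip("title", first_frame_type="I", has_effects=True),
]
plan = plan_export(timeline)
print(plan)
# {'interview': 'copy', 'cutaway': 're-encode', 'title': 're-encode'}
```

Because a trimmed clip rarely happens to begin on an I-frame, most clips on a typical timeline fall into the re-encode group.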

This type of smart GOP splicing is used by Final Cut Pro when already exported sequences — which inherently begin with an I-frame — are loaded without trimming into a master sequence. This sequence can then be exported without re-encoding.

Third, smart GOP splicing works only when the export codec is the same as the source codec. Obviously, this is the case when HDV-sourced projects are exported to HDV tape. With productions now needing to be burned to optical discs, it might seem the technology would be of less value. However, it continues to be used in several ways.

Some NLEs use smart GOP splicing to burn AVCHD-based red-laser discs from AVCHD source files that were recorded to solid-state media and hard disks. Given the huge computing load imposed when AVCHD is encoded, it helps make this long-GOP format practical to use. Moreover, because smart GOP splicing has value only when the source files are AVCHD, it promotes the support of native AVCHD editing. This in turn eliminates the need to convert AVCHD to a far less storage efficient codec when importing source files. In this way, AVCHD is replicating the history of long-GOP, native MPEG-2 editing.

Professionals who create long-form work, where multiple rough cuts are produced before effects are applied, can also benefit from smart GOP splicing. Rather than distribute rough cuts on standard-definition tape or DVDs, high-definition MPEG-2 can be burned to optical discs. Red-laser DVD discs can be burned using the BD-5 and BD-9 (double layer) Blu-ray option. (In this case, the MPEG-2 data rate is limited to 25Mb/s, rather than 40Mb/s.) Alternatively, Blu-ray Discs (BDs) can be burned. Both options make use of MPEG-2 as the media codec. And, both types of discs can be played on any BD player.

Steve Mullen is owner of Digital Video Consulting, which provides consulting and conducts seminars on digital video technology.

MPEG-2 editing myths

The all too frequent claim that during the capture or import of long GOP MPEG-2, it must be converted to a “better” codec is false. All MPEG-2-based formats can, and generally should, be edited natively. Interframe source files require the least storage space and the least disk bandwidth.

Expanding file size during import can in no way improve or preserve image quality. Moreover, native editing typically enables more streams of video to be edited in real time. NLEs obtain YCrCb frames from uncompressed source files, on-the-fly decompressed intraframe source files, and on-the-fly decoded HDV, XDCAM HD, XDCAM EX and XDCAM HD 422 source files. (As part of a decode, 4:2:0 MPEG-2 video is upsampled to 4:2:2.) Therefore, no matter the source format on disk, exports are always made from 4:2:2 uncompressed video.

When you play back a timeline, you are viewing 4:2:2 uncompressed video that is output directly via HD-SDI or HDMI, or converted to RGB. When effects are rendered in real time, the render engine outputs 4:2:2 uncompressed video. When you manually render effects, the render engine's uncompressed 4:2:2 output may either be sent directly to a file or compressed. Obviously, compressed files have the advantage of requiring far less space and typically do not require a RAID-based editing system.

For example, with HD MPEG-2 source video, Avid Xpress Pro HD and Media Composer can render effects to compressed DNxHD files. And, beginning with Apple Final Cut Pro 6, one can request that effects applied to MPEG-2 source video be rendered to ProRes 422 files. Moreover, you can request Apple's Color application to render to ProRes 422 files.

Therefore, clips with applied effects will not be re-encoded to MPEG-2. Likewise, graphics are never encoded to MPEG-2 during editing. The only time MPEG-2 source files will be re-encoded to MPEG-2, or any interframe format, is when you request an export to MPEG-2 or H.264 to create DVD or Blu-ray optical discs.