Frame-rate conversion

Frame-rate conversion is needed almost everywhere. In cinema, it occurs when 24fps material is projected at 48fps or 72fps. In post-production, it comes into play when combining material shot at different frame rates or when changing speed for effect. It is used to reduce flicker on high frame-rate displays, and it is essential when delivering to different countries and display devices.

By convention, motion imaging has a native frame rate: the number of images created over a given period of time. When a speed ramp-up or ramp-down is applied during creation, the native frame rate must be specified, because it is no longer constant throughout the piece. Likewise, time-lapse and high-speed filming are done with a planned presentation frame rate different from the capture frame rate. But how do we know what the “native” frame rate of a piece of video is supposed to be?

The simplest frame-rate conversion presents the original frames at a rate different from the one at which they were shot. Time-lapse footage shot at one frame per minute and played back at 30fps or 25fps for TV is one example. The most common example is moving from 24fps film to 25fps PAL: one simply plays the material back at 25fps and pitch-shifts the audio. The running time is reduced by 1/25 (4 percent), which we cannot see. “Gone with the Wind” is 238 minutes long, but the same version in Europe runs 228.5 minutes, leading many people to think 10 minutes were cut!
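As a sanity check on that arithmetic, here is a minimal Python sketch of the PAL speed-up calculation; the function name and structure are purely illustrative.

    # Quick arithmetic for the 24fps -> 25fps "PAL speed-up" described above.
    # The 238-minute running time comes from the text; the rest is proportion.
    FILM_FPS = 24
    PAL_FPS = 25

    def pal_speedup_runtime(minutes_at_24fps: float) -> float:
        """Runtime after playing 24fps material at 25fps, with no frames added or dropped."""
        return minutes_at_24fps * FILM_FPS / PAL_FPS

    print(f"{pal_speedup_runtime(238.0):.1f} min")   # 228.5 min, a 1/25 (4 percent) reduction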

The television frame rate in the Americas is 30fps. Speeding up from film would be visible; therefore, 2:3 pulldown was invented. The process repeats fields in a regular 2:3 cadence, spreading four film frames over five video frames so that the duration is unchanged. In most theatres, film is projected at 72fps via a shutter that shows the same frame three times before the next is moved into place. When the U.S. TV frame rate was standardized in 1941, the best that could be economically achieved was 30fps with 2:1 interlace. This provided a decent picture in a living-room environment with minimal motion artifacts. By the 1930s, the UK had already standardized on 25/50, so moving between the frame rates was required. Today, it is possible to synthesize frames from the information in adjacent frames. At first, this was done by frame blending, which worked for broadcast TV but caused visible artifacts in still frames. Modern digital processing now reconstructs missing frames much more accurately.
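For readers who want to see the cadence itself, the following Python sketch spreads four film frames over ten interlaced fields (five 30fps video frames); the field pairing shown is the common 2:3 pattern, and the function name is illustrative.

    from itertools import cycle

    def pulldown_23(film_frames):
        """Return (top field, bottom field) source pairs for 30fps interlaced video."""
        repeats = cycle([2, 3])                  # each film frame becomes 2 or 3 fields
        fields = []
        for frame in film_frames:
            fields.extend([frame] * next(repeats))
        # Pair consecutive fields into interlaced video frames.
        return list(zip(fields[0::2], fields[1::2]))

    print(pulldown_23(["A", "B", "C", "D"]))
    # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')] -- 4 film frames -> 5 video frames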

The most complicated frame-rate conversion I've worked with was a recent 3-D shoot of a classical pianist. Originally shot for European TV at 1920 × 1080/50i, our job was to meet the needs of 3-D Blu-ray. Specification limitations require a 23.976 source because of the increased data rate needed to play back two streams of 1920 × 1080 video simultaneously. Pitch-shifting the audio was not an option. Dropping video frames resulted in visible judder, unacceptable for such a prestigious project. New frames had to be synthesized, and they had to stay in sync between the left and right eyes.

If duplication or deletion of frames is not an option, new frames need to be synthesized. How these frames (or fields) are generated is a trade-off between time, money and accuracy. No matter the method (blending or interpolation), the number of new frames needed is a primary factor. Counterintuitively, the smaller the frame-rate change, the smaller the share of original frames that can be reused. For example, going from 24fps to 25fps allows the reuse of only one frame per second, whereas going from 24fps to 36fps lets every third output frame (12 per second) be reused, as the sketch below shows. This is only the case if frames have to stay in the same temporal relationship; if the difference between adjacent frames is minimal, more frames may be reused. The choice is a trade-off between spatial accuracy (reusing an original frame) and temporal accuracy (objects in the new frame are where they are supposed to be in time).
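The arithmetic behind those reuse counts is simple: when frames must keep their exact temporal positions, the input and output frame grids coincide gcd(in, out) times per second. A minimal Python sketch, with illustrative names:

    from math import gcd

    def reuse_per_second(in_fps: int, out_fps: int) -> tuple[int, int]:
        """Return (reusable, synthesized) output frames per second when timing must be preserved."""
        reusable = gcd(in_fps, out_fps)
        return reusable, out_fps - reusable

    print(reuse_per_second(24, 25))   # (1, 24)  -> almost every output frame is new
    print(reuse_per_second(24, 36))   # (12, 24) -> every third output frame is an original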

The simplest method of frame synthesis is blending. This can actually be done in the analog domain by using a delay line with multiple taps equal to the number of frames to be blended. Digital frame memories have made this the basis of the low-cost standards converters available today. Add more memory and some processing to compare frames, so that original frames can be reused as often as possible, and the results can be quite good. When the blending algorithm takes different percentages from adjacent frames, this is called adaptive motion interpolation.
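A minimal sketch of that kind of blending, assuming frames arrive as 8-bit numpy arrays: a new frame at a fractional position between two originals is simply their weighted average. Without motion compensation, moving edges will show the familiar double-exposure look.

    import numpy as np

    def blend_frame(prev_frame: np.ndarray, next_frame: np.ndarray, t: float) -> np.ndarray:
        """Blend two adjacent frames; t=0 returns prev_frame, t=1 returns next_frame."""
        mix = (1.0 - t) * prev_frame.astype(np.float32) + t * next_frame.astype(np.float32)
        return mix.clip(0, 255).astype(np.uint8)

    # Example: an output frame that falls 40 percent of the way between two originals.
    a = np.zeros((1080, 1920, 3), dtype=np.uint8)
    b = np.full((1080, 1920, 3), 255, dtype=np.uint8)
    print(blend_frame(a, b, 0.4)[0, 0])   # [102 102 102]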

The next step up is to cut and paste only the changed portions from adjacent frames. This is done by dividing each picture into a number of blocks and estimating where each block has moved in the new frame.
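A toy version of that block-based approach, assuming grayscale numpy frames: for each block of the current picture, search a small window in the previous picture for the best match by sum of absolute differences (SAD). Real converters use far more sophisticated searches; everything here is illustrative.

    import numpy as np

    def block_motion_vectors(prev, curr, block=16, search=8):
        """Return one (dy, dx) motion vector per block of `curr`, found by exhaustive SAD search."""
        h, w = curr.shape
        vectors = np.zeros((h // block, w // block, 2), dtype=int)
        for by in range(h // block):
            for bx in range(w // block):
                y, x = by * block, bx * block
                target = curr[y:y + block, x:x + block].astype(int)
                best_sad, best_v = None, (0, 0)
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        yy, xx = y + dy, x + dx
                        if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                            continue
                        cand = prev[yy:yy + block, xx:xx + block].astype(int)
                        sad = np.abs(target - cand).sum()
                        if best_sad is None or sad < best_sad:
                            best_sad, best_v = sad, (dy, dx)
                vectors[by, bx] = best_v
        return vectors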

Algorithms that do this are a research topic at major universities worldwide. Commercial and noncommercial implementations of these algorithms are available. ASIC implementations are built into high frame-rate TVs under names like Motionflow, Motion Picture Pro, Perfect Pixel or HyperReal Engine. If render time isn't a criterion, then excellent results can be achieved using software implementations of these algorithms.

The best method of frame-rate conversion uses phase correlation to estimate motion, and thus accurately places the missing pixels. By translating the picture into the frequency domain with an FFT, it is possible to eliminate irrelevant information from the motion-vector calculation. Manufacturers of real-time, motion-compensated frame-rate converters for production applications can be counted on one hand, and each excels on different material.
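To illustrate the idea, here is a minimal numpy sketch of phase correlation between two grayscale frames: the normalized cross-power spectrum, transformed back to the image domain, peaks at the dominant translation between them. Production converters apply this per region and with far more refinement; this version recovers only a single global shift, and all names are illustrative.

    import numpy as np

    def phase_correlate(prev: np.ndarray, curr: np.ndarray) -> tuple[int, int]:
        """Estimate the (dy, dx) shift that maps prev onto curr."""
        cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
        cross /= np.abs(cross) + 1e-9            # keep the phase, discard the magnitude
        corr = np.fft.ifft2(cross).real
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        # Wrap large indices around to negative shifts.
        if dy > prev.shape[0] // 2: dy -= prev.shape[0]
        if dx > prev.shape[1] // 2: dx -= prev.shape[1]
        return int(dy), int(dx)

    # Example: shift a random frame by (3, -5) pixels and recover that motion.
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))
    print(phase_correlate(frame, np.roll(frame, (3, -5), axis=(0, 1))))   # (3, -5)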

The underlying algorithms for frame-rate conversion are the same as those used for video compression. It is hoped that advances in video compression, and the very real possibility of applying those advances to frame synthesis, will result in more choices. More choices and falling prices will make for better programs as the creative options that frame synthesis provides in post become available to a larger community of users.

Choosing workflow codecs

To avoid undue picture-quality deterioration, there are some rules to follow when designing a production workflow. The first rule is not to upconvert the resolution. Shoot, edit and archive in the highest-resolution format that will be used for distribution. The second rule is not to re-encode to a higher data rate; it is a waste of bandwidth and storage. The picture quality is limited by the lowest data-rate encode, and it is not going to improve.

Also, don't use interlace if the output is progressive, and don't upsample the color sampling (4:2:0 to 4:2:2). From these rules comes a pattern: The data rate decreases through the workflow. The only place that rule is broken is in the camcorder, where the signal is most likely more compressed than in the edit.
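Those rules are easy to express as a quick check over a planned workflow chain. The following Python sketch is only illustrative: the stage descriptions, fields and example numbers are assumptions, not a real broadcaster's chain.

    from dataclasses import dataclass

    @dataclass
    class Stage:
        name: str
        width: int
        height: int
        mbps: float          # encoded data rate
        chroma: str          # e.g. "4:2:0", "4:2:2"

    def check_workflow(stages: list[Stage]) -> list[str]:
        """Flag upconversions, data-rate increases and chroma upsampling between consecutive stages."""
        warnings = []
        for prev, curr in zip(stages, stages[1:]):
            if curr.width * curr.height > prev.width * prev.height:
                warnings.append(f"{prev.name} -> {curr.name}: resolution upconverted")
            if curr.mbps > prev.mbps:
                warnings.append(f"{prev.name} -> {curr.name}: data rate increased")
            if prev.chroma == "4:2:0" and curr.chroma == "4:2:2":
                warnings.append(f"{prev.name} -> {curr.name}: chroma upsampled")
        return warnings

    # The acquisition -> edit step is the one allowed exception noted above.
    chain = [
        Stage("acquisition", 1920, 1080, 50, "4:2:0"),
        Stage("edit", 1920, 1080, 100, "4:2:2"),
        Stage("distribution", 1920, 1080, 15, "4:2:0"),
    ]
    print(check_workflow(chain))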

A broadcaster's departments will have different demands: News will acquire, edit and archive in different formats than entertainment will. Things were simpler in the SD world, where you could shoot, edit and deliver in a single format, such as Digital Betacam or DVCPRO.

In the multiresolution world of HD-to-mobile, it is just not possible to use one format throughout the process chain. Rewrapping and transcoding have become an essential part of the video workflow. How and where this takes place in the broadcast chain must be carefully considered in order to avoid picture quality loss, yet still reduce the cost of storage and video networks. Overall, there is no single, best answer to picking a format, and the optimum choice is constantly changing as technology advances.

For more information, check out the “Video format conversion” article by David Austerberry.

Christopher Walker is consulting engineer for SONY DADC Austria.