Does Video Need to Go Faster?

LOS ANGELES—We’ve all heard the 4K hype of Mo’ Better Pixels, but what about frame rate? We’ve lived in a world of low frame rates (24, 25 and 30 fps), but with the dawn of digital TV we’ve had the ability to capture, produce and distribute at higher frame rates, such as 50 and 60 Hz. Does TV need frame rates beyond 60 Hz? Isn’t 60 Hz the ‘fusion of vision’ point for most humans?

Before we can answer this question, let’s look at the implications of higher frame rates.

Video is a ‘sampled’ system in all dimensions: horizontal, vertical and temporal. There are discrete samples, separated by the sample clock period (1/sample frequency). Sampled systems are subject to the Nyquist criterion, which states that the highest frequency that can be represented in a sampled system is one-half the sample frequency. So if we capture at 60 Hz, the highest temporal frequency that can be reproduced without aliasing is 30 Hz. This is fine for most motion imagery, especially scenes with normal human and camera motion. However, we know that high-speed objects (cars, planes, Superman) will be blurry; likewise, fast camera pans or zooms will not be in focus until the motion stops. Blur acts like a low-pass filter, removing the higher temporal frequencies so they do not disturb our senses with motion artifacts such as judder and stutter. We see this whenever we capture video with an exposure time shorter than the full frame period: the effects of under-sampling the temporal motion become visible.
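To make the Nyquist point concrete, here is a minimal Python sketch (my own illustration, not from the article) that folds any temporal frequency back into the band a given capture rate can actually represent:

```python
def aliased_frequency(f_signal_hz, f_sample_hz):
    """Fold a temporal frequency into the [0, fs/2] band a sampled system can represent."""
    f = f_signal_hz % f_sample_hz      # wrap into one sample-rate period
    return min(f, f_sample_hz - f)     # reflect frequencies above Nyquist back down

# Capturing at 60 Hz: content at or below 30 Hz survives, faster motion aliases.
print(aliased_frequency(25, 60))   # -> 25 (below Nyquist, reproduced faithfully)
print(aliased_frequency(40, 60))   # -> 20 (aliased: shows up as a spurious 20 Hz)
```

Anything faster than half the capture rate does not disappear; it comes back as a lower, false frequency, which is exactly the motion artifact the blur of a longer exposure conveniently filters away.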

Enter 120 Hz. At twice the sample rate, we can capture motion where 60 Hz is now the highest spatio-temporal frequency attainable. By oversampling, converting to lower frame rates can achieve better motion rendition. However, without either adding back ‘blur’ or applying some form of temporal filtering, artifacts such as judder or stutter will be apparent in the down-converted video. Thus, simple ‘frame-dropping’ will not, in general, create pleasing motion sequences at lower frame rates. But if a motion-adaptive/predictive temporal down-sampling filter is applied, one can control the trade-off of blur vs. judder and keep the still parts of the motion imagery in focus.
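The difference between frame-dropping and filtered down-conversion can be sketched in a few lines of Python. This is a toy model of my own: each ‘frame’ is a single number tracking a moving object, and the pair-averaging filter is only a crude stand-in for the motion-adaptive filtering a real converter would use.

```python
def drop_to_half_rate(frames):
    """Naive 120 -> 60 conversion: keep every other frame, discard the rest."""
    return frames[::2]

def average_to_half_rate(frames):
    """Crude temporal low-pass: average each pair of frames before decimating."""
    return [(a + b) / 2 for a, b in zip(frames[::2], frames[1::2])]

# One 'pixel' value tracking an object moving at constant speed, sampled at 120 Hz:
positions_120 = list(range(8))               # 0, 1, 2, ... 7
print(drop_to_half_rate(positions_120))      # [0, 2, 4, 6]  full sharpness, risks judder
print(average_to_half_rate(positions_120))   # [0.5, 2.5, 4.5, 6.5]  blurred but smoother
```

Dropping preserves every retained sample exactly (crisp frames, juddery motion), while averaging reintroduces a controlled amount of blur; a motion-adaptive filter would apply the averaging only where the image is actually moving.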

Another reason to consider higher frame rates/temporal sampling is the move to larger TV screens and closer viewing (at one to two picture heights), with the screen now occupying 60 degrees of the viewer’s field of vision, as opposed to the HDTV 30-degree field of view. What is the effect of closer, larger screens? Think of an object, say a football, moving across the screen in two seconds. In HDTV viewing, with a 30-degree FoV at 60 fps, the angular distance the object moves between each frame of video is 1/4 of a degree. In UHDTV viewing, with a 60-degree FoV, the angular distance between each frame of 60 Hz video doubles to 1/2 a degree. What is the problem with an increase in angular distance? Human eye tracking has a finite speed for smooth motion which, if exceeded, forces the eye to back-track. This effectively adds blur to the moving image that is not present in the captured image.

However, if the frame rate is increased to 120 Hz, the angular distance returns to 1/4 degree, matching the spatio-temporal rate of HDTV viewing. A wider field of view, for the same motion sequence, requires a higher frame rate in order to maintain the same angular displacement per frame and thus allow the eye to track motion without error.
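The arithmetic behind these figures is straightforward; this small helper (hypothetical, for illustration) reproduces the numbers above:

```python
def degrees_per_frame(fov_deg, traverse_seconds, fps):
    """Angular distance an object moves between frames while crossing the field of view."""
    return fov_deg / (traverse_seconds * fps)

print(degrees_per_frame(30, 2, 60))    # 0.25  -- HDTV: 30-degree FoV at 60 fps
print(degrees_per_frame(60, 2, 60))    # 0.5   -- UHDTV at 60 Hz: twice the jump per frame
print(degrees_per_frame(60, 2, 120))   # 0.25  -- UHDTV at 120 Hz: back to the HDTV figure
```

Doubling the field of view doubles the per-frame jump; doubling the frame rate brings it back down.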

Content today is distributed globally on a multitude of distribution platforms and frame rates. Typical frame rates are 24, 25, 30, 50 and 60 (I’ll leave 23.98, 29.97 and 59.94 for another blog). Some have pointed out that if we produce at higher frame rates, couldn’t we convert to lower frame rates and thus have one master format from which all the other formats can be derived? Well, we do have such a frame rate, at least for scripted content: movies, TV dramas, sitcoms, etc. And it’s the lowest! Why does this work? Anyone in Hollywood will tell you: it’s the seven-second rule! As long as the camera motion or in-scene motion takes at least seven seconds to cross the frame (about 8.6 degrees per second), the blur/judder ratio will be about right. As we have already discussed, the higher frame rates can support higher temporal motion, so mapping low to high does not violate the Nyquist criterion; however, mapping 24 fps film into 60 Hz has a unique artifact due to the 3:2 pull-down sequence, which creates beat-frequency effects that look similar to judder.
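The 8.6 degrees-per-second figure falls out of simple division, assuming (as in the viewing-distance discussion) a field of view of roughly 60 degrees:

```python
def max_pan_speed(fov_deg, min_traverse_seconds=7):
    """Seven-second rule: slowest acceptable traverse implies a maximum pan/object speed."""
    return fov_deg / min_traverse_seconds

print(round(max_pan_speed(60), 1))   # -> 8.6 degrees per second for a 60-degree view
```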

But what about content whose motion crosses the frame in less than seven seconds? With live sports, news and reality TV, it’s hard to limit either the camera or in-scene motion. Capturing at higher frame rates solves this problem and provides the higher temporal sampling required by content with high spatial/temporal frequencies. But how then can the HFR video become a source for all other frame rates?

When I was a much younger engineer, I was asked by my boss to look at a tape of a European car race. He wanted my opinion of the video quality, especially the motion rendition. Despite the best standards converter available at the time, there were obvious, far-from-subtle artifacts. My comment was, “Well, I wouldn’t pay too much for that technology.” I was informed that this standards converter cost more than $500,000 (in late-1980s dollars).

We understand why this was so hard: starting with a lower frame rate and converting to a higher one requires very careful filtering to avoid judder. Further, the closeness of the two temporal sample rates, 50 and 60 Hz, creates “beat frequency” issues (a 10 Hz beat). In addition, given the limited availability and high cost of video memory at the time, temporal filters had to be truncated, leading to additional sampling artifacts.

In the ’90s, standards converters improved with the adoption of phase-correlation motion detection/processing, which was developed by the BBC and incorporated in the Snell & Wilcox Alchemist line of standards converters. Improvements in this type of frame rate conversion have continued, and today’s software-based frame rate/temporal rate conversion techniques provide much improved results. We have tested several of these software packages (IsoVideo, InSync, RE:Vision Effects and Cinnafilm) to convert our 120 Hz video sequences to other frame rates, including 24, 25, 50, 60 and even 100 Hz. The results are remarkable and visually artifact-free; I was impressed when I watched softball game clips captured at 120 Hz and converted to 24 fps. While appearing film-like, they showed no annoying judder, and the still parts of the image remained crystal clear.

Will 120 fps (or higher) video be important in the future? I think the answer lies in the ease of converting from high frame rates to lower ones. The temporal conversion algorithms are ready and continue to improve, but the needed infrastructure for HFR video (bandwidth and storage) has yet to catch up; FYI, uncompressed 4K at 120 fps needs approximately 24 Gbps, which works out to roughly 11 TB of storage per hour. 4K HFR cameras exist today; flat-panel displays are capable of high frame rate display but, due to internal processing limitations, are typically limited to input frame rates of 60 Hz or less. There is much to learn about how to shoot at higher frame rates, not least the greater lighting demands, such as light level and the frequency of the light power source. But the early results are encouraging.
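The quoted 24 Gbps can be sanity-checked with a quick calculation; note that the hourly storage it implies is on the order of terabytes, not petabytes. I assume, for illustration, UHD-1 resolution and 24 bits per pixel (e.g. 12-bit 4:2:2 sampling), which lands near the quoted rate:

```python
def uncompressed_rate_gbps(width, height, fps, bits_per_pixel):
    """Raw video payload bit rate in Gbps (ignores blanking, audio and overhead)."""
    return width * height * fps * bits_per_pixel / 1e9

def storage_tb_per_hour(rate_gbps):
    """One hour of that stream, expressed in terabytes."""
    return rate_gbps / 8 * 3600 / 1000

rate = uncompressed_rate_gbps(3840, 2160, 120, 24)
print(round(rate, 1))                       # ~23.9 Gbps
print(round(storage_tb_per_hour(rate), 1))  # ~10.7 TB per hour
```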

Jim DeFilippis
CEO of TMS Consulting, Inc.

Jim DeFilippis is CEO of TMS Consulting, Inc. in Los Angeles, and former EVP Digital Television Technologies and Standards for FOX.