Standards conversion

The original purpose of the standards converter was to address the incompatibilities between the world's multiple television standards. Dealing with standards featuring different frame and field rates, with different numbers of lines and fields in each frame, has long been an issue facing the video industry. Today, the problem is compounded by the need to convert material from interlaced to progressive with a mix of SD and HD formats, each with its own color space.

Conversion systems now also must account for the embedded audio carried along with virtually all TV signals. Before audio was embedded in the digital signal, it remained the domain of audio engineers. However, integrating audio with digital video signals has made it hard to separate the two without introducing lip sync errors. Consequently, format conversion must be considered both an audio and video issue.

Standards conversion

A high-quality format and standards conversion solution is valuable whether it serves a facility that handles live event transmission or one devoted to mastering and duplication. Within the live transmission chain, there is no time for manual intervention, and the standards converter must deliver a high-quality picture regardless of the circumstances; any errors introduced during conversion will find their way into the end product. The quality requirements of a system used in post-production and mastering/duplication facilities are just as high. Clients often want to send the finished program to multiple markets, which requires the facility to convert material to the appropriate standards, with the corresponding frame and line rates, scan type and so on. The quality of the standards converter affects both the quality of the end product and the cost-effectiveness of the project.

A less expensive converter with many operational modes may be attractive. However, the time and labor spent fixing the end product may quickly erode any cost savings. Manual intervention and quality assurance are costly, so conversion must be performed right the first time. It is therefore essential that the converter have only one operational mode, i.e., “On.” A robust standards converter that provides highly automated operation and sophisticated image and audio processing is usually a better choice.

Image processing tech

When purchasing a standards or format converter, it's important to know what type of underlying image processing technology is being used. There are several approaches to deinterlacing and frame rate conversion.

Early conversion solutions relied on a linear filter, combined in some cases with motion-adaptive technology that adjusts how the output image is built according to the presence of motion. The most basic linear deinterlacing solution simply discards either the odd or the even field, reducing the image resolution by half.

Adding a degree of sophistication to this process, motion-adaptive techniques maintain both fields in those parts of the picture where there is no motion, but fall back to half resolution in those parts of the picture with movement. Like linear techniques, this process leads to image softness, blurring and apparent motion judder.
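
To make the trade-off concrete, the following sketch (in Python with NumPy, purely for illustration and not a description of any particular product) builds one progressive frame from a field: where a crude frame-difference detector sees no motion, the missing lines are woven in from the adjacent field at full resolution; where it sees motion, they fall back to a line average at half vertical resolution. The threshold and field layout are assumptions made for the example.

import numpy as np

def deinterlace_field(curr_field, other_field, prev_same_parity_field, threshold=10.0):
    """Build one progressive frame for the time of curr_field.

    curr_field             -- lines kept as-is (say, the top field)
    other_field            -- the opposite-parity field, adjacent in time
    prev_same_parity_field -- the previous field of the same parity as other_field,
                              used only to detect motion
    All three are 2-D luma arrays of the same shape (one field's worth of lines).
    """
    cf = curr_field.astype(np.float64)
    of = other_field.astype(np.float64)
    pf = prev_same_parity_field.astype(np.float64)

    h, w = cf.shape
    frame = np.zeros((2 * h, w), dtype=np.float64)
    frame[0::2] = cf                                # known lines

    # 'Weave': take the missing lines from the adjacent opposite field.
    weave = of

    # 'Bob': average neighbouring lines within the current field
    # (half vertical resolution, but no inter-field tearing).
    # The last line wraps around; a real implementation would handle the edge.
    bob = 0.5 * (cf + np.roll(cf, -1, axis=0))

    # Crude motion detector: compare fields of the same parity across time.
    motion = np.abs(of - pf)
    frame[1::2] = np.where(motion > threshold, bob, weave)
    return frame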

Motion estimation provides an alternative method for converting interlaced video to progressive. There are three primary technologies used to estimate movement: block matching, gradient techniques and phase correlation.

Block matching and gradient techniques are the easiest to implement in hardware- and software-based solutions. However, both share one detrimental side effect: their performance is degraded by changes in noise and luminance levels. Noise is present in all signals, and luminance changes are nearly as common as movement in TV programming.
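
For comparison, here is a minimal exhaustive block-matching sketch using a sum-of-absolute-differences (SAD) score; the block size and search range are arbitrary illustrative choices, not values taken from any real converter. Note how a global change in brightness between the two fields inflates every SAD score, which is exactly the sensitivity described above.

import numpy as np

def block_match(prev, curr, y, x, block=16, search=8):
    """Exhaustive SAD search for the block at (y, x) in curr within prev.
    Returns the (dy, dx) displacement with the lowest SAD."""
    ref = curr[y:y + block, x:x + block].astype(np.float64)
    best, best_vec = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > prev.shape[0] or xx + block > prev.shape[1]:
                continue
            cand = prev[yy:yy + block, xx:xx + block].astype(np.float64)
            # Any overall luminance change raises every SAD value,
            # which is why flashes and lighting changes disturb the match.
            sad = np.abs(ref - cand).sum()
            if sad < best:
                best, best_vec = sad, (dy, dx)
    return best_vec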

One example of luminance level changes is a football kicked in late afternoon through sun and shadows across the field. Luminance levels change significantly as the ball transitions from sunlight to shade. Another example is models walking on a catwalk as flash photos are taken. These types of scenes can dramatically upset both block matching and gradient techniques.

Phase correlation

An alternative is phase correlation, which is based on the principle that a displacement in the time domain corresponds to a phase shift in the frequency domain. Because the measurement is made in the frequency domain, phase correlation offers good immunity to changes in both luminance and noise.

Implementing phase-correlated motion estimation requires sophisticated algorithms and a high degree of processing power. Within phase correlation, a Fourier transform breaks the video down into a series of sine waves, each with a known phase, so motion can be measured from how those phases change between successive images. Spectral analysis of two successive fields and subtraction of the individual phase components yields phase differences that, when subjected to an inverse Fourier transform, produce a correlation surface with peaks corresponding to the motion between the images.
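
The core of the calculation is compact. The sketch below (a minimal NumPy illustration, not any vendor's implementation) forms the normalized cross-power spectrum of two fields and inverse-transforms it; the position of the peak in the resulting correlation surface gives the dominant displacement between the two images.

import numpy as np

def phase_correlate(field_a, field_b):
    """Return the correlation surface and the dominant (dy, dx) shift
    between two equally sized luma arrays."""
    A = np.fft.fft2(field_a)
    B = np.fft.fft2(field_b)

    # Keep only the phase by dividing out the magnitudes. This is what makes
    # the measurement insensitive to overall luminance and gain changes.
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12

    surface = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(surface), surface.shape)

    # Peaks past the halfway point represent negative displacements.
    dy = peak[0] if peak[0] <= field_a.shape[0] // 2 else peak[0] - field_a.shape[0]
    dx = peak[1] if peak[1] <= field_a.shape[1] // 2 else peak[1] - field_a.shape[1]
    return surface, (dy, dx)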

By using multiple stages, it's possible to derive a motion vector to sub-pixel resolution. This level of accuracy gives the system the best chance of recreating the movement and changes in a given scene.
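
One common way of reaching sub-pixel resolution, shown here only as an assumption about how such a refinement might look rather than as the method any given converter uses, is to fit a parabola through the correlation peak and its immediate neighbours; the vertex of the fit falls between integer positions and supplies the fractional part of the vector.

def subpixel_offset(surface, peak):
    """Refine an integer peak location to sub-pixel accuracy by fitting a
    parabola through the peak and its immediate neighbours on each axis."""
    def parabolic(vm, v0, vp):
        denom = vm - 2.0 * v0 + vp
        return 0.0 if denom == 0 else 0.5 * (vm - vp) / denom

    py, px = peak
    h, w = surface.shape
    dy = parabolic(surface[(py - 1) % h, px], surface[py, px], surface[(py + 1) % h, px])
    dx = parabolic(surface[py, (px - 1) % w], surface[py, px], surface[py, (px + 1) % w])
    return dy, dx   # fractional corrections to add to the integer peak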

During deinterlacing, motion compensation moves different parts of the picture to the correct point in time, recreating the missing lines from the interlaced scan so that each field is presented in full resolution. The same technique can be used to perform accurate frame-rate standards conversion, as both processes must be able to recreate the position of images at any point in time. When converting video from 50Hz to 60Hz, motion compensation can be used to replace 50 frames of video with 60 frames of video.

A common misconception is that 10 frames are either added or subtracted to achieve the correct frame rate. In fact, every frame in the converted output is synthesized from scratch. Because each frame-rate standard samples different points in time within the same one-second interval, phase-correlation technology measures the motion between the two inputs that straddle the desired output field and then scales the motion vectors accordingly. The result is an entirely new set of frames, generated accurately for the output timing.
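
The timing arithmetic behind this is easy to illustrate. In the sketch below (50Hz and 60Hz are used simply as example rates), each output frame time is located between the two input frames that straddle it, and the fractional phase that results is what the converter uses to scale the measured motion vectors.

def temporal_phases(in_rate=50.0, out_rate=60.0, n_out=6):
    """For each output frame, report which pair of input frames straddles it
    and the fractional phase used to scale the motion vectors."""
    results = []
    for n in range(n_out):
        t_out = n / out_rate                 # output frame time in seconds
        idx = int(t_out * in_rate)           # input frame just before (or at) t_out
        phase = t_out * in_rate - idx        # 0.0 .. 1.0 between inputs idx and idx+1
        results.append((n, idx, idx + 1, phase))
    return results

# Each motion vector measured between inputs idx and idx+1 is scaled by
# 'phase' to project the image content to the exact output instant.
for n, a, b, phase in temporal_phases():
    print(f"output {n}: between inputs {a} and {b}, phase {phase:.2f}")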

Processing challenges

In a motion-compensated standards converter, the interfield interpolation axis is not aligned with the time axis in the presence of motion. (See Figure 1.) In practice, the interpolation axis is skewed by using the motion vectors to shift parts of the source fields. The displacement is measured in pixels, and the value is divided into an integer part (the nearest whole number of pixels) and a fractional part (the subpixel shift). Pixels from the input fields are stored in RAM, which the interpolator addresses to obtain its input for filtering.

The integer part of the shift is added to the RAM address so that the pixels from the input field appear to have been shifted: the vertical shift changes the row address, and the horizontal shift changes the column address. Address mapping moves the image with pixel accuracy. The subpixel shift is then used to control the phase of the interpolator. Combining address mapping and interpolation in this way allows image areas to be shifted by large distances with exceptional accuracy.
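
A few lines of code can illustrate the split; the bilinear weighting here is only a stand-in for the longer interpolation filters a real converter would use, and the function name is purely illustrative.

import numpy as np

def shift_pixel(field, y, x, dy, dx):
    """Fetch the pixel that lands at (y, x) after a shift of (dy, dx),
    splitting the displacement into an address offset and a sub-pixel phase."""
    iy, fy = int(np.floor(dy)), dy - np.floor(dy)   # integer and fractional parts
    ix, fx = int(np.floor(dx)), dx - np.floor(dx)

    h, w = field.shape
    y0 = np.clip(y + iy, 0, h - 2)                  # address mapping (whole pixels)
    x0 = np.clip(x + ix, 0, w - 2)

    # The sub-pixel phase controls the interpolator; bilinear weighting is used
    # here purely for brevity.
    p = field.astype(np.float64)
    return ((1 - fy) * (1 - fx) * p[y0, x0] + (1 - fy) * fx * p[y0, x0 + 1]
            + fy * (1 - fx) * p[y0 + 1, x0] + fy * fx * p[y0 + 1, x0 + 1])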

What makes the quality of motion estimation so critical is how the output behaves when the converter makes a mistake in creating an intermediate field. Say a camera pan causes a person to move from left to right; the motion-compensation system must place that person at a point in time never actually captured by the camera. If the technology moves the head and body differently, the human eye and brain will know at some level that something is not right. A minute error in motion estimation, even in a single output picture, is enough to create a significant disturbance for the viewer.

Fast and complex motion can be a challenge for conversion; however, speed can help mask the effects of poorly performed motion compensation. (See Figure 2 on page 34.)

Still-picture handling is also important. While it seems like an easier task, some technologies turn a still into a picture with moving pieces. Even seemingly benign images, such as a still shot of a building exterior, can go to pieces, with the windows appearing to move.

Moving roller credits are a particular challenge. It takes sophisticated motion estimation to account for the movement of small objects within a picture. Phase-correlated motion compensation can enable high-quality deinterlacing as well as precise, clean frame-rate conversion even for complex graphics, fast-motion sports, film and variable speed camera outputs.

Exceptional motion compensation is defined not just by how accurately it creates pictures, but also by how it behaves when it makes mistakes. A mathematical byproduct of phase correlation is a reliability indicator that tells the system when it is working effectively and when it must tread more carefully. This information provides a graceful fallback mechanism for concealing the errors the system will inevitably make.
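
One plausible way to use such an indicator, sketched below under the assumption that confidence is derived from how far the correlation peak stands above the rest of the surface, is to blend the motion-compensated picture with a safe, non-motion-compensated interpolation; the lower the confidence, the more the output leans on the soft but artifact-free fallback.

import numpy as np

def blend_with_fallback(mc_frame, linear_frame, surface):
    """Mix the motion-compensated result with a plain temporal interpolation,
    weighted by a confidence figure derived from the correlation surface."""
    peak = surface.max()
    background = np.median(surface)
    spread = surface.std() + 1e-12

    # Confidence: how far the peak stands above the background, squashed to 0..1.
    confidence = np.clip((peak - background) / (6.0 * spread), 0.0, 1.0)

    # Low confidence pulls the output toward the soft but artifact-free fallback.
    return confidence * mc_frame + (1.0 - confidence) * linear_frame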

Added image and audio

When motion-compensated standards conversion first became available, phase correlation was so effective that often the only residual artifact for the viewer, or clue for the downstream broadcaster, was that cuts were no longer clean. Since then, technologies have been developed to ensure clean video transitions between scenes and programs. Some solutions allow operators to choose the field dominance of the converter output; prior to this, converters scrambled field dominance regardless of whether it was correct on the source.

The problem with fluctuating field dominance is that it makes programs difficult to edit, which is a particular issue for international program exchange; post-conversion master editing is tricky when the field dominance isn't consistent.

Because most content is compressed somewhere within the delivery chain, whether on DVD or over broadcast media, fluctuating field dominance is a problem here as well: it makes it difficult for a downstream compression system to insert a single clean I-frame. The efficiency of both workflows, and the quality of the end product, can be compromised when standards conversion doesn't provide clean transitions.

Consider pixel accuracy

If content is shot with an interlaced camera and therefore carries movement on a field-by-field basis, the only transparent way to convert between formats is to use motion estimation to measure the movement between fields and to compensate for its effects.

The first step is to use motion estimation to nullify the effects of any movement and to make sure that pixels within the input frame are aligned in time. Within this deinterlacing process, it is the motion estimator's job to deliver motion information that can be used to near-perfectly compensate for movement.

Another issue accompanying modern standards conversion is aspect ratio. The transition to a 16:9 aspect ratio is still relatively new to most U.S. broadcasters. This means production companies and broadcasters must deal with at least two aspect ratios. Fortunately, every standards converter made today includes built-in aspect ratio conversion with various preset and user-definable modes.
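
The geometry behind those preset modes is straightforward. The sketch below computes the scale factor and the symmetric padding needed to fit one raster inside another without distortion (letterbox or pillarbox); the raster sizes are examples only.

def fit_aspect(src_w, src_h, dst_w, dst_h):
    """Scale a source raster to fit a destination raster without distortion,
    returning the scaled size and the symmetric padding on each axis."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return new_w, new_h, (dst_w - new_w) // 2, (dst_h - new_h) // 2

# A 4:3 picture (square pixels assumed) placed in a 16:9 HD raster:
# the result is 1440 x 1080 with 240-pixel pillarbox bars at each side.
print(fit_aspect(960, 720, 1920, 1080))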

The audio side

Dealing effectively with audio also has become the standards converter's responsibility. Standards converters should accommodate 16 channels of audio, or eight AES pairs, and be able to resample audio and perform sample rate conversion from the input rate to the output rate. The Dolby audio standard is used extensively throughout the broadcast industry for multichannel surround. This means handling up to 16 channels filled with Dolby E, discrete 5.1 audio, an additional stereo mix and perhaps a second language or soundtrack information.

Dolby E brings its own requirements to conversion because it is locked to the incoming frame rate of video and must be decoded, recoded and relocked to ensure that the audio can be re-edited downstream without corrupting the Dolby E signal.
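
The arithmetic behind that constraint is simple to show. At a 48kHz sampling rate, the number of audio samples per video frame depends entirely on the frame rate, so a Dolby E frame built for one video rate cannot simply be carried across to another; the sketch below illustrates only the sample counts, not the Dolby E coding itself.

from fractions import Fraction

def samples_per_frame(frame_rate, audio_rate=48000):
    """Audio samples per video frame as an exact fraction."""
    return Fraction(audio_rate) / Fraction(frame_rate)

for label, rate in [("25 fps", Fraction(25)),
                    ("29.97 fps", Fraction(30000, 1001)),
                    ("50 fps", Fraction(50))]:
    spf = samples_per_frame(rate)
    print(f"{label}: {float(spf):.3f} samples per frame ({spf})")

# 25 fps gives exactly 1920 samples per frame; 29.97 fps gives 1601.6, so the
# count varies frame to frame across a five-frame sequence. A Dolby E frame
# locked to one video rate therefore has to be decoded and re-locked when the
# frame rate changes.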

Product development

Investing in standards conversion technology is an important decision in the life of a broadcast, post or duplication facility. Advanced solutions today include comprehensive film tools for 23p, 24p, 25p, 30p and sF formats, along with 3Gb/s capabilities to handle 1080p. Standards conversion platforms are capable of operating in either the hardware or the software domain.

Standards conversion issues do not disappear in the file-based workflow. Content must remain at the correct frame rate through the entire workflow right to the end of the consumption chain.

David Tasker is head of technical sales for Snell & Wilcox.