A Standards Converter

The original temporal field rates for television standards were established back in the 1930s when the state-of-the-art oscillator and power regulation circuits were based on the frequency of AC power. Further diversification in the number of scanning lines and color-coding schemes left us with the NTSC 59.94 Hz, PAL and SECAM 50 Hz standards. The additional diversification introduced by the ATSC has produced a total of nearly 30 different formats and standards using both progressive and interlaced scanning structures.

With the truly international nature of the content market, there is potential for severe confusion in the standards conversion arena.

The predominant HD formats for broadcast use are 720 lines for progressive and 1080 lines for interlaced pictures at either 59.94 or 50 Hz temporal sampling rates. It is possible through all this, however, to see a path back to some semblance of order.

Already, the move to digital has allowed us to dispense with the NTSC, PAL and SECAM color-coding schemes—and if processing speeds continue to improve, we may ultimately see the new 1080-line progressive format emerge as the unifying “Holy Grail” format. What is unlikely to be forthcoming in the foreseeable future is the emergence of a single temporal or picture frame rate to accompany this format.

In the past, the term “standards conversion” was applied only when temporal sampling rate conversion was part of the process. Conversion between systems with the same temporal sampling rate was referred to as transcoding.

This practice should be maintained when encompassing the new HD systems. Unfortunately, it now appears commonplace to describe a device that can convert 576i SD to 1080i HD at the same field/frame rate as a standards converter. The correct term for this process is upconversion (and the reverse process is downconversion). However, even if we find the Holy Grail, we can be sure that the international nature of television will mean that standards converters (in the true sense) will be part of the furniture for a long time.
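The naming convention described above can be summarised as a small decision rule. The sketch below is illustrative only: the `VideoFormat` type and the `classify` function are hypothetical names, and the rule deliberately ignores details such as interlaced versus progressive scanning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoFormat:
    """Hypothetical description of a format: active lines and temporal rate."""
    lines: int
    rate_hz: float


def classify(source, target):
    """Name the conversion using the terminology of the text."""
    # A change of temporal sampling rate is what makes it standards conversion.
    if source.rate_hz != target.rate_hz:
        return "standards conversion"
    # Same rate, more lines: upconversion. Fewer lines: downconversion.
    if target.lines > source.lines:
        return "upconversion"
    if target.lines < source.lines:
        return "downconversion"
    # Same rate and line count (for example a change of color coding).
    return "transcoding"


print(classify(VideoFormat(576, 50.0), VideoFormat(1080, 50.0)))
print(classify(VideoFormat(1080, 50.0), VideoFormat(1080, 59.94)))
```

The first call describes 576i SD to 1080i HD at 50 Hz and is labelled upconversion; only the second, which changes the temporal rate, is standards conversion in the true sense.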

ROI ASAP
Now that Europe appears to have fully embraced the move to HDTV, a common question is why the availability and choice of high quality HD standards converters are currently so limited. There are two main explanations for this situation.

First, the significant cost in design and development of such devices necessitates a fairly immediate return on investment. Although this is true of most major project developments, initial sales in this case would be heavily dependent on adoption of HDTV by countries with broadcast transmission standard(s) that use 50 Hz temporal sampling rates. Without this step, countries such as the United States—with an established HD infrastructure operating at 59.94 Hz temporal rate—could import material by simply using existing SD standards converters to implement the frame rate conversion and then upconvert the output to HD.

Second, there has always been an underlying nervousness among the traditional standards converter manufacturers regarding emerging picture processing techniques for use in standard computer-based hardware platforms. A considerable slice of the standards converter market has been in the post-production environment, where realtime picture processing is not necessarily a prerequisite. Would a large slice of this traditional market disappear to software products for installation on powerful off-the-shelf PCs?

The truth is that, from both a quality and an economic perspective, these concerns have proven to be premature and will most likely remain so for some years to come. The bulk of theoretical work has been, and continues to be, focused on developing new and improved picture compression algorithms, which are significantly different from those required for full resolution baseband standards conversion.

Standards conversion is a more demanding application than data compression. In addition, even if non-realtime conversion could deliver satisfactory picture quality, there are still operating costs to be considered. Non-realtime might better be described as “increased time,” which means increased cost.

Hourly rates for high performance frame rate standards conversion have never been cheap. The business case for the design and development of a realtime HD standards converter on a dedicated hardware platform is as strong as ever, if not stronger.

DEFINING CONVERSION
The primary function of a standards converter is to create a new stream of picture frames from an existing one, but at a different rate defined by the output standard. Each new output frame is displaced by a varying offset time dT from the immediately preceding input frame, where T is the frame repetition period in the original stream.
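As a rough illustration of how the offset varies from frame to frame, the sketch below lists, for each output frame, the index of the preceding input frame and the offset expressed as a fraction d of the input period T. Reading dT as "d times T" with d between 0 and 1 is an assumption about the notation, and `output_offsets` is a hypothetical helper name.

```python
from fractions import Fraction


def output_offsets(in_rate, out_rate, count):
    """Return (input_index, d) for the first `count` output frames.

    input_index is the preceding input frame and d is the offset from it,
    as a fraction of the input frame period T (assumed reading of dT).
    """
    # Duration of one output period measured in input periods T.
    step = Fraction(in_rate) / Fraction(out_rate)
    offsets = []
    for k in range(count):
        position = k * step                       # output time in units of T
        index = position.numerator // position.denominator
        offsets.append((index, position - index))
    return offsets


# 50 Hz input converted to 59.94 Hz (exactly 60000/1001 Hz) output.
for index, d in output_offsets(50, Fraction(60000, 1001), 6):
    print(index, float(d))
```

Exact fractions are used so that the offsets do not drift through rounding; with a 59.94 Hz output the pattern of offsets takes a very long time to repeat, which is why the offset has to be recomputed for every output frame.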

Picture elements from the same relative positions in adjacent input frames are in fact temporal samples and can be treated as such for use in standard sampling theory. It would seem sensible, therefore, to create each new output frame by simply interpolating these samples with the relative position of the new samples being defined by dT1, dT2, etc.

The problem is that the temporal sampling rate in all television standards is not high enough to accurately depict anything other than very slow moving objects. Moving objects will be in a different location on each successive input frame.

Simple interpolation or averaging between four frames, for instance, will produce four images of the object in the output frame(s). Fast moving objects will appear to judder and blur. This is otherwise referred to as temporal aliasing. It’s not a problem when viewing the original native input, because the eye is able to track moving objects, making them stationary relative to the retina, so the temporal aliases are not seen. When the input signal passes through a simple linear standards converter, however, the temporal aliasing causes errors in the interpolation process.
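The multiple-image effect is easy to reproduce on a single scan line. The sketch below is a simplified, hypothetical example that blends only two input frames rather than four, so it produces two half-strength images of the object instead of one full-strength image at the intermediate position.

```python
def make_line(width, position, size=2):
    """One scan line: a bright object of `size` pixels on a black background."""
    return [1.0 if position <= x < position + size else 0.0 for x in range(width)]


def linear_interpolate(frame_a, frame_b, d):
    """Blend co-sited pixels of two frames; d is the fractional offset from frame_a."""
    return [(1 - d) * a + d * b for a, b in zip(frame_a, frame_b)]


frame_a = make_line(12, 2)   # object at pixels 2-3
frame_b = make_line(12, 8)   # object has moved to pixels 8-9
blended = linear_interpolate(frame_a, frame_b, 0.5)
print(blended)
```

The blended line contains the value 0.5 at pixels 2-3 and 8-9 and nothing at pixels 5-6, where the object would actually have been halfway between the two input frames.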

What is required is a method of modifying the operation of the converter to track the course of a moving object in the same way the eye does. To do this, the direction and speed of each region of movement need to be determined.

Each region or group of pixels with the same movement is then allocated a vector. The final result is a “vector frame” for each proposed new output picture frame. The individual vectors in the frame are essentially pointers to tell the converter which part of its memory to address in order to retrieve the correct samples for creation of that part of the new output frame.

A primary vector frame is created by the motion estimator part of the converter by comparison and analysis of adjacent input frames. This is then scaled by a factor proportional to the offset dT to create the vector frame associated with the new output frame.
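A minimal one-dimensional sketch of this idea follows. It assumes the scaling factor is the fractional offset d of the output frame between its two neighbouring input frames, and that each vector is expressed in pixels per input frame period; both are assumptions made for illustration, as is the function name `motion_compensate`. To keep the example short the whole line pans with one vector, so concealed and revealed areas do not arise.

```python
def clamp(index, width):
    """Keep a sample address inside the line."""
    return max(0, min(width - 1, index))


def motion_compensate(frame_a, frame_b, vector_frame, d):
    """Build one output line at fractional offset d between frame_a and frame_b.

    vector_frame holds one motion vector per output pixel, in pixels per
    input frame period. Each vector is scaled by d to address frame_a and
    by (1 - d) to address frame_b, so both fetches land on the same object.
    """
    width = len(frame_a)
    output = []
    for x, vector in enumerate(vector_frame):
        back = clamp(x - round(d * vector), width)          # address in earlier frame
        forward = clamp(x + round((1 - d) * vector), width)  # address in later frame
        output.append((1 - d) * frame_a[back] + d * frame_b[forward])
    return output


frame_a = [0.0, 0.0, 1.0, 1.0] + [0.0] * 8          # object at pixels 2-3
frame_b = [0.0] * 8 + [1.0, 1.0, 0.0, 0.0]          # object at pixels 8-9
vector_frame = [6] * 12                             # uniform pan of 6 pixels per frame
compensated = motion_compensate(frame_a, frame_b, vector_frame, 0.5)
print(compensated)
```

With the vectors applied, the output line shows a single full-strength object at pixels 5-6, which is the behaviour the eye expects. A real converter would also have to decide what to fetch where one frame shows an object and the other shows background.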

IT’S NOT THAT SIMPLE
Initially, this approach appears straightforward—until the inquisitive mind takes over. How does the motion estimator ensure that other moving objects in the same scene don’t confuse it? What happens with concealed and revealed parts of the picture when an object has moved?

Techniques used in motion estimators come in various guises including hierarchical spatial correlation (sometimes referred to as block matching) and phase correlation. Both methods have their pros and cons and deal with these issues in different ways. The truth is that such technologies have become so refined over the last few years that you have to concentrate very hard to spot the motion artifacts. At the end of the day, it’s not the technique employed that matters, but the quality of the pictures.
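For readers who want a feel for the two families of estimator, the sketch below applies both to a single scan line. It is a toy illustration under stated assumptions, not a description of any manufacturer's design: the block matcher is a plain exhaustive search without the hierarchical stage, the phase correlator uses a naive discrete Fourier transform, and the function names are hypothetical.

```python
import cmath


def block_match(frame_a, frame_b, start, size, search):
    """Find the shift of one block that minimises the sum of absolute differences."""
    block = frame_a[start:start + size]
    best_shift, best_cost = 0, float("inf")
    for shift in range(-search, search + 1):
        pos = start + shift
        if pos < 0 or pos + size > len(frame_b):
            continue                      # candidate falls outside the line
        cost = sum(abs(p - q) for p, q in zip(block, frame_b[pos:pos + size]))
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def dft(values, inverse=False):
    """Naive discrete Fourier transform (adequate for a short demo line)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def phase_correlate(frame_a, frame_b):
    """Estimate the dominant shift from the phase of the cross spectrum."""
    n = len(frame_a)
    normalised = []
    for a, b in zip(dft(frame_a), dft(frame_b)):
        cross = b * a.conjugate()
        magnitude = abs(cross)
        # Discard magnitude and keep phase only; skip empty frequency bins.
        normalised.append(cross / magnitude if magnitude > 1e-9 else 0j)
    surface = [v.real for v in dft(normalised, inverse=True)]
    peak = max(range(n), key=lambda i: surface[i])
    return peak if peak <= n // 2 else peak - n   # map to a signed shift


frame_a = [0.0, 0.0, 1.0, 1.0] + [0.0] * 8              # object at pixels 2-3
frame_b = [0.0] * 5 + [1.0, 1.0] + [0.0] * 5            # object at pixels 5-6
print(block_match(frame_a, frame_b, start=1, size=4, search=4))
print(phase_correlate(frame_a, frame_b))
```

Both estimators report a shift of 3 pixels for this line. The difference in character is visible even here: block matching tests candidate positions for one block at a time, while phase correlation measures the movement present in the whole line at once and leaves the assignment of vectors to regions as a separate step.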

In terms of extending these techniques to manipulate HD in addition to SD signals, the vastly increased speed of the latest data processing components permits more computation in a given time than is needed just to cater for the inherently higher resolution. The result is even higher resolution and greater accuracy of the motion vectors.

Changes in the architecture of the converter itself provide other enhancements. In SD converters, both the input and output standards were interlaced scanning formats and the filtering and estimation processes were applied across four fields of the input and output signals. The very high performance of adaptive de-interlacing algorithms these days means that this process can occur prior to the motion estimation and temporal rate conversion processes.

This means that a 1080i input signal is first upconverted to 1080p and subsequently downconverted if the output signal required is an interlaced format. All the temporal rate processing is implemented on progressive scanned picture frames. Could migration to 1080p format be simply a matter of removing a bit of de-interlacing software?
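The order of operations can be sketched as follows. This is a deliberately crude stand-in: the de-interlacer simply averages the lines above and below each missing line and is not adaptive, and the function names are hypothetical. The temporal rate conversion would sit between the two steps and work on whole progressive frames.

```python
def deinterlace_field(field, top_field):
    """Build a progressive frame from one field by vertical averaging."""
    height = 2 * len(field)
    frame = [None] * height
    offset = 0 if top_field else 1
    # Place the lines that the field actually carries.
    for i, line in enumerate(field):
        frame[2 * i + offset] = list(line)
    # Fill each missing line from its vertical neighbours.
    for y in range(height):
        if frame[y] is None:
            above = frame[y - 1] if y - 1 >= 0 else frame[y + 1]
            below = frame[y + 1] if y + 1 < height else frame[y - 1]
            frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame


def interlace_frame(frame, top_field):
    """Extract one field from a progressive frame for an interlaced output."""
    return frame[0 if top_field else 1::2]


field = [[0, 0], [4, 4]]                       # a two-line top field
progressive = deinterlace_field(field, top_field=True)
print(progressive)
print(interlace_frame(progressive, top_field=True))
```

Re-interlacing the frame returns the original field lines unchanged, which illustrates why the final step is only needed when an interlaced output is requested.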

When it comes to realtime standards conversion in the modern HD/SD 50/60 Hz world, the proof is in the picture. The complexity of conversions required means that there are simply very few companies with the theoretical technology, let alone the products, to do the job.

Kim Francis is a product specialist for Pro-Bel.