Wavelet compression

Ever since TV programs began to be exchanged internationally in electronic form rather than on film, there has been a need for standards conversion. Over the last half century, much research and development effort has gone into producing devices with improving transparency. Early devices used optical techniques, sometimes with remarkably good results for the day. In the late 1960s, huge analog frame store devices were developed using ultrasonic glass delay lines as the storage elements.

It was not until the application of digital signal processing that standards conversion equipment started to approach anything close to the levels of transparency demanded by the broadcasters — but initially at very significant cost.

When digital TV production and the first HDTV services started in the 1990s, the need for more and better standards converters became paramount, as broadcasters demanded better quality at more affordable prices.

The two key factors in choosing rate and format conversion equipment for the broadcast industry today are affordability and consistently high picture quality across a wide variety of content types.

A small number of manufacturers currently produce devices that will deliver high picture quality, but the price tag is not attractive enough for them to be purchased in large numbers. And even these expensive devices can exhibit subjectively annoying artifacts on certain types of material, particularly material involving lots of detail and movement at critical velocities.

Notwithstanding the performance limitations, price is an increasing issue as the demand for format converters is growing rapidly. In particular, there is a fundamental need to use SD-originated material in an HDTV environment. This requires an upconverter, which will be needed in substantial numbers as the world migrates to HDTV over the next few years.

Due mainly to the interlaced structure of SD television, upconverted pictures can often look worse on an HD display than the original on an SD display. This is despite the development of some complex and expensive technologies over the last couple of decades.

A substantial change in expectations on performance and cost has been enabled by a new process referred to as Super-Resolution Bandlet Technology. This has demonstrated an improvement over current technologies while being implemented in a single medium-cost Field-Programmable Gate Array (FPGA).

Super-Resolution Bandlet Technology is a set of mathematical image processing algorithms based on geometric wavelet theory and used for deinterlacing and upconversion in this application. The algorithms have been developed by a mathematics team at Let It Wave, led by Stéphane Mallat, professor of applied mathematics at Ecole Polytechnique in Paris.

How does it work?

To illustrate the benefits of the new technology, let's compare it with the current technologies for deinterlacing and scaling. In the 1980s and 1990s, the first challenge of deinterlacing was real-time digital video processing at minimum cost. Minimizing the number of operations and memory requirements was thus necessary, which first led to two simple methods: spatial line doubling and time weaving.
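As a rough illustration of these two methods, here is a minimal Python sketch. It assumes a field is simply a list of rows of pixel values; the function names are invented for this example and do not come from any product.

```python
def line_double(field):
    """Spatial line doubling: rebuild a frame by repeating each field line."""
    frame = []
    for row in field:
        frame.append(list(row))  # the line carried by the field
        frame.append(list(row))  # the missing line, copied from its neighbor
    return frame


def weave(even_field, odd_field):
    """Time weaving: interleave the lines of two consecutive fields."""
    frame = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(list(even_row))  # frame rows 0, 2, 4, ...
        frame.append(list(odd_row))   # frame rows 1, 3, 5, ...
    return frame
```

Line doubling uses only one field, so it halves vertical resolution but is immune to motion. Weaving restores full resolution on static scenes but shows comb artifacts as soon as something moves between the two fields.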

As more processing became available at a lower cost, these basic techniques were replaced by linear spatial interpolations and linear time interpolations. Interpolating interlaced signals in this way produces all types of aliasing artifacts, including jaggies and Moiré effects on periodic patterns. Comb artifacts might also remain, as shown in the lower middle portion of Figure 1 and Figure 2. This is due to an aliasing phenomenon in space for sharp structures and in time for fast motions.

Time interpolations give good results when there is little or no motion, even along sharp spatial transitions. Spatial interpolations give good results in smooth spatial regions, even when there is fast motion. A natural improvement of these techniques is to mix them to get the best of both, which corresponds to motion adaptive algorithms. The improvement and typical remaining artifacts are shown in the upper right portions of Figures 1 and 2.
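A motion adaptive mix of this kind can be sketched for a single missing pixel as follows. This is only an illustration under simple assumptions: motion is judged from the difference between the previous and next fields at the same position, and the blend is a linear ramp. The function name and the threshold value are hypothetical.

```python
def motion_adaptive_pixel(above, below, prev_value, next_value, threshold=10.0):
    """Blend a time interpolation and a spatial interpolation for one missing pixel.

    above, below           -- pixels on the lines above and below in the current field
    prev_value, next_value -- pixels at the same position in the previous and next fields
    threshold              -- assumed motion level at which the spatial result takes over
    """
    spatial = (above + below) / 2.0
    temporal = (prev_value + next_value) / 2.0
    motion = abs(next_value - prev_value)
    alpha = min(motion / threshold, 1.0)  # 0 = static, 1 = clearly moving
    return (1.0 - alpha) * temporal + alpha * spatial
```

With no detected motion the result is the pure time interpolation, and with strong motion it is the pure spatial interpolation. In between, the two are mixed in proportion to the measured difference.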

Up to this point, innovations came from hardware architecture allowing more computational power rather than creative new algorithmic ideas. To reduce the artifacts of spatial interpolations, edge adaptive interpolations were used, which locally adapt spatial interpolations to the directions of local image structures.

Such adaptive directional interpolators reduce the artifacts introduced by fixed spatial interpolators but not completely because not enough data is available in a single field to perform a precise directional interpolation.

As a result, deinterlaced video can have a time flicker (oscillatory artifacts) when the directional interpolations performed on even and odd fields do not give coherent information. However, this improvement is at the core of nearly all motion adaptive deinterlacing procedures used by the television industry.
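The edge adaptive idea described above can be sketched with a simple edge-based line average. This is a generic textbook-style illustration, not the method of any particular product: it tests a few directions through the missing pixel, keeps the one along which the lines above and below agree best, and averages along it. The function name and the `max_shift` parameter are assumptions made for the example.

```python
def edge_adaptive_pixel(above_row, below_row, x, max_shift=1):
    """Interpolate the missing pixel at column x along the best-matching direction.

    above_row, below_row -- the available lines above and below the missing line
    max_shift            -- assumed largest horizontal offset (in pixels) to test
    """
    best_diff = None
    best_value = None
    # Test the vertical direction first so that ties favor it.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        xa, xb = x + shift, x - shift
        if not (0 <= xa < len(above_row) and 0 <= xb < len(below_row)):
            continue
        diff = abs(above_row[xa] - below_row[xb])
        if best_diff is None or diff < best_diff:
            best_diff = diff
            best_value = (above_row[xa] + below_row[xb]) / 2.0
    return best_value
```

On a diagonal edge such as `above_row = [0, 0, 0, 100, 100]` and `below_row = [0, 100, 100, 100, 100]`, a plain vertical average would produce a gray value at columns 1 and 2, whereas the directional search follows the edge. Because the decision rests on only two lines of a single field, it can differ from one field to the next, which is the source of the flicker mentioned above.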

Motion compensated deinterlacing

Motion adaptive techniques are not sophisticated enough when sharp image structures move. The industry and image processing research community largely viewed motion compensation as the next-generation technology. The idea is to first compute the motion of each pixel and then, for missing pixels, to perform the interpolation in time in a direction that follows the time displacements. The interpolation is thus performed in a time direction that compensates for the motion.

Developing a robust motion compensation deinterlacing technology has been the Holy Grail of the video conversion industry during the last 10 years. Yet after all this time and effort, it could be argued that no motion compensation algorithm product is available that meets the expected performance and cost levels, particularly when compared with motion adaptive techniques.

Motion compensation is based on the assumptions that a single motion can be associated with each image pixel, and that this motion, also called optical flow, can be computed reliably. Optical flow research has shown that these uniqueness and robustness assumptions are incorrect. First, one cannot always associate a single motion with a pixel (for example, where transparent objects overlap). Second, optical flow estimation is intrinsically unreliable and is known as an ill-posed problem. This does not mean that motion can never be measured accurately, but rather that it cannot always be measured accurately.

As a consequence, motion compensation algorithms measure the reliability of their motion calculation and, when it is not sufficiently reliable, perform a spatial interpolation instead. Obtaining robust results requires being conservative and hence performing many spatial interpolations. It thus often leads to the same type of artifacts as motion adaptive algorithms, with many more operations and associated high cost.
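The fallback logic described above can be sketched as follows. The sketch assumes that a motion vector has already been supplied by some estimator (not shown), that the previous and next fields are available as 2-D lists indexed in frame coordinates, and that reliability is judged simply by how well the two fields agree along the motion trajectory. All names and the `max_error` value are hypothetical.

```python
def motion_compensated_pixel(prev_field, next_field, above, below,
                             y, x, motion, max_error=8.0):
    """Interpolate along the motion trajectory, or fall back to spatial.

    motion    -- (dy, dx) displacement per field period, assumed given
    max_error -- assumed largest mismatch accepted as a reliable match
    """
    dy, dx = motion
    py, px = y - dy, x - dx  # where the pixel came from in the previous field
    ny, nx = y + dy, x + dx  # where it goes in the next field
    inside = (0 <= py < len(prev_field) and 0 <= px < len(prev_field[0])
              and 0 <= ny < len(next_field) and 0 <= nx < len(next_field[0]))
    if inside:
        before = prev_field[py][px]
        after = next_field[ny][nx]
        if abs(after - before) <= max_error:
            return (before + after) / 2.0  # reliable: interpolate in time
    return (above + below) / 2.0           # unreliable: spatial fallback
```

The tighter `max_error` is set, the more often the spatial fallback is taken, which is why a conservative design tends to behave like a motion adaptive one.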

Super-Resolution Bandlet Technology with total variation

Motion adaptive algorithms have limited performance because a missing pixel is computed from a small set of available pixels. Motion compensated algorithms have the advantage of using information in a full three-dimensional space-time neighborhood but are less robust because of motion estimation errors.

This new approach is based on a full exploration of the 3-D environment of a missing pixel to compute it. Instead of making assumptions about the input, it computes each missing pixel by evaluating the quality of the image that results from each candidate choice.

A directional interpolation computes a missing pixel by interpolating one or more available pixels located close to a line going through the missing pixel with a particular spatio-temporal direction, as illustrated in Figure 3.

This shows that the super-resolution framework includes all previously used techniques. An order zero interpolation is a simple copy of one pixel located before or after the missing pixel along the spatio-temporal direction. Line doubling and time weaving algorithms are such copies respectively along the vertical spatial direction and along the time direction. A first-order interpolation may be computed between two pixels along the specified spatio-temporal direction, or a higher order polynomial interpolation may use more than two pixels along this direction. Different interpolations produce different estimations.

The set of all possible estimations is therefore the result of a choice of a family of spatio-temporal directions and a set of different interpolations along these directions.
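The directional interpolations described above can be sketched in a few lines of Python. The representation is an assumption made for the example: the video is a nested list `video[t][y][x]` in which the lines missing from each field hold `None`, and a direction is a `(dt, dy, dx)` step. The function names are invented here.

```python
def sample(video, t, y, x):
    """Return video[t][y][x], or None if the position is outside the data."""
    if 0 <= t < len(video) and 0 <= y < len(video[t]) and 0 <= x < len(video[t][y]):
        return video[t][y][x]
    return None


def directional_estimate(video, t, y, x, direction, order):
    """Estimate the missing pixel (t, y, x) along one spatio-temporal direction.

    order 0 copies one neighbor along the direction;
    order 1 averages the neighbors on either side.
    Returns None if the required neighbors are not available.
    """
    dt, dy, dx = direction
    before = sample(video, t - dt, y - dy, x - dx)
    after = sample(video, t + dt, y + dy, x + dx)
    if order == 0:
        return before if before is not None else after
    if before is None or after is None:
        return None
    return (before + after) / 2.0
```

In this notation, line doubling is order zero along `(0, 1, 0)` and time weaving is order zero along `(1, 0, 0)`. Running the function over a family of directions and both orders yields the set of candidate estimations for one pixel.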

A criterion has to be found to choose one estimator. This criterion has to be comparative, and one uses a mathematical norm on the video signal. A total variation norm measures the amplitude of oscillations and variations of the image (jaggies and flickers are spatio-temporal oscillatory artifacts), without penalizing discontinuities. The total variation of an image is essentially proportional to the total length of edges, whether these edges are discontinuous or smooth transitions.
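To make the criterion concrete, here is a minimal sketch of a discrete total variation and of its use to compare candidate values for one pixel. It is an illustration only, not the actual algorithm: it uses a simple 2-D anisotropic total variation (sum of absolute differences between horizontal and vertical neighbors) on a small rectangular patch, whereas the technology described here works in space and time. The function names are hypothetical.

```python
def total_variation(image):
    """Sum of absolute differences between horizontal and vertical neighbors."""
    tv = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                tv += abs(row[x + 1] - value)
            if y + 1 < len(image):
                tv += abs(image[y + 1][x] - value)
    return tv


def pick_by_total_variation(patch, y, x, candidates):
    """Return the candidate that gives the patch the lowest total variation."""
    best_value, best_tv = None, None
    for value in candidates:
        trial = [list(row) for row in patch]  # copy so the input is untouched
        trial[y][x] = value
        tv = total_variation(trial)
        if best_tv is None or tv < best_tv:
            best_value, best_tv = value, tv
    return best_value
```

A sharp edge `[[0, 100], [0, 100]]` and a smooth ramp `[[0, 50, 100], [0, 50, 100]]` have the same total variation, which reflects the point that discontinuities are not penalized. A candidate that makes the pixel stand out from all of its neighbors, as a jaggy or a comb tooth would, raises the total variation and is rejected.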

When the search for a best directional interpolation is limited to a spatio-temporal neighborhood that is too small, the best estimator may not be ideal. A spatio-temporal bandlet thresholding regularizes the estimation with a spatio-temporal geometric flow that is derived from the directions of the computed best interpolators.

Per pixel cadence

When film is transferred to interlaced video, each film frame is divided into two or three fields corresponding to even or odd rows. The deinterlacing should replace each field by the corresponding full frame. This requires copying appropriate pixels.

The deinterlacing of films in video thus requires identifying an appropriate space-time interpolation. From the algorithm's point of view, this is no different from deinterlacing any other interlaced video content.

The result is that the upconversion process is not compromised by mixed cadence sequences. This is important in real-world content with multimedia contributions.
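The film cadence described in this section can be illustrated with a small sketch. It generates the field sequence of a pulldown pattern and then, for each field, finds the neighboring field that came from the same film frame, which is the one whose pixels should be copied. This is a simplification: here the source frame of each field is known, whereas a real deinterlacer has to infer it from the pixels themselves. The function names and the `(2, 3)` default pattern are assumptions for the example.

```python
def pulldown_fields(frame_count, pattern=(2, 3)):
    """List (frame_index, parity) for each field; parity 0 = even rows, 1 = odd rows."""
    fields = []
    parity = 0
    for index in range(frame_count):
        repeat = pattern[index % len(pattern)]  # two or three fields per frame
        for _ in range(repeat):
            fields.append((index, parity))
            parity = 1 - parity  # interlaced fields always alternate parity
    return fields


def weave_partner(fields, i):
    """Index of an adjacent field taken from the same film frame, or None."""
    frame_index = fields[i][0]
    for j in (i - 1, i + 1):
        if 0 <= j < len(fields) and fields[j][0] == frame_index:
            return j
    return None
```

For four film frames the pattern produces ten fields, and the correct partner is sometimes the previous field and sometimes the next one. A search that makes this choice independently for each pixel does not need the whole picture to follow a single cadence.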

User reaction

Thanks to some pioneering work by academics from outside the broadcast industry, a new generation of format conversion products is emerging, which breaks the established quality/cost barriers to the benefit of broadcasters and viewers alike.

Recent field trials and customer evaluations have indicated that this affordable and practical new technology will indeed raise the expectations for quality and consistency in upconversion.

The author would like to thank Steve Sherlock from Brick House Video Ltd and the engineering team at Let It Wave, Paris, for their assistance with this article.

Ted Taylor is a director of the Media Asset Capital partnership.