3-D standards conversion

Audiences worldwide are being introduced to 3-D content as an increasing number of new movie releases are made available in 3-D, and as live events — ranging from sports events to ballet and opera — are shown in 3-D in an expanding network of digital cinemas. The 3-D market is being further accelerated by display screen manufacturers looking to benefit from the opportunities for home display replacements, as well as 3-D games and Blu-ray Discs that provide 3-D content. In this environment, TV broadcasters are starting to bring 3-D TV services to the home. While SD widescreen, HD and surround sound have paved the way toward a more immersive viewing experience, 3-D appears to have the potential to provide a “just-like-being-there” experience.

As with the first HD broadcast, early 3-D broadcasts are focusing on premium content such as live sporting and entertainment events. Most broadcasters pay for premium content from international sources, pulling in feeds from major live events such as the Olympic Games or the FIFA World Cup. They are also purchasing studio material, such as 3-D movies originally produced for cinema screenings and programming from 3-D TV channels.

Global distribution of high-profile live events requires standards conversion, from one frame rate to another, regardless of whether that content is 2-D or 3-D. As a result, broadcasters working with high-interest, high-value 3-D content must be prepared to incorporate standards conversion in their 3-D workflow.

Quality conversion is particularly critical because the success of 3-D depends on how effectively and continuously content can cast its spell over the viewer. Getting the pictures “right” really matters. The processing applied at each stage of capture and production must be of sufficiently high quality that viewers are not jolted out of their immersive experience. In terms of standards conversion, quality is defined primarily by the quality of the motion measurement available; more accurate and coherent vector fields lead to better pictures.

3-D conversion challenges

The 3-D signal consists of left- and right-eye channels that have been captured by two cameras arranged to simulate the different perspectives of a viewer's left and right eyes.

Figure 1 shows the differences between the left- and right-eye views, which must be precisely replicated in any conversion process. The differences are often obvious at the left and right edges of the picture. You can easily demonstrate the perspective difference yourself by holding a finger 6in in front of your face and closing alternate eyes. Either the finger or the background will appear to shift, depending on where you focus.

These subtle variations replicate the human eyes' experience when looking at real-world content. Images are captured with exact synchronization between the two views, and when they are presented on a 3-D screen to a viewer wearing polarized or active-shutter glasses, the viewer experiences a sensation of depth.

Well-produced 3-D is visually pleasing as long as precise alignment of the left- and right-eye images is maintained in any downstream processing, including during standards conversion. Differences in vertical alignment, color or resolution between the two streams can be jarring, and any horizontal disparity can alter the depth perspective and lead to discomfort and unwanted effects.

Any 3-D material requiring standards conversion should be processed through equipment that applies identical processing to left- and right-eye images. On the surface, this task may seem easy: Put the left and right streams through the same processing chain, and the job is done. That's often true, but synchronization issues — the temporal offset of left- and right-eye images — and the introduction of artifacts during complex processing such as standards conversion can threaten both the integrity and quality of the end result.

Conversion without compromise

The interocular positional difference, the very means by which stereo is created, produces different revealed and concealed areas in the left- and right-eye views. Because the two images differ from one another, standards converters can treat the two differently.

Some types of motion-compensation techniques used in standards conversion identify areas of similarity from frame to frame and assume that these areas of similarity are the same area (or object) simply moved between frames. This approach works for the purposes of compression because it isn't really important whether the areas are the same area, just that they look similar. In processes such as deinterlacing and standards conversion, however, these vectors do need to be accurate because they are used to move pixels to make entirely “new” pictures.
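The similarity matching described above can be sketched in a few lines. The following is my own minimal illustration in Python with NumPy (the article names no implementation), using an exhaustive sum-of-absolute-differences search: note that it returns whichever candidate looks most similar, with no guarantee that the match is the same object, which is exactly why such vectors suffice for compression but not for picture building.

```python
import numpy as np

def best_match(block, frame, top, left, search=8):
    """Exhaustive SAD block match: find the offset (dy, dx), within
    +/-search pixels of (top, left), where `block` best matches `frame`."""
    bh, bw = block.shape
    best_offset, best_sad = None, np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidate positions that fall outside the frame
            if y < 0 or x < 0 or y + bh > frame.shape[0] or x + bw > frame.shape[1]:
                continue
            # Sum of absolute differences: lower means "more similar"
            sad = np.abs(frame[y:y + bh, x:x + bw].astype(int)
                         - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_offset = sad, (dy, dx)
    return best_offset
```

A repetitive structure, such as scrolling credits, can produce several near-zero SAD candidates, and a matcher like this one may pick the wrong one; that ambiguity motivates the phase-correlation approach discussed next.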

When processing two similar but not identical (left- and right-eye) images for 3-D standards conversion, motion-compensation technology that uses phase correlation to establish motion vectors for each image is more likely to render similar matches that look and feel natural. Temporal equivalents of the methods that provide this coherent spatial vector set can then be employed to ensure that concealed and revealed areas still benefit from motion-compensated processing.

In taking on challenging material, such as repetitive structures or small, fast-moving objects like scrolling credits or a putt on a golf green, this latter approach to motion compensation succeeds because it can match pixels in each field precisely with their counterparts in adjacent fields. The resulting images yield the smooth motion that is essential to high-quality standards conversion.

Phase correlation performs a spectral analysis on two successive fields (or frames) using a fast Fourier transform (FFT) and passes the output through an inverse FFT to generate a correlation surface. The surface encodes the field-to-field phase differences; its peaks sit at positions corresponding to the motion detected between the fields. Thus, the distance and direction of the motion are measured accurately, and a subsequent object-matching process creates the actual vector field for each pixel.
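The FFT/inverse-FFT pipeline above can be sketched for the simple case of a single dominant motion. This is an illustrative Python/NumPy reduction of the idea, not the converter's actual implementation: a real system measures many local motions per field and follows with the object-matching stage, both omitted here.

```python
import numpy as np

def phase_correlate(field_a, field_b):
    """Estimate the dominant (dy, dx) displacement from field_a to field_b."""
    # Spectral analysis of both fields
    fa = np.fft.fft2(field_a)
    fb = np.fft.fft2(field_b)
    # Cross-power spectrum, normalized so only phase differences remain;
    # this is what makes the method insensitive to lighting and fades
    cross = fb * np.conj(fa)
    cross /= np.abs(cross) + 1e-12
    # The inverse FFT turns the phase differences into a correlation surface
    surface = np.fft.ifft2(cross).real
    # The surface peak sits at the measured displacement
    dy, dx = np.unravel_index(np.argmax(surface), surface.shape)
    # Undo FFT wrap-around so negative shifts come out signed
    if dy > field_a.shape[0] // 2:
        dy -= field_a.shape[0]
    if dx > field_a.shape[1] // 2:
        dx -= field_a.shape[1]
    return dy, dx
```

Because the normalization discards magnitude and keeps only phase, a brightness change between the two fields moves the peak's height but not its position, which is the property the next paragraph relies on.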

Because the phase-correlation system is insensitive to the effects of variations in noise and lighting, it performs well even when handling fades, objects moving in and out of the shade, and flashes of light, all of which are typical of the live events being broadcast in 3-D. The technique creates smooth vector sets that are coherent in all three axes: two spatial and one temporal. The accurate, coherent, real motion-matching nature of these vectors leads to repeatable, smooth standards conversion and consistently high performance for both stereo channels.

Choosing a conversion model

For a live, high-value 3-D broadcast, such as a major worldwide sports event, the broadcaster will often use two standards conversion systems: one for the left-eye and one for the right-eye stream.

In this case, two systems running simultaneously can be genlocked to the same reference. With two real-time standards converters using true motion vectors to make independent processing decisions, the operator gets left- and right-eye images of a quality and consistency unlikely to cause issues for the viewer. (See Figure 2.)

In fact, the compression used to deliver the resulting content to the home is much more likely to disrupt the 3-D experience. As with 2-D standards conversion, the cleaner the output of 3-D standards conversion, the more efficient — within limits — compression can be. The quality of the pictures viewed at the home display will depend largely on the bandwidth used to compress the pictures at the contribution and transmission stages.

Another common 3-D standards conversion workflow, and one compatible with current transmission bandwidths, relies on a single-channel 3-D distribution path, where the left- and right-eye images are subsampled before being combined into a single stream for compression and distribution. The single channel can be created in one of two ways. In the spatial approach, the two images are subsampled and packed into one frame: horizontally subsampled and placed side by side, vertically subsampled in a top-over-bottom configuration, or subsampled in both axes and interleaved in a checkerboard pattern. In the temporal approach, the images are time-multiplexed, with odd fields carrying the left-eye image and even fields carrying the right-eye image.
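The side-by-side, top-over-bottom and time-multiplexed packings can be sketched as follows. This is my own Python/NumPy illustration; the decimation here is naive single-step subsampling, whereas real equipment low-pass filters before subsampling to limit aliasing, and the checkerboard interleave is omitted for brevity.

```python
import numpy as np

def pack_side_by_side(left, right):
    """Horizontally subsample each eye and place the halves side by side.
    The packed frame has the same dimensions as one full-resolution input."""
    return np.hstack([left[:, ::2], right[:, ::2]])

def pack_top_bottom(left, right):
    """Vertically subsample each eye for a top-over-bottom frame."""
    return np.vstack([left[::2, :], right[::2, :]])

def pack_time_multiplexed(left_fields, right_fields):
    """Temporal approach: interleave the two eyes so odd fields carry
    the left-eye image and even fields carry the right-eye image
    (1-based field numbering)."""
    out = []
    for l, r in zip(left_fields, right_fields):
        out.extend([l, r])
    return out
```

Either spatial packing halves the resolution of one axis per eye, which is the bandwidth limitation the next paragraphs discuss.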

When a broadcaster receives a 3-D contribution feed that's already in the side-by-side format, the standards conversion occurs toward the tail end of the overall processing workflow.

In this case, the combining process prior to conversion introduces a bandwidth limitation through horizontal subsampling, and this reduction in quality can have an impact on the end result. Nevertheless, the standards converter can still yield cleanly converted, artifact-free images. Tightly controlled phase correlation ensures that conversion of the side-by-side-formatted streams is no less accurate than conversion of a single left-eye or right-eye stream. (See Figure 3.)

Finally, for 3-D broadcasts that aren't being delivered live, broadcasters can use a single standards converter and run each eye's images through it in turn. In these independent, sequential passes, the underlying quality of vector choices and the standards converter's ability to pick up real motion again ensure that results will be accurate and consistent.

If the broadcaster has a choice, the two-channel approach to 3-D standards conversion is the best solution. This approach offers the best possible quality in the end result because the full available bandwidth of the left- and right-eye images is preserved throughout the workflow. As a result, the broadcaster retains the value of its investment in 3-D content and is better positioned to offer the rich, engaging 3-D product that is essential to viewer satisfaction and, ultimately, fundamental to the success of 3-D services.

Gerard Phillips is engineering group manager of R&D, conversion and restoration for Snell.