Virtual sets: Imagining we were there

For over 150 years photographers have empirically understood the physics of imaging. Without the benefit of mathematics, an artist can compose a scene, place the subject in it in a pleasing way, and control the lighting, focal length of the lens, subject distance, depth of field, and position of the camera. When photography became a moving-image art form, the cinematographer also had to take into consideration the relative motion of the camera, the background and the subject.

Anyone who has budgeted for a production knows that staging for motion pictures or video cameras is an expensive proposition. Controlling reality always is, and creating illusions is sometimes nearly as expensive. In order to control the total effect, scenes are often staged in elaborate studio setups where lighting and other variables can be strictly controlled, allowing for matching scenes from multiple days of shooting without the variables of the natural environment.

There is now a way to similarly control the illusion of reality and create imaginative and realistic images. The increasing sophistication of computer graphics has fueled a new industry that crosses many genres. Visualization for science, entertainment, advertising and training has advanced rapidly in the last 20 years. Entire motion pictures are being created in computers, and in the future even actors may well be created digitally.

By combining natural images with computer images, it is possible to place actors into historical settings, into scenes that might jeopardize an actor's health, or into locations that cost too much for even Hollywood to accept. However, there are complications involved in combining shots from real optical systems with those simulated in computers.

For such an approach to work, it must be possible to model the geometry, lighting, surfaces and properties of the environment in the computer. Some common elements in staging are exceedingly complex to model. Fog, wind and other time-varying conditions, for instance, can only be approximated in computer imagery. For the purposes of this review we need to restrict our imagination somewhat and look at a simpler and more achievable artificial reality.

For virtual sets to work, the set designer must have access to computer-modeling tools that can interface with the compositing applications. Most architectural and drafting packages can output the geometry in the right form. The computer graphics artist must then apply the specified surfaces and render the image for approval. The general process, once movement is involved, is too complex to understand all at once; a simpler case will point out the difficulty of analyzing the entire range of variables.

Imagine first a scene in which the camera in the composited image is fully static. In order for a natural image to be used as part of a composite, several factors must be matched to a fairly high degree of accuracy (a data-model sketch follows the list), including:

  • Angle of the camera, focal length and the position of focus;
  • Pan, tilt and roll angles of both cameras;
  • X, Y, Z position of the camera relative to the scene;
  • Location, intensity and spectral characteristics of the lighting source(s), and the type of light (ranging from diffuse to point source); and
  • Colorimetry of the recording media (film or video colorimetry and computer image transfer functions).
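
To make the bookkeeping concrete, here is a minimal sketch in Python of the per-frame state the two cameras must agree on. Every name, unit and field here is illustrative only; no standard or product defines this layout.

    # Illustrative data model for the parameters listed above. Units and
    # field names are assumptions made for this sketch, not a standard.
    from dataclasses import dataclass

    @dataclass
    class CameraState:
        focal_length_mm: float   # lens focal length
        focus_distance_m: float  # position of focus
        pan_deg: float           # pan angle of the head
        tilt_deg: float          # tilt angle of the head
        roll_deg: float          # roll angle of the head
        x_m: float               # camera position relative to the scene
        y_m: float
        z_m: float

    @dataclass
    class LightSource:
        x_m: float               # light position
        y_m: float
        z_m: float
        intensity: float         # relative intensity
        color_temp_k: float      # crude stand-in for spectral character
        diffusion: float         # 0.0 = point source, 1.0 = fully diffuse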

It is easy to calculate the geometry and the effects of the lens' focal length, as long as the lens does not introduce distortion, as a very wide-angle lens does. Indeed, digital video effects use the same methods to create the illusion of a flat image plane moving in three dimensions with six degrees of freedom. To calculate a model from the perspective of a hypothetical camera that will exactly match the image of a camera shooting a natural scene, we need to either constrain the real camera or measure the six degrees of freedom necessary to represent it.
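
As a rough illustration of that calculation, the sketch below projects a world-space point through a hypothetical pinhole camera with six degrees of freedom and a given focal length. It assumes an undistorted rectilinear lens, per the caveat above, and the rotation order chosen is just one plausible convention.

    # Pinhole-projection sketch: six degrees of freedom plus focal length.
    # Assumes a distortion-free lens; the rotation order is an assumption.
    import math

    def rotation(pan_deg, tilt_deg, roll_deg):
        """Build a 3x3 rotation matrix from pan (yaw), tilt (pitch), roll."""
        p, t, r = (math.radians(a) for a in (pan_deg, tilt_deg, roll_deg))
        cp, sp = math.cos(p), math.sin(p)
        ct, st = math.cos(t), math.sin(t)
        cr, sr = math.cos(r), math.sin(r)
        ry = [[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]   # pan about the y axis
        rx = [[1, 0, 0], [0, ct, -st], [0, st, ct]]   # tilt about the x axis
        rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]   # roll about the z axis
        def matmul(a, b):
            return [[sum(a[i][k] * b[k][j] for k in range(3))
                     for j in range(3)] for i in range(3)]
        return matmul(rz, matmul(rx, ry))

    def project(point, cam_pos, pan_deg, tilt_deg, roll_deg, focal_length_mm):
        """Project a world-space point to image-plane millimeters."""
        R = rotation(pan_deg, tilt_deg, roll_deg)
        d = [point[i] - cam_pos[i] for i in range(3)]     # translate ...
        x, y, z = (sum(R[i][j] * d[j] for j in range(3))  # ... then rotate
                   for i in range(3))
        if z <= 0:
            return None  # point is behind the camera
        return (focal_length_mm * x / z, focal_length_mm * y / z)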

This is done by attaching sensors to the pan head and lens, and positional sensors to the tripod or studio dolly (or crane). Simple enough: we grab the data, feed it to algorithms that calculate the synthetic image, and then composite the real camera's output with the computer image from the hypothetical camera using blue-screen (chromakey) techniques. Voila! A completed image, and the first step in a virtual set. The scene geometry does not change, so we create a backdrop and place the actor into it.
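
The keying step itself can be caricatured in a few lines: wherever the live pixel is close enough to the key color, substitute the rendered pixel. A real keyer handles soft edges, spill and noise far more gracefully; this shows only the principle, and the tolerance value is an arbitrary assumption.

    # Toy chromakey: frames are equal-sized 2D lists of (r, g, b) tuples.
    def composite(real_frame, cg_frame, key_color=(0, 0, 255), tolerance=60):
        out = []
        for real_row, cg_row in zip(real_frame, cg_frame):
            row = []
            for real_px, cg_px in zip(real_row, cg_row):
                # Euclidean distance from the key color in RGB space
                dist = sum((a - b) ** 2
                           for a, b in zip(real_px, key_color)) ** 0.5
                row.append(cg_px if dist < tolerance else real_px)
            out.append(row)
        return out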

If the actor must move behind set pieces, the blue screen includes foreground objects that block the actor's image in the real camera just as the real set pieces would, but at a much lower cost. As long as the background is static, you can compute the image once and use it for every subsequent frame.
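
That render-once shortcut is easy to express; render_fn and camera_state below stand in for whatever renderer and parameter set a real system would use.

    # Static-background shortcut: render the plate once, reuse it per frame.
    class StaticBackground:
        def __init__(self, render_fn, camera_state):
            self.render_fn = render_fn        # hypothetical renderer
            self.camera_state = camera_state  # fixed camera parameters
            self.plate = None

        def get(self):
            if self.plate is None:  # rendered only on the first call
                self.plate = self.render_fn(self.camera_state)
            return self.plate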

If the camera and background move, the required processing power increases geometrically. Remember that the sensors first send information to the computer telling it where the natural camera is, where it is pointing and the characteristics of the lens. The computer must then calculate the synthetic image that matches those parameters and output the background plate. If the computer takes one video frame to calculate the image, we delay the real camera's output by one frame and the two will be in perfect synchronization. If the rendering time changes as the scene becomes more complex, then the delay applied to the natural image must track the latency of the computer-derived image.
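
The one-frame delay described above generalizes to any fixed render latency. A minimal sketch, assuming the latency is known in whole frames:

    from collections import deque

    # Delay the live feed by the renderer's latency so the two stay locked.
    class FrameDelay:
        def __init__(self, delay_frames):
            self.buffer = deque(maxlen=delay_frames + 1)

        def push(self, frame):
            """Feed a live frame in; get the delayed frame out
            (None until the pipeline has filled)."""
            self.buffer.append(frame)
            if len(self.buffer) == self.buffer.maxlen:
                return self.buffer[0]
            return None

With delay_frames=1 this reproduces the one-video-frame case described in the text.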

Think of the complexity of a shot of a news set. The shot begins with a tight shot of the anchor and pulls out to reveal a complex set with monitors in the background. At the beginning of the scene the background detail is thrown out of focus by the shallow depth of field of the lens, and very little processing power is needed to calculate the image. At the end of the shot the image is considerably more complex. If the camera begins to move in three dimensions as it zooms out, you can see the daunting number of variables the computer must crunch.

Fortunately, the power of real-time graphics engines has increased rapidly and now can support modest real-time moves without the appearance of hysteresis between the foreground and background images. We may well approach the point in the future when a set designer never builds a model, but simply gives his design to a trained 3D draftsman who builds a model in the computer. A computer artist “decorates” the set, and a lighting person working with virtual reality tools places the lights to get the desired effect. At the end a printout gives the parameters for the real lights and camera positions to the studio crew.

This all works well as long as the process is real time, but what if we want to record the actor without tying him or her up while the technologists sort out the inevitable issues? All we need to do is store the data (metadata, actually) and time stamp it so it can be kept in sync with the camera image. SMPTE 315M-1999 standardizes how camera positioning data is conveyed in ancillary data packets at considerable resolution. That data is stored in a standardized way and transmitted in the ANC data sections of the transmission medium. The timecode and other metadata can then be used to render the background image at any time, free of processing-time and latency constraints, making later synchronization and compositing possible.
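
Conceptually, deferred compositing reduces to logging the camera state against timecode during the shoot and re-rendering later. The record layout below is purely illustrative; it is not the SMPTE 315M packet format.

    # Deferred rendering: time-stamped camera metadata, rendered at leisure.
    camera_log = {}  # timecode string -> camera state captured on set

    def record(timecode, camera_state):
        camera_log[timecode] = camera_state

    def render_later(timecode, render_fn):
        # Free of live latency constraints: render at any time, any quality.
        return render_fn(camera_log[timecode])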

Clearly it is also possible to drop a synthesized image into the foreground of a natural image, as was done in “Jurassic Park,” when the tyrannosaur showed up in studio shots. When such “non-geometric” objects are placed in natural scenes, the computing necessary will almost certainly be non-real time today. We can expect the future to make such processes more transparent.

At the end of the day, the process can provide significant benefits. If sets are nothing more than chromakey environments, it is possible to change from one program's set to another quite rapidly, though still not instantaneously. Studio costs and recurring costs for set storage are reduced. New artwork can be added to a set without significant preparation time. It is possible to experiment with new looks and lighting quickly and without using large crews to test new ideas. Some set elements can even be added to a set automatically by creating an embedded object that links to the artwork. One could argue that the depth of detail that HDTV can display is facilitated by using detailed computer models instead of complex and expensive studio sets. On the downside, that requires an intricate computer model and extremely accurate tracking between the foreground and background plates.

The reasons why must be balanced with reasons why not: Virtual set technology is not inexpensive set technology. When compared with regular changes to a major set, the reduction in capital cost may well be demonstrable, but a careful analysis must be done.

It must also be noted that the technology is complex and requires careful alignment to remain fully effective. Technical personnel skilled in lighting a real set are not of much help in lighting a virtual set in a computer. Set decorators of considerable artistry may have to learn new skills and new ways of communicating their art to remain useful members of the team. Plan on hiring a crack computer artist and programmer to get the most out of the technology.

Computing power is dropping in cost, and the declining cost of such technology is likely to make virtual sets commonplace in many applications and at varying price points.

John Luff is vice president of business development for AZCAR USA.