Calculating depth of field

The development of HD video cameras has led to a move toward electronic cinematography. The 24p scanning format contributes to the film look, but one of the classic characteristics of shooting on 35mm film is shallow depth of field.

It's tempting to say that there's more to imaging systems than meets the eye, but it wouldn't be true. It is only what meets the eye that matters. So to explain almost anything about TV and film imaging systems, we have to start with the eye.

Depth of field is a classic example. We all know what it means, what it looks like and how to manipulate it for visual effect. But predicting it precisely is beyond most of us. Yet, by making a few assumptions and measurements, it is relatively easy to calculate for any given situation.

First, the definition: Depth of field is the range of object distances, from the lens, within which everything appears to be in focus in the image. On the other side of the lens is depth of focus, which is the tolerance range of distances for the image sensor, from the lens, within which the focused object appears to be sharp. Clearly the depth of field is always much greater than the depth of focus, because the image is (nearly) always a great deal smaller than the object.

The ray diagram

In a ray diagram of a simple imaging lens with a focal length f, v and u are the distances from the lens plane to the image and object planes, respectively. (See Figure 1.) The sagittal (like an arrow) ray travels straight through the center of the lens without bending. All other ray paths are bent by the lens. Rays that are parallel to the normal through the center of the lens (horizontal in this example) are bent to travel through the focal point on the other side, and the focused image is formed where the rays from points on the object converge.
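If you want to put numbers to Figure 1, the standard thin-lens relation (not derived here, but implied by the ray construction) connects f, u and v. Here is a minimal sketch in Python; the 50mm lens and 2m object distance are illustrative values only, not figures from this article:

# Thin-lens relation: 1/f = 1/u + 1/v (all distances in mm).
def image_distance(f_mm, u_mm):
    # Distance v from the lens to the focused image of an object at distance u.
    return 1.0 / (1.0 / f_mm - 1.0 / u_mm)

print(image_distance(50.0, 2000.0))   # about 51.3mm for a 50mm lens focused on an object at 2m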

This principle is the basis of all the maths here. It is the divergence of rays from the object and convergence at the image that defines the depth of field. But first we need another concept.

Disk of confusion

The disk or circle of confusion is generally used to quantify the size of the smallest object that an optical system can produce in the image. Ideally, this would be a point of zero size, but lens manufacturing is difficult. For example, Zeiss prime lenses are designed to produce a disk of confusion of about 4µm. However, the camera has a disk of confusion as well — the sensor pixel spacing. This is about 5µm for 2/3in HD cameras and about 7µm for consumer 1/3in SD cameras. But in imaging systems, we should be more concerned with the disk of confusion of the entire system, and that must include the viewer.

A typical figure for a viewer with normal eyesight is about one minute of arc (1/60 of a degree). So any object that subtends an angle of less than one minute of arc at the eye appears to be a point in the scene. There is no need for the imaging system to reproduce anything smaller than that; we can't see it. We can use angles so that distances aren't involved, and radians rather than degrees so that some approximations work. There are 2π radians in a full circle, so one radian is about 57 degrees, and one minute of arc is about 1/3420 radian.

You can measure your own eyesight very simply by holding up a ruler marked in millimeters and gradually moving away from it until the millimeter markings merge. A transparent plastic ruler is good for this, and it needs to be well lit so that the iris closes up. At the distance where the millimeter markings just merge, measure the distance D from your eye, in millimeters. Your disk of confusion d is then given by tan(d)=1/(2D); this assumes that the millimeter marks are 0.5mm wide. Since the angle is small, use the approximation d=1/(2D). I get a distance of about 2.4m, so d is 1/4800 radian — about 0.7 minutes of arc. And that's all the personal information we need to calculate my depth of field, but we still need another concept.
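Turned into a few lines of Python, the ruler test looks like this. The 2.4m figure is the merge distance quoted above, and the 0.5mm mark width is the assumption already stated:

import math

ARCMIN_PER_RADIAN = 180.0 * 60.0 / math.pi      # about 3438

def eye_disk_of_confusion(distance_mm, mark_width_mm=0.5):
    # Angular disk of confusion, in radians, from the ruler test.
    return math.atan(mark_width_mm / distance_mm)

d = eye_disk_of_confusion(2400.0)                # the 2.4m merge distance quoted above
print(d, d * ARCMIN_PER_RADIAN)                  # about 1/4800 radian, or 0.7 minutes of arc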

Hyperfocal distance

This is the nearest focus distance at which everything beyond it, right out to infinity, is still in focus. It's an interesting concept. It means that I don't have to set the focus distance to infinity in order for distant objects to be sharp. I can maximize the range of sharpness by using a lens focus setting somewhat less than infinity, such that infinity is only just in focus. What this means is that all points in this focus range are reproduced no larger than the disk of confusion in the image plane. A ray diagram will help explain how this works. (See Figure 2.)

Rays from the near point, at the hyperfocal distance, form an inverted image where the rays from the object “h” cross. Rays from another object “f”, placed at infinity, are all parallel as they meet the lens and form an inverted image “f” at the focal point of the lens. The two images cannot both be on the sensor plane at the same time, so the sensor is placed between them such that the converging (or diverging) rays form disks of equal diameter, the size of the disk of confusion — the equal misery point. And now we can derive depth of field.

Depth of field

Let's assume that the camera is correctly focused on an object at distance u from the lens, and points at this distance subtend a field angle Ø at the lens. (See Figure 3.) The lens aperture (diameter) is a, and by trigonometry tan(Ø/2)=a/(2u). Because the angles are still small, we can approximate to Ø=a/u.
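A quick numerical check on that small-angle approximation, with an arbitrary 25mm aperture and 2m object distance:

import math

a, u = 25.0, 2000.0                      # aperture diameter and object distance, in mm
exact = 2.0 * math.atan(a / (2.0 * u))   # full angle of the cone of rays filling the aperture
approx = a / u                           # the small-angle approximation used in the text
print(exact, approx)                     # 0.0124998... versus 0.0125: effectively identical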

At distances un and uf (the near and far planes, the extremes of the depth of field we're calculating), points subtend field angles Øn and Øf (in radians) at the lens. They produce blur disks in the image that subtend the angle d at the lens, the system's disk of confusion, because the converging or diverging rays do not intersect the sensor plane at their respective points of focus. By the same trigonometry, we can state that Øn=a/un and Øf=a/uf.

Another advantage of calculating the angles in radians is that we can add and subtract them without bothering with trigonometry and make the bold statement that Øn=Ø+d and Øf=Ø-d. Now we can rewrite that as Øn-d=Ø=Øf+d and substitute the angle formulae to get a/un-d=a/u=a/uf+d. A little juggling gets un=a/(a/u+d) and uf=a/(a/u-d) and, after more substitution, un=a/(d*(n+1)) and uf=a/(d*(n-1)), where n=a/(d*u), or u=a/(d*n).

This is a good time to use h, the hyperfocal distance, which is the value that u takes when uf becomes infinite, so h=a/d. Now, at last, we've got a simple definition for the depth of field: If the lens is focused on a point at h/n (i.e., u=h/n), then everything in the distance range from h/(n+1) to h/(n-1) will be sharp in the image. So, the formula for depth of field is: DoF = uf - un = h/(n - 1) - h/(n + 1).
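The whole derivation fits into a few lines of code. Here is a minimal sketch; the 25mm aperture, the 1/4800 radian disk of confusion from the ruler test and the 10m focus distance are illustrative values only:

def depth_of_field(a_mm, d_rad, u_mm):
    # Near plane, far plane and depth of field from the formulae above.
    # a = aperture diameter (mm), d = angular disk of confusion (radians),
    # u = focused object distance (mm).
    h = a_mm / d_rad                                      # hyperfocal distance, h = a/d
    n = h / u_mm                                          # so that u = h/n
    u_near = h / (n + 1.0)
    u_far = h / (n - 1.0) if n > 1.0 else float("inf")    # focused at or beyond h
    return u_near, u_far, u_far - u_near

print(depth_of_field(25.0, 1.0 / 4800.0, 10000.0))
# roughly (9231, 10909, 1678): about 1.7m of total sharpness around a 10m focus distance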

Note that the definition depends only on a (the physical aperture diameter), d (the angular size of the disk of confusion) and u (the object distance). It doesn't depend directly on f, the focal length of the lens, except in that f sets the aperture a for a given F stop and so affects h, the hyperfocal distance.

One conclusion is clear: If we are using a zoom lens, then it doesn't matter whether we get close up with a wide angle or a long way away with a telephoto. Provided we keep the image size fixed and the same F stop, the depth of field will be essentially constant irrespective of focal length (since a=f/F). However, it does depend on image size.
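Here is a numerical illustration of that claim. It assumes the disk of confusion is a fixed length c on the sensor, so the angular d of the derivation becomes roughly c/f; the focal lengths, distances and F stop are arbitrary examples, and c is taken as the 5µm pixel spacing quoted earlier for a 2/3in HD camera:

def dof_mm(f_mm, u_mm, F, c_mm):
    # Depth of field from the formulae above, with the angular d approximated as c/f.
    a = f_mm / F                  # physical aperture diameter from the F stop
    d = c_mm / f_mm               # sensor circle of confusion expressed as an angle
    h = a / d
    n = h / u_mm
    return h / (n - 1.0) - h / (n + 1.0)

# Same subject size in frame (u scales with f), same F stop, same sensor:
print(dof_mm(5.0, 500.0, 2.0, 0.005))     # wide angle, close in: about 208mm
print(dof_mm(50.0, 5000.0, 2.0, 0.005))   # telephoto, ten times farther away: about 200mm

The small residual difference arises because the near and far limits are not symmetric about the focus distance; the closer the subject sits to the hyperfocal distance, the bigger that asymmetry becomes.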

Camera image size

We need to consider camera image size because it indirectly affects the lens aperture. For a given field angle, subtended at the lens by an object in the scene, the required focal length of the lens is directly proportional to the image size. So the larger the image, the larger the physical aperture a must be for a given F stop, since a=f/F.

Now we all know that a 35mm slide or negative is 36mm × 24mm, because we can see it. But TV sensors are hidden to us, so we have to read the makers' data sheets for values and use a bit of history to interpret them.

Camera image sizes are always quoted in inches. The value is the diagonal, just as for a display CRT. However, just as in the CRT, where the value can mean either the diagonal of the visible area or the distance between the holes in the mounting lugs, the camera image size doesn't always mean what we expect.

In the early days, TV images were formed on such monstrous devices as image orthicon tubes, and the dimensions of 4.5in and 3in were common. Then we had 1in and 2/3in vidicon and plumbicon tubes, but by this time the dimension was not of the image but of the glass tube itself. The actual image on a 1in tube was 16.5mm diagonal (13.2mm × 9.9mm).

Broadcasters have been using the 2/3in format for many years, though some 1/2in cameras are being used for economy. The cameras used by amateurs are often 1/3in or 1/4in format, so using the same logic, the images are actually 5.5mm or 4.1mm diagonal. For the record, the part of a 35mm movie film frame used for a 4:3 aspect ratio is 20.12mm × 15.1mm. For Super16, it is 11.65mm × 7mm.

To see how this affects things, let's look at an example. A 35mm still camera with a 50mm lens gives a natural looking image because perspective isn't distorted when viewed at the right distance. Conventional wisdom says this is because the focal length is approximately equal to the image diagonal (43mm in this case).

But, being interested in the maths and not the suppositions, we can show (but I'm not going to here) that the viewing distance for correct perspective is the focal length of the camera lens multiplied by the total image magnification. Longer focal lengths flatten the image; shorter lenses exaggerate the foreground.
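As a quick illustration of that rule (the 360mm print width is an arbitrary example):

# Viewing distance for correct perspective = focal length x total magnification.
focal_length_mm = 50.0
magnification = 360.0 / 36.0             # 36mm-wide frame enlarged to a 360mm-wide print
print(focal_length_mm * magnification)   # 500mm: view that print from about half a metre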

Back to the 35mm image: To get the same natural image with a 1/4in TV sensor (4.1mm diagonal), the focal length must be 50×4.1/43=4.8mm. Just to check, my Panasonic DX100 lens goes from 4mm to 4.8mm, and that seems about right. So the lens is a lot smaller for small image formats, as well as a lot lighter and a lot cheaper. That's why they do it.

This also means that, for a given F stop, the lens is a lot smaller, and the physical aperture a is directly proportional to image size. So the depth of field is inversely proportional to image size for the same F stop. And that's why we can't get shallow depth of field in amateur camcorders.
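To put numbers on that, scale everything with the image size: the focal length (for the same framing) and the sensor circle of confusion (for the same delivered picture), keeping the F stop and subject distance fixed. The particular values below are illustrative, again treating the angular d as roughly c/f and starting from the 5µm figure quoted earlier:

def dof_mm(f_mm, u_mm, F, c_mm):
    a = f_mm / F                  # aperture diameter from the F stop
    d = c_mm / f_mm               # angular disk of confusion, roughly c/f
    h = a / d
    n = h / u_mm
    return h / (n - 1.0) - h / (n + 1.0)

# An 11mm-diagonal 2/3in image versus a 43mm-diagonal 35mm still frame,
# same framing and F stop, subject at 0.5m; f and c both scale by 43/11.
scale = 43.0 / 11.0
print(dof_mm(10.0, 500.0, 4.0, 0.005))                   # small format: about 101mm
print(dof_mm(10.0 * scale, 500.0, 4.0, 0.005 * scale))   # large format: about 26mm

The ratio of the two answers is roughly the ratio of the image sizes, which is the inverse proportionality claimed above.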

Also, I've given you the means to calculate the depth of field for the entire system, since the size of the disk of confusion is the largest of those of the lens, camera, transmission system, display and the viewer's vision. You can't just look at one part of the process. It makes sense only when the whole production and display process is taken into consideration.

Conclusion

Depth of field depends on object distance and lens aperture, visual acuity and magnification. It doesn't depend directly on the focal length of the lens. It's pretty near impossible to get the same depth of field in a small TV camera as we get in 35mm still photography. So, if we want to isolate foreground and background using depth of field, we need another way. The usual technique is to distort the perspective by getting in really close with the camera at a wide angle. That's fine for inanimate objects such as flowers and insects, but it is not flattering for a human subject.

Many cameras for electronic cinematography use a single large sensor, rather than a splitter block and three small sensors. This has one advantage in that it can give the same depth of field as a 35mm movie camera.

Alan Roberts consults on HDTV, cameras and color science.