Knowledge of the Human Visual System is essential to understanding how future equipment might be specified for improved performance.
On an evolutionary scale, film and television have a vanishingly short history. This means the senses of humans and other creatures evolved for some other reason than to watch screens. That reason is simple: survival. A creature with slightly more effective senses has a better chance of avoiding being eaten, finding food and finding a mate; therefore, we expect surviving species to have been selected for effective senses, which is indeed the case.
Reality vs. illusion
While I intend to concentrate here on the Human Visual System (HVS), it is worth noting in passing that all of the senses combine to create a model in our minds of what is around us. We call it reality, whereas it is only an illusion due to that part of our environment that our senses permit us to detect. If our senses were different, if we had the hearing of a dog and the vision of an eagle, our reality would be different. Sensory performance varies from person to person, changes with age and can be impaired by various substances. You might define reality as the remaining illusion due to a shortage of alcohol. There is no one reality, and many things must remain forever subjective.
For example, I have lost count of the number of times I have been informed that I must be uncomfortable on account of what I am wearing or the way I am sitting and so on. Some advice is then proffered as to how I could improve my comfort. People who act that way have yet to learn that their own reality may not reflect everyone else's and have a way to go in personal growth.
For a further example, a significant number of people have essentially no stereoscopic vision, and others are color blind to different extents. But because that is their reality, they don't know that they lack those abilities unless by chance they are tested. It follows that the average person viewing a TV screen will see more than some and less than others. Consequently, simply assessing quality by looking at the picture is likely to be unsatisfactory because someone else may see things that you missed. The regular use of objective quality measurements overcomes that problem.
The HVS has some superficial similarity to a camera. Figure 1 shows that it has a lens, an iris and a sensor, but that is where the similarity ends. To make progress in understanding the HVS and in how to use a TV camera, it is vital to understand that in most respects, the two are completely different. The reality allowed to a TV camera by its sensing ability is different to the human sensation; therefore, the well-worn saying “the camera never lies” is one of the greatest fallacies I have ever come across. The fact is the camera always lies, and part of the art of the videographer is to keep it honest, or at least keep it to the story you are trying to tell.
Human vision is extremely indirect, so what we call vision is not the image on the retina, but something that has been subject to what we might describe as extensive post-production. Essentially human vision comes from a kind of three-dimensional frame store, which contains a model of our surroundings. We are also in the model, so we can reach out and grasp things like focus rings and battery packs.
The reason for this arrangement is that it allows a huge field of view, around 180 degrees horizontally. I once attended a conference where another presenter had calculated the bandwidth needed to replicate the field of view of human vision with the same acuity, color gamut and the same temporal response. The result was, if I remember correctly, in the terabit per second region. Clearly, the HVS cannot and does not transmit terabits per second between the eyes and the brain, so it must be more subtle, and that is where the frame store comes in.
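The kind of sum involved can be sketched in a few lines of Python. Every figure below is an assumption chosen for illustration and is not the presenter's data: a field of 180 by 130 degrees, 120 samples per degree (two per arc minute), 36 bits per sample and 100 pictures per second. The function name raw_bitrate is likewise hypothetical.

```python
def raw_bitrate(h_deg, v_deg, samples_per_deg, bits_per_sample, pictures_per_sec):
    """Bits per second needed to send the whole field at uniform acuity."""
    samples_h = h_deg * samples_per_deg
    samples_v = v_deg * samples_per_deg
    return samples_h * samples_v * bits_per_sample * pictures_per_sec

# Illustrative assumptions only; none of these figures are measured values.
rate = raw_bitrate(h_deg=180, v_deg=130, samples_per_deg=120,
                   bits_per_sample=36, pictures_per_sec=100)
print(f"{rate / 1e12:.2f} Tbit/s")  # prints 1.21 Tbit/s
```

With these assumptions the answer comes out a little over one terabit per second. Different assumptions about acuity or picture rate will move the figure considerably, but not enough to make uniform full-field transmission look plausible.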
Self-evidently, the frame store only needs to be updated if anything changes. This alone allows a colossal reduction in bandwidth from the eye. The second bandwidth reduction measure is that the acuity of the eye is not uniform. Best acuity is only present in a small central circular area known as the fovea. Obviously, the eye then needs to swivel so that it can rapidly place the item of interest on the fovea. It will do so if anything is detected by any sense that suggests something has changed in the environment.
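The first of these measures resembles what video coding calls conditional replenishment, in which only the parts of the picture that have changed are sent. The following is a minimal sketch, assuming a small grayscale picture held as a list of lists. The names update_store, block and threshold are hypothetical, and the code is an engineering analogy, not a model of the retina.

```python
def update_store(store, frame, block=2, threshold=8):
    """Copy into store only the blocks of frame that changed noticeably.

    Returns the origins (row, col) of the blocks that were updated.
    """
    rows, cols = len(frame), len(frame[0])
    updated = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            cells = [(y, x)
                     for y in range(r, min(r + block, rows))
                     for x in range(c, min(c + block, cols))]
            # Largest change in this block since the store was last updated.
            change = max(abs(frame[y][x] - store[y][x]) for y, x in cells)
            if change > threshold:
                for y, x in cells:
                    store[y][x] = frame[y][x]
                updated.append((r, c))
    return updated

store = [[0] * 4 for _ in range(4)]
frame = [row[:] for row in store]
frame[3][3] = 100          # one small change in an otherwise static scene
sent = update_store(store, frame)
print(f"{len(sent)} of 4 blocks sent")
```

Only the block containing the change is transferred. Presenting the same frame again transfers nothing at all, which is the point of the arrangement.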
The cause might include a sound, a vibration, a change in radiant heat or touch sensed by the skin, or something detected by the peripheral vision. Because the primary purpose of peripheral vision is to alert the senses to a change in the environment, the response rate of peripheral vision is higher than foveal vision. Try looking away from a TV set so it appears right at the edge of your field of view, and you will see it flickering. You will also experience a strong urge to look at it, which stops the flickering. This is one of the mechanisms behind TV addiction.
The retina is covered in what are effectively pixels, although they are officially called rods and cones. The density of the pixels is low in the peripheral vision area and higher in the fovea. The color pixels, the cones, are concentrated in and around the fovea. Peripheral vision is dominated by rods and sees largely in monochrome, so much of the sensation of color there comes from the frame store. But color video will have to wait for another time. Interestingly, the density of pixels in the fovea is much less than we would find in a TV camera of similar acuity. In other words, the HVS appears to violate sampling theory. In fact, it doesn't because something else is going on, known as saccadic motion.
Figure 2. Saccadic motion is the involuntary constant minute oscillation of the eyeball.
Saccadic motion is the involuntary constant minute oscillation of the eyeball. It has the result of shifting a given pixel to a large number of different locations so that over time, a high-resolution image can be built up from all the measurements made in various places.
The HVS causes the eye motion, and then cancels the resultant image shift in a kind of DVE (digital video effects unit), with the result that the image stays still while the pixels shift across it. The system can be seen in Figure 2. The HVS then integrates a sharp image by adding a number of images over time with the pixels in multiple locations. Clearly this takes time, and it is one of the reasons for persistence of vision and the reduced visibility of flicker in the fovea.
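The principle can be illustrated in one dimension. The sketch below assumes an idealized point-sampling sensor whose pixels are spaced pitch positions apart, and an eye offset that is known exactly, so that each measurement can be put back at its true place in the scene. The names sample and integrate are hypothetical, and real photoreceptors are not point samplers.

```python
def sample(scene, pitch, offset):
    """One glimpse: coarse pixels see only every pitch-th scene position."""
    return {pos: scene[pos] for pos in range(offset, len(scene), pitch)}

def integrate(scene, pitch, offsets):
    """Build up a picture from several glimpses taken at known eye offsets."""
    picture = [None] * len(scene)      # None marks positions never measured
    for offset in offsets:
        # Keying by true scene position is what cancels the image shift.
        for pos, value in sample(scene, pitch, offset).items():
            picture[pos] = value
    return picture

scene = [0, 9, 0, 9, 0, 9, 0, 9]       # detail finer than the pixel pitch
still = integrate(scene, pitch=2, offsets=[0])
moving = integrate(scene, pitch=2, offsets=[0, 1])
print(still)    # every measured value is 0, so the detail is lost
print(moving)   # matches scene
```

With the eye held still, the coarse pixels report a flat field and the fine detail has vanished. With two offsets, the same pixels recover the scene exactly, at the cost of needing two glimpses instead of one.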
In normal circumstances, our eyes never stop moving, and this is just as well because our vision is AC-coupled, and we can only perceive changes. One reason for this is that evolution resulted in the retina being back to front, with the light-sensitive layer underneath the blood vessels and nerves. The raw retinal image is like looking through a bowl of pasta. However, by keeping the eye moving, we can average out the shadows of the blood vessels and get the image. Only the good images can update the frame store, so we never see the pasta.
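AC coupling is an engineering term, and its effect is easy to demonstrate with a first-order high-pass filter. What follows is an electrical analogy with an assumed filter coefficient alpha, not a model of retinal physiology, and the function name ac_couple is hypothetical.

```python
def ac_couple(samples, alpha=0.5):
    """First-order high-pass filter: passes changes, blocks steady levels."""
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

static = ac_couple([1.0] * 5)               # an unchanging input
changing = ac_couple([1.0, 0.0, 1.0, 0.0])  # an input that keeps moving
print(static)    # [0.5, 0.25, 0.125, 0.0625, 0.03125]
print(changing)  # [0.5, -0.25, 0.375, -0.3125]
```

The response to the unchanging input halves at every step and fades toward nothing, whereas the changing input keeps producing an output. An eye that stopped moving relative to the scene would be in the position of the first case.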
The delay caused by the image processing is quite significant, so what we see is behind real time. However, if you move your hands together, you feel them touch at the same time as you see them touch. The HVS knows that your hands are moving and how fast, so it shifts their position in the image to where they should be. The same mechanism allows you to catch a ball.
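The correction amounts to extrapolation: take the delayed observation and move it along by speed multiplied by delay. The following is a minimal sketch, assuming constant velocity and a purely illustrative delay of 100 ms, which is an assumption and not a measured figure. The name present_position is hypothetical.

```python
def present_position(obs_earlier, obs_later, delay_ms):
    """Extrapolate a delayed observation to where the object is now.

    Each observation is (time_ms, position_mm); constant velocity is assumed.
    """
    (t0, p0), (t1, p1) = obs_earlier, obs_later
    velocity = (p1 - p0) / (t1 - t0)     # mm per ms
    return p1 + velocity * delay_ms

# A hand seen at 0 mm and then, 100 ms later, at 50 mm.
now = present_position((0, 0), (100, 50), delay_ms=100)
print(now)  # 100.0
```

The latest observation says 50 mm, but by the time it has been processed the hand has moved on, and the extrapolated figure of 100 mm is what needs to be displayed if sight and touch are to agree.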
Another interesting difference between our eyes and a camera is that if a camera is panned very rapidly, the image smears out unrecognizably. When our eyes move rapidly from one point of interest to another, we don't see image smear, nor do we see darkness. What happens is that the eyes are essentially switched off when they move rapidly but we continue to see what is in our frame store. Magicians know this and can do things without the audience seeing because they elicit a rapid eye shift during the action they want to conceal. You can try looking at one of your eyes in a mirror, and then switching your gaze to the other eye. You will never see your own eyes move.
It should not be necessary to point out that a typical camera has only one eye, whereas many creatures, apart from the mythical Cyclops, have two eyes. The HVS merges the information from both eyes, such that we believe we are looking at life from a viewpoint halfway between our eyes. Stereoscopy warrants an article of its own, so I won't go into it further here.
John Watkinson is a high-technology consultant and author of several books on video technology.