What is 4:2:2?

4:4:4, 4:2:2, 4:2:0 -- what do these numbers mean, and where did they come from? In a recent column about ITU-R Recommendation BT.601 (also known as Rec. 601), the component SD video interface standard, I mentioned some of these numbers. They refer to the sample rates applied to the various video components.

First, let's consider 4:2:2, the most commonly encountered color difference component sampling scheme, and the one used in the first component digital recording format, D-1. The primary sample rate in Rec. 601 is 13.5 MHz, a number compatible with 525/59.94 and 625/50 systems. In 4:2:2 sampling, the Y or luminance component is sampled at the full 13.5 MHz rate, designated as a "4" in 4:2:2.

R-Y and B-Y, the two chrominance components, are each sampled at 6.75 MHz, half the 13.5 MHz rate, and each of these is designated as a "2". Why is this not called 2:1:1, you may ask.

The answer to this query is wrapped in the haze of ill-remembered history. One rationale is based on the fact that before the 13.5 MHz sample clock was internationally standardized, one component sample clock that had been considered was four times the NTSC subcarrier frequency, also known as 4fsc, which is a little more than 14 MHz. Another perspective relates to subsampling: Of a group of four samples of each component, all four are captured for the luminance signal, while only two of the four are captured for each color difference component.
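To make the digit-to-rate reading concrete, here is a minimal Python sketch. It assumes, as the column does for 4:2:2, that the first digit is luminance, the next two are the color-difference components, and a "4" equals the base clock. The function name `component_rates` is hypothetical. The exact NTSC subcarrier value of 315/88 MHz is standard background rather than a figure from this column. It shows where "a little more than 14 MHz" comes from.

```python
from __future__ import annotations

from fractions import Fraction


def component_rates(notation: str, base_mhz: float = 13.5) -> dict[str, float]:
    """Per-component sample rates (MHz) for a label such as "4:2:2".

    Assumed reading: digit 1 is Y, digits 2 and 3 are the two
    color-difference components, and a "4" equals base_mhz. This fits
    4:4:4, 4:2:2 and 4:1:1. It does not fit 4:2:0, whose "0" does not
    mean a zero sample rate.
    """
    digits = [int(d) for d in notation.split(":")]
    if len(digits) != 3 or 0 in digits:
        raise ValueError(f"label not covered by this reading: {notation!r}")
    return {name: base_mhz * d / 4 for name, d in zip(("Y", "CB", "CR"), digits)}


print(component_rates("4:2:2"))  # {'Y': 13.5, 'CB': 6.75, 'CR': 6.75}

# NTSC color subcarrier, exactly 315/88 MHz (about 3.579545 MHz).
fsc_mhz = Fraction(315, 88)
print(f"4fsc = {float(4 * fsc_mhz):.6f} MHz")  # 4fsc = 14.318182 MHz
```

Using exact fractions for the subcarrier avoids any rounding question about how far above 14 MHz the 4fsc clock sits.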

4:4:4 FOR HI-DEF

One of the Rec. 601 family of component sets is 4:4:4, which may be applied to either RGB or color difference signals. 4:4:4 RGB refers to red, green and blue signals, each of which is sampled at 13.5 MHz, while 4:4:4 color difference is a set in which the color difference components, as well as the luminance component, are sampled at the full 13.5 MHz. We may add a full-rate key channel to these, giving us 4:4:4:4; adding the same key channel to a 4:2:2 set gives 4:2:2:4.

The 18 MHz family members of 601 are designated in the same way, but of course in that case "4" is 18 MHz, not 13.5 MHz. Whatever its historical origin, the "4-based" nomenclature has been extended to the world of high-definition video. In that case the fundamental sample rate, or "4", is 74.25 MHz, and a "2" is half that rate, or 37.125 MHz.
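The same scaling applies to every family, including key channels. The sketch below sums the component rates for a few labels on each base clock given in the text. The helper name `total_rate_mhz` is hypothetical, and the sketch is only meant for labels where every digit is a sample rate (so not 4:2:0).

```python
# Base ("4") clocks in MHz, as given in the text.
FAMILIES = {"Rec. 601, 13.5 MHz": 13.5, "Rec. 601, 18 MHz": 18.0, "HD, 74.25 MHz": 74.25}


def total_rate_mhz(notation: str, base_mhz: float) -> float:
    """Sum of all component sample rates for a label such as "4:2:2:4".

    Each digit is read as a multiple of base_mhz / 4. An optional fourth
    digit is the key channel. Meant for 4:4:4, 4:2:2, 4:4:4:4 and 4:2:2:4,
    not for 4:2:0.
    """
    return sum(base_mhz * int(d) / 4 for d in notation.split(":"))


for family, base in FAMILIES.items():
    print(f"{family}: '2' = {base / 2} MHz, "
          f"4:2:2 total = {total_rate_mhz('4:2:2', base)} MHz, "
          f"4:2:2:4 total = {total_rate_mhz('4:2:2:4', base)} MHz")
```

As a cross-check drawn from background knowledge rather than this column, 13.5 MHz 4:2:2 totals 27 million samples per second. At 10 bits per sample, that is the familiar 270 Mb/s SD serial digital interface rate. HD 4:2:2 at 74.25 MHz totals 148.5 million samples per second, which likewise corresponds to 1.485 Gb/s.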

An important aside: Rec. 601 is written around the 59.94 Hz field rate; that is, the 13.5 MHz sampling clock (and, by extension, the 18 MHz sampling clock) is exact for the 59.94 Hz rate in the 525-line family. For 60.00 Hz operation, the sampling clock must be run at 13.5 MHz multiplied by 1.001. In HDTV, the 74.25 MHz sample clock is exact for 60.00 Hz operation, and for operation at 59.94 Hz, the rate must be divided by 1.001.
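The 1.001 relationship can be checked with exact arithmetic. The sketch below assumes raster totals that come from the respective standards rather than from this column: 858 total samples per line for 525-line Rec. 601, 864 for 625-line, and 2200 samples by 1125 total lines for 1080-line interlaced HD. Each case is treated as interlaced, with two fields per frame.

```python
from fractions import Fraction

# Raster totals below are assumed background from the standards, not from this column.
CLOCK_601 = Fraction(13_500_000)   # Hz
CLOCK_HD = Fraction(74_250_000)    # Hz
NTSC_FACTOR = Fraction(1001, 1000)


def field_rate(clock_hz: Fraction, samples_per_line: int, total_lines: int) -> Fraction:
    """Fields per second for an interlaced raster (two fields per frame)."""
    frame_rate = clock_hz / (samples_per_line * total_lines)
    return 2 * frame_rate


f525 = field_rate(CLOCK_601, 858, 525)
f625 = field_rate(CLOCK_601, 864, 625)
f1080 = field_rate(CLOCK_HD, 2200, 1125)

print(f525 == Fraction(60) / NTSC_FACTOR)  # True: 59.94 Hz is exact at 13.5 MHz
print(f625 == 50)                           # True
print(f1080 == 60)                          # True: 60.00 Hz is exact at 74.25 MHz

# Clocks needed for the "other" field rate in each case.
print(f"{float(CLOCK_601 * NTSC_FACTOR) / 1e6:.6f} MHz")  # 13.513500 MHz
print(f"{float(CLOCK_HD / NTSC_FACTOR) / 1e6:.6f} MHz")   # 74.175824 MHz
```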

Figure 1: 4:2:2 sampling

Figure 2: 4:1:1 sampling

We have looked at what these numbers mean in terms of sample rates. The siting of samples, that is, where the samples are taken spatially, must also be considered. In the case of Rec. 601, co-sited samples are taken. In the RGB case, R, G and B samples are taken at the same location in the line, and similarly for 4:4:4 color-difference sampling. In the 4:2:2 color-difference case, at the first sample point on a line, Y (luminance), CR (R-Y) and CB (B-Y) samples are all taken; at the second sample point only a Y sample is taken; at the third sample point a Y, a CB and a CR are taken, and this process is repeated throughout the line. Fig. 1 represents the 4:2:2 sampling process graphically.
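The 4:2:2 co-siting pattern just described can be written out mechanically. In this short sketch, the function name `cosited_line` is hypothetical. It lists which components are captured at each luma site, with the color-difference pair landing on every second site starting from the first.

```python
from __future__ import annotations


def cosited_line(num_sites: int, chroma_step: int = 2) -> list[str]:
    """Components captured at each luma sample site along one line.

    Every site gets Y. CB and CR are co-sited with every chroma_step-th
    site, starting with the first. chroma_step=2 gives the 4:2:2 pattern
    (1 would give 4:4:4, 4 would give 4:1:1).
    """
    return ["Y+CB+CR" if i % chroma_step == 0 else "Y" for i in range(num_sites)]


print(cosited_line(6))  # ['Y+CB+CR', 'Y', 'Y+CB+CR', 'Y', 'Y+CB+CR', 'Y']
```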

While in 4:2:2 sampling the color difference signals have half the bandwidth of the luminance signal (but still much wider bandwidth than in NTSC or PAL), most video compression systems require further bandwidth reduction. Where possible, all of the additional reduction is taken from the color-difference signals, preserving detail in the luminance signal. Two approaches to this are frequently encountered. 4:1:1 sampling, which is found in some of the DV recording formats, is graphically illustrated in Fig. 2: CR and CB samples are co-sited with every fourth, rather than every second, luminance sample.

This halves the horizontal color resolution relative to 4:2:2, though it is still higher than the color resolution either NTSC or PAL can deliver. MPEG-2 and some other compression schemes use 4:2:0 subsampling.
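Returning to the 4:1:1 case, the halving relative to 4:2:2 is easy to see by keeping every second versus every fourth color-difference value on a line. This is a simplified sketch. Real equipment low-pass filters the color-difference signal before discarding samples, whereas here surviving values are simply picked. The sample values are made up. The figure of 720 active luma samples per line is a Rec. 601 value assumed here rather than stated in this column.

```python
from __future__ import annotations


def decimate_chroma(chroma: list[int], step: int) -> list[int]:
    """Keep the color-difference value at every step-th luma site.

    Simplification: no pre-filtering, only selection of the co-sited values.
    """
    return chroma[::step]


cb_line = [10, 12, 14, 16, 18, 20, 22, 24]  # hypothetical full-rate CB values
print(decimate_chroma(cb_line, 2))  # 4:2:2 -> [10, 14, 18, 22]
print(decimate_chroma(cb_line, 4))  # 4:1:1 -> [10, 18]

ACTIVE_Y = 720  # assumed Rec. 601 active luma samples per line
print(len(decimate_chroma(list(range(ACTIVE_Y)), 2)))  # 360 CB samples (4:2:2)
print(len(decimate_chroma(list(range(ACTIVE_Y)), 4)))  # 180 CB samples (4:1:1)
```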

Figure 3: co-sited 4:2:0 sampling

Figure 4: interstitially sited 4:2:0 sampling

The net result of 4:2:0 is the same number of luminance and color-difference samples as in 4:1:1, but the siting of the samples is different. In 4:2:0, a single color-difference sample is taken for each two luminance samples, such that a given line contains luminance samples and CB samples only, and the line below it contains luminance samples and CR samples only. Each color-difference component is thus halved both horizontally and vertically.
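The equal-count claim can be checked by counting samples in a picture. This sketch follows the line-alternating 4:2:0 model described in this column. The 720 by 480 picture size is an assumed example, and `sample_counts` is a hypothetical helper.

```python
from __future__ import annotations


def sample_counts(scheme: str, width: int, height: int) -> dict[str, int]:
    """Samples per picture for each component.

    4:1:1: CB and CR on every line, each at every fourth luma site.
    4:2:0 (line-alternating model): one color-difference sample per two
    luma samples on each line, with CB and CR on alternate lines.
    """
    y = width * height
    if scheme == "4:1:1":
        cb = cr = (width // 4) * height
    elif scheme == "4:2:0":
        cb = cr = (width // 2) * (height // 2)
    else:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return {"Y": y, "CB": cb, "CR": cr}


for scheme in ("4:1:1", "4:2:0"):
    counts = sample_counts(scheme, 720, 480)
    print(scheme, counts, "total:", sum(counts.values()))
# 4:1:1 {'Y': 345600, 'CB': 86400, 'CR': 86400} total: 518400
# 4:2:0 {'Y': 345600, 'CB': 86400, 'CR': 86400} total: 518400
```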

VARIETIES OF 4:2:0

There are two varieties of 4:2:0, the difference being where the color-difference samples are sited relative to the luminance samples. MPEG-2 and some 625/50 DV formats use co-sited 4:2:0 sampling, while MPEG-1 and JPEG use interstitially sited 4:2:0 sampling. In co-sited 4:2:0, graphically illustrated in Fig. 3, at the first sample site in the first line, a Y sample and a CB sample are taken. At the second site a Y sample only is taken, while at the third site a Y and a CB are taken and this is repeated across the line.

At the first site in the second line a Y sample and a CR sample are taken, at the second site a Y sample only, at the third a Y and a CR, and this is repeated across the line. In interstitially sited 4:2:0, graphically illustrated in Fig. 4, the color-difference samples are taken between the luminance samples, rather than being co-sited. The first line of samples goes Y, CB, Y, CB, etc., while the second line goes Y, CR, Y, CR, etc.
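The two 4:2:0 sitings can be rendered as text for comparison. This is a sketch that follows the line-alternating description above, with CB on the first line and CR on the second. One assumption goes beyond the column's "Y, CB, Y, CB" shorthand: with one color-difference sample per pair of Y samples, the interstitial rendering places each color-difference sample midway within its pair. The function name `siting_rows` is hypothetical.

```python
from __future__ import annotations


def siting_rows(width: int, lines: int, interstitial: bool) -> list[str]:
    """Render 4:2:0 sample siting as text, one string per line.

    Line 0 carries CB, line 1 CR, and so on, with one color-difference
    sample per pair of Y samples. Co-sited: the sample shares a site with
    the first Y of each pair ("Y/CB"). Interstitial: it is placed between
    the two Y samples of the pair.
    """
    if width % 2:
        raise ValueError("width must be even (whole pairs of Y samples)")
    rows = []
    for line in range(lines):
        c = "CB" if line % 2 == 0 else "CR"
        cells: list[str] = []
        for x in range(width):
            if interstitial:
                cells.append("Y")
                if x % 2 == 0:
                    cells.append(c)  # midway between this Y and the next
            else:
                cells.append(f"Y/{c}" if x % 2 == 0 else "Y")
        rows.append(" ".join(cells))
    return rows


print("\n".join(siting_rows(4, 2, interstitial=False)))
# Y/CB Y Y/CB Y
# Y/CR Y Y/CR Y
print("\n".join(siting_rows(4, 2, interstitial=True)))
# Y CB Y Y CB Y
# Y CR Y Y CR Y
```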

These are some of the common sampling and subsampling techniques used in HD and SD video.