Earlier in this column, we looked at various artifacts introduced in the compression process, such as mosquito noise, contouring and blockiness, including basis functions. This month, we'll examine some aspects of quantization in more detail, and look at other artifacts as well.
Block coding, transforms and quantization
Quantization is the process by which continuous analog functions can be represented by a fixed-precision digital system. Typically, video is quantized to 8 bits or 10 bits for each of the red, green and blue components of a pixel, and audio is quantized to 16 bits per sample per channel. Using fewer bits results in a coarser quantization and increases the visibility (or audibility) of the resulting error, called quantization noise. Figure 1 shows an image where each pixel is quantized to 4 bits.
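As a rough illustration of the 4-bit case, the sketch below requantizes 8-bit sample values to 16 levels and measures the resulting error. The function name requantize and the mid-interval reconstruction rule are assumptions made for this example, not something taken from a particular video system.

```python
def requantize(sample, bits_in=8, bits_out=4):
    """Reduce an integer sample to bits_out bits, then map it back to the bits_in scale."""
    step = 1 << (bits_in - bits_out)   # 16 input codes per output level for 8 -> 4 bits
    level = sample // step             # the coarse code that would be stored or sent
    return level * step + step // 2    # reconstruct mid-interval (assumed rule)


ramp = list(range(256))                # a smooth 8-bit gradient
coarse = [requantize(v) for v in ramp]
levels = sorted(set(coarse))           # distinct output values that survive
max_error = max(abs(a - b) for a, b in zip(ramp, coarse))

print(len(levels), "levels, worst-case error", max_error)
```

The smooth 256-value ramp collapses into 16 flat bands, which is the numerical counterpart of the contouring visible in a coarsely quantized image.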
Quantization is carried out in various places in a typical video system, as shown in Figure 2. Most video compression systems use transform coding, where blocks of pixels are converted into a frequency-domain representation. The best known of these transforms is the discrete cosine transform (DCT). Numerically, the DCT is an arrangement of multiplications and additions of the values of all the pixels within a block (typically eight-by-eight). Carrying out the DCT results in a new array of values called transform coefficients, and by employing the inverse process at the decoder, the original block of pixels can be reconstructed.
The choice of where to perform the coarsest quantization has a large effect on the perception of artifacts. While 4-bit linear quantization of pixels would cause noticeable contouring in an image, carrying out the same level of quantization on the transform coefficients would have a much smaller effect on the image. (See Figure 3.) This is because the inverse transform spreads the quantization error of each DCT coefficient over the entire block, avoiding the contouring problem. Of course, at a certain level of quantization, a large enough error in the DC coefficient (which represents the average intensity of the entire block) will cause the edges of the block to mismatch the surrounding blocks, causing “blocking” artifacts in the image.
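The difference between quantizing pixels and quantizing coefficients can be tried out numerically. The following is a minimal sketch, assuming an orthonormal eight-by-eight DCT written out directly from its definition and a uniform quantizer with the same step size (16) in both domains. Using the same step is a simplification: coefficients span a wider range than pixels, so it is not the same bit depth. All function names here are illustrative.

```python
import math

N = 8  # block size assumed here (eight-by-eight)


def _scale(k):
    # Normalization that makes the transform orthonormal
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# BASIS[k][n]: sample n of the k-th one-dimensional cosine basis function
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]


def dct2(block):
    """Forward 2-D DCT: each coefficient is a weighted sum of all 64 pixels."""
    return [[sum(BASIS[v][y] * BASIS[u][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for u in range(N)] for v in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: each pixel is a weighted sum of all 64 coefficients."""
    return [[sum(BASIS[v][y] * BASIS[u][x] * coeffs[v][u]
                 for v in range(N) for u in range(N))
             for x in range(N)] for y in range(N)]


def quantize(values, step):
    """Uniform quantizer: snap every value to the nearest multiple of step."""
    return [[round(v / step) * step for v in row] for row in values]


def max_step(block):
    """Largest jump between horizontally adjacent pixels."""
    return max(abs(row[x + 1] - row[x]) for row in block for x in range(N - 1))


STEP = 16  # same step size in both domains (an assumption for illustration)
block = [[100 + 2 * x for x in range(N)] for _ in range(N)]  # gentle ramp

pixel_quantized = quantize(block, STEP)
coeff_quantized = idct2(quantize(dct2(block), STEP))

print("pixel domain    :", pixel_quantized[0])
print("transform domain:", [round(p) for p in coeff_quantized[0]])
print("largest jump    :", max_step(pixel_quantized),
      "vs", round(max_step(coeff_quantized), 2))
```

For this ramp, pixel-domain quantization leaves two flat bands separated by a jump of one full step, while the block rebuilt from quantized coefficients is still a smooth ramp with a small error on every pixel.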
However, good encoders will attempt to minimize these mismatches by taking neighboring blocks into consideration and avoiding large discontinuities. Blockiness also results from motion compensation, often causing output blocks to be reconstructed from other parts of the picture, with different detail; although the motion vectors are chosen so as to minimize the absolute error, local differences will always occur at the block boundaries.
In MPEG-4 AVC (H.264), a deblocking filter is employed in both the encoder and decoder to decrease blockiness. The filter can be disabled, if desired, because it comes at the expense of additional computational processing in both the encoder and decoder. (In theory, a simplified, noncompliant system could thus be deployed, if it were guaranteed that the transmission would never enable this filter.) The filter is within the encoder prediction loop, so, when used properly, it removes the hard edge of a quantized block while not affecting picture details. The deblocking threshold can be adjusted in the encoder, and this setting has different effects depending on video content; some studies suggest that the choice of filter parameter is less important with video containing high motion.
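The idea of a thresholded edge filter can be shown with a toy example. This is not the AVC deblocking filter, which is considerably more elaborate; it is a one-dimensional sketch with a hypothetical function deblock_row, assuming that small steps at a block boundary are coding artifacts and large ones are real picture detail.

```python
def deblock_row(row, boundary, threshold):
    """Soften the step across one block boundary in a 1-D row of pixels.

    boundary is the index of the first pixel of the right-hand block.
    """
    out = list(row)
    p0, q0 = row[boundary - 1], row[boundary]  # pixels on either side of the edge
    step = q0 - p0
    if step == 0 or abs(step) >= threshold:
        return out          # nothing to fix, or large enough to be real detail
    delta = int(step / 4)   # move each side a quarter of the way toward the other
    out[boundary - 1] = p0 + delta
    out[boundary] = q0 - delta
    return out


blocky = [100] * 8 + [108] * 8     # two flat blocks with mismatched levels
real_edge = [50] * 8 + [200] * 8   # genuine detail that falls on a boundary

print(deblock_row(blocky, 8, threshold=12)[6:10])     # step of 8 is softened
print(deblock_row(real_edge, 8, threshold=12)[6:10])  # step of 150 is left alone
```

Raising the threshold smooths more boundaries at the risk of blurring detail. Because the filtered picture is the one used for prediction, an encoder and decoder would both have to apply the same rule.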
Motion artifacts, dirty windows and frozen faces
Another unsightly artifact is the so-called “dirty window” effect, where granules of noise appear to remain stationary, while real objects move beneath them, as if seen through a dirty window. In this case, the encoder may not be allocating enough bits to code the residual (prediction) error in the P (predictive) and B (bidirectionally predictive) pictures, so the error will persist in the decoded image until the next I (intra) reference frame is encountered. “Wavy noise” is a similar artifact that is often seen during slow pans across highly detailed objects, such as people in a crowd. Here, the coarsely quantized high-frequency DCT coefficients cause reconstruction errors to move spatially as details shift within the blocks.
An accurate rendition of moving images requires a sufficient bit allocation for both residual data and motion vectors (the elements used to predict the static and moving areas of an image, respectively). If motion vectors are well coded, but not the residuals, we would expect to see objects moving properly, but perhaps with more quantization noise, especially around edges. However, if not enough bits are allocated to motion vectors, and the residuals can't make up for the deficiency, then a “frozen face” artifact could occur, where parts of an object (such as a face) start to move, but the coded image fails to keep up with the motion, making certain parts of the object appear “stuck” at a previous position. Again, this should correct itself at the next I picture. In the limiting case, if the encoder sets the threshold very high for the “skip macroblock” instruction (i.e., no motion vector and no residual are transmitted), or there are simply not enough bits to code the macroblock, then the frozen face artifact can occur when the local object motion is small enough that it does not exceed the threshold set by the bit allocation budget. We should note that all of these artifacts are exacerbated when the group of pictures (GOP) is long, that is, when there are many pictures between I frames, because the errors will persist for a long time.
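The interaction between a high skip threshold and the spacing of I pictures can be sketched for a single macroblock. This is a deliberately simplified model with assumed names and values: changes are measured as a sum of absolute differences, I pictures and non-skipped blocks are treated as perfectly coded, and a skipped block simply repeats the previous output.

```python
def decode_with_skips(frames, skip_threshold, gop_length):
    """Return what a decoder would show for one macroblock over time.

    Simplifications (assumptions of this sketch): I pictures and non-skipped
    blocks are coded perfectly, and a skipped block repeats the previous output.
    """
    shown = []
    for index, block in enumerate(frames):
        if index % gop_length == 0:
            output = list(block)                 # I picture: full refresh
        else:
            previous = shown[-1]
            change = sum(abs(a - b) for a, b in zip(block, previous))
            if change < skip_threshold:
                output = list(previous)          # skipped: picture is "stuck"
            else:
                output = list(block)             # coded: picture catches up
        shown.append(output)
    return shown


# A 4-pixel block whose content drifts slowly, one code value per frame
frames = [[100 + t] * 4 for t in range(8)]

long_gop = [f[0] for f in decode_with_skips(frames, skip_threshold=100, gop_length=8)]
short_gop = [f[0] for f in decode_with_skips(frames, skip_threshold=100, gop_length=4)]
print("I picture every 8:", long_gop)
print("I picture every 4:", short_gop)
```

With the threshold set this high, the block never updates between I pictures, so halving the distance between them halves how far the displayed content can fall behind the source.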
It is often necessary to convert video from one compression system to another, to convert bit rate or resolution within the same compression system, or both. For instance, when broadcast transmissions are carried on cable and satellite systems, the service operator may be taking a pre-encoded feed from the broadcaster. Rather than fully decoding and re-encoding the video, the operator can achieve a higher-quality conversion using transcoding. When transcoding an MPEG-2 stream to an MPEG-4 stream, the processor can use the motion vectors already derived in the MPEG-2 encoding as a starting point for generating the new motion vectors. To achieve higher efficiency, the output processor can additionally use the enhanced toolkit offered by the output compression scheme (MPEG-4 in this example). Care needs to be taken when either of the compression systems runs at a low bit rate, because artifacts in the upstream video may be worsened by the subsequent transcoding.
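Reusing upstream motion vectors as a starting point amounts to replacing a wide search with a narrow refinement around a known guess. The sketch below shows this on a single row of pixels; the data, the value of upstream_mv and the function names are all hypothetical, and real transcoders work on two-dimensional blocks with far more sophisticated cost measures.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel runs."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_displacement(reference, block, position, candidates):
    """Return (displacement, tested): the lowest-SAD candidate and how many were tried."""
    best_cost, best_d, tested = None, None, 0
    for d in candidates:
        start = position + d
        if start < 0 or start + len(block) > len(reference):
            continue                      # candidate falls outside the reference
        tested += 1
        cost = sad(reference[start:start + len(block)], block)
        if best_cost is None or cost < best_cost:
            best_cost, best_d = cost, d
    return best_d, tested


reference = [0, 0, 0, 10, 20, 30, 40, 0, 0, 0, 0, 0]  # previous picture (one row)
block = [10, 20, 30, 40]                              # block to predict
position = 6                                          # where the block sits now

# Full search over a wide window, as an encoder starting from scratch might do
full = best_displacement(reference, block, position, range(-6, 7))

# Seeded search: refine by one position either side of the upstream vector
upstream_mv = -2  # hypothetical vector recovered from the incoming stream
seeded = best_displacement(reference, block, position,
                           range(upstream_mv - 1, upstream_mv + 2))

print("full search  :", full)
print("seeded search:", seeded)
```

Both searches land on the same displacement here, but the seeded one tests a third as many candidates. If the upstream vector were badly wrong, a narrow refinement would miss the true match, which is one way upstream artifacts carry through a transcode.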
Repurposing content from fixed broadcast to mobile receivers will also often require transcoding and bit rate reduction, so similar considerations apply, although viewing on small screens can render many artifacts less visible. An alternative to parallel coding or transcoding content is to code content in a scalable fashion, so that simply dropping portions of the coded stream yields versions of different qualities or resolutions. While possibly streamlining the production process, this approach works best when the same codec type (e.g., MPEG-4) is used for both decoders. However, when transmission systems multiplex different codec types, simple hierarchical coding is not possible.
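The principle of scalable coding can be illustrated with two layers of quantization. In this minimal sketch, a base layer carries coarsely quantized samples and an enhancement layer carries the finer-quantized remainder; the step sizes and function names are assumptions, and real scalable codecs apply the idea to predicted and transformed data rather than to raw samples.

```python
BASE_STEP = 16  # coarse quantizer for the base layer (assumed value)
ENH_STEP = 2    # finer quantizer for the enhancement layer (assumed value)


def encode_layers(samples):
    """Split samples into a coarse base layer and a refinement layer."""
    base = [round(s / BASE_STEP) for s in samples]
    residual = [s - b * BASE_STEP for s, b in zip(samples, base)]
    enhancement = [round(r / ENH_STEP) for r in residual]
    return base, enhancement


def decode_layers(base, enhancement=None):
    """Reconstruct from the base layer alone, or from both layers if available."""
    out = [b * BASE_STEP for b in base]
    if enhancement is not None:
        out = [o + e * ENH_STEP for o, e in zip(out, enhancement)]
    return out


samples = [100, 107, 113, 180]
base, enhancement = encode_layers(samples)

low_quality = decode_layers(base)                 # enhancement layer dropped
high_quality = decode_layers(base, enhancement)   # both layers received

print("source      :", samples)
print("base only   :", low_quality)
print("base + enh. :", high_quality)
```

Dropping the enhancement layer requires no re-encoding; the receiver simply decodes what it gets, at a lower quality.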
Of course, archiving multiple versions of compressed video has storage implications; an alternative is to store only one compressed version and to play out the necessary compressed version(s) with transcoding done in real time. This will require a lot of faith that the end product can be transmitted without any intervening production inspection. However, the sophistication and quality of transcoders may already be at that point — a needed factor given the explosion of various forms of content distribution.
Aldo Cugnini is a consultant in the digital television industry.
Send questions and comments to: firstname.lastname@example.org