Video encoders

Compression levels soar without introducing artifacts.
Figure 1. DCT transforms

There are few places in modern video technology where technical complexity is higher than in video encoders. First, the purpose is to wring as much data as possible out of a running video without destroying the watchability of the content. Second, this needs to be done in a way that produces an output stream that conforms to tight specifications. The reason is a fundamental decision made when the standards were written: MPEG-2 was designed to put the complexity and cost burden on the encoder, making the decoders cheap to build and deploy.

There are two types of encoders: those that preserve quality by avoiding loss of content as much as possible, and those that maximize compression at the expense of quality, though in a controlled manner. The psychophysics of the process is reasonably well-known. For example, detail is less visible in motion and in deeply saturated colors. So an encoder can throw away detail in areas of high motion, and exploit redundancy between frames where it is great, to maximize the compression ratio.

Tools for doing this abound in the MPEG toolkit, including run-length encoding, entropy coding, motion vectors and block matching, predictive frames, and the basic tool used to convert spatial data into the frequency domain, the discrete cosine transform (DCT). (See Figure 1.) By creating blocks that code spatial frequency in the vertical and horizontal directions and then applying quantization to the resulting matrices, it is possible to quantize the high frequencies coarsely, leaving the visible low-frequency data to be coded with the highest precision possible. This preserves the quality that people actually perceive as most important. You might say the trick is to figure out what will not be missed, and then to design the math to do the transform with a minimum of calculation.
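The transform-then-quantize step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8x8 block size matches MPEG-2, but the quantizer step sizes (`base_step`, `slope`) are invented for the example and are not the matrices from any standard, and the direct four-loop DCT is the textbook definition rather than the fast form a real encoder would use.

```python
import math

N = 8  # block size used by MPEG-2 (and JPEG)

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):            # vertical spatial frequency
        for v in range(N):        # horizontal spatial frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, base_step=4.0, slope=4.0):
    """Divide by a step that grows with frequency, then round.

    The step sizes are hypothetical; they only show the principle that
    high frequencies are quantized more coarsely than low ones.
    """
    return [[round(coeffs[u][v] / (base_step + slope * (u + v)))
             for v in range(N)] for u in range(N)]

# A smooth ramp: typical low-detail picture content.
block = [[16 * x + 2 * y for x in range(N)] for y in range(N)]
coeffs = dct_2d(block)
levels = quantize(coeffs)
nonzero = sum(1 for row in levels for q in row if q != 0)
print("nonzero quantized coefficients:", nonzero, "of", N * N)
```

For smooth content like this ramp, most of the 64 quantized values come out as zero, which is what makes the run-length and entropy coding stages that follow so effective.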

Video encoding is complex

In the end, despite years of development, the engineering and science of video encoding remains complex, perhaps increasingly so. It is arguable that in highly refined products, the complexity is highest.

For example, when an encoder processes video, it first looks at the content and guesses how it should encode specific areas of the picture to achieve the best result. By looking at the results and the residual errors in the content, it is possible to do a second pass at encoding, using data gathered by the first pass to make the second one more effective at reducing the data rate while maximizing picture quality. In doing so, a variable bit rate encoder can save bits on easy pictures and spend them on complex, hard-to-encode pictures. This two-pass encoding requires memory and processing horsepower, which costs real dollars in any platform, software or hardware.
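The bit-shifting idea behind two-pass variable bit rate encoding can be sketched as follows. This is a deliberately simplified model, not how any particular product does it: it assumes the first pass yields a single complexity score per picture, and the function name `allocate_bits` and the sample scores are hypothetical.

```python
def allocate_bits(complexities, total_bits):
    """Share a fixed bit budget across pictures in proportion to the
    complexity each one showed in the first pass."""
    total_complexity = sum(complexities)
    if total_complexity == 0:
        # Nothing to distinguish the pictures: split the budget evenly.
        return [total_bits / len(complexities)] * len(complexities)
    return [total_bits * c / total_complexity for c in complexities]

# Hypothetical first-pass scores: two easy pictures, one hard, one moderate.
first_pass = [1.0, 1.0, 6.0, 2.0]
budget = allocate_bits(first_pass, total_bits=1_000_000)

for score, bits in zip(first_pass, budget):
    print(f"complexity {score:>4}: {bits:>9.0f} bits")
```

The total stays at the target, but the hard picture receives six times the bits of an easy one. A real rate controller also has to respect decoder buffer limits, which this sketch ignores.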

It is also often assumed that, given the speed of modern processors and techniques, software can be as fast as special-purpose hardware at completing complex tasks. As a general rule, that is possible, but in the case of video encoding, the sophistication of the processing and the amount of data that must be processed in real time make most general-purpose computing platforms too slow for these tasks. This is especially true for real-time encoding of low bit rate HD content, particularly H.264.

Reducing a 1080i30 HD signal (1.24Gb/s of picture content) to 19.39Mb/s requires throwing away 98.44 percent of the content in real time, without making the picture so bad that no one will watch it. To drop it to 12Mb/s, as is done in many cases today, means a staggering 99.04 percent of the picture has to be left on the cutting room floor. That amounts to a slim 0.19 bits for every pixel (Y, R-Y, B-Y taken as one pixel). Or perhaps it makes sense to say five pixels are represented by one bit.
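The arithmetic behind those figures can be checked with a short calculation. One assumption is needed that the text does not state: the 1.24Gb/s figure is reproduced here by taking a 1920x1080 active picture at 30 frames per second with 20 bits per pixel (10-bit 4:2:2 sampling).

```python
WIDTH, HEIGHT, FRAMES_PER_SECOND = 1920, 1080, 30
# Assumption: 10-bit 4:2:2, i.e. 10 bits of luma plus an average of
# 10 bits of chroma per pixel.
SOURCE_BITS_PER_PIXEL = 20

pixels_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND
source_rate = pixels_per_second * SOURCE_BITS_PER_PIXEL  # bits per second

def discarded_percent(target_rate):
    """Percentage of the source bit rate removed to reach target_rate."""
    return 100.0 * (1.0 - target_rate / source_rate)

def bits_per_pixel(target_rate):
    """Compressed bits available per picture element."""
    return target_rate / pixels_per_second

print(f"source rate: {source_rate / 1e9:.2f} Gb/s")
for target in (19.39e6, 12e6):
    print(f"{target / 1e6:5.2f} Mb/s: "
          f"{discarded_percent(target):.2f}% discarded, "
          f"{bits_per_pixel(target):.2f} bits per pixel")
```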

Advanced coding is in demand

This level of complexity, along with the simple fact that bits represent bandwidth, which equates to dollars, means that encoders capable of high-level compression without visible artifacts are highly desirable and valuable. The inevitable desire to cut bandwidth leads to research into advanced coding. This is why H.264 has started replacing MPEG-2 for many applications. The tools available in the more recent H.264 specification allow higher levels of compression without introducing obnoxious artifacts.

There is a downside, of course. An H.264 encoder that uses the full toolkit is much more complex. And though it can produce pictures with either improved picture quality or reduced bit rates, it costs more. Over time, the cost of encoders is coming down as the hardware that implements them becomes more widespread. Like all things, volume begets price reductions. However, there is a limit. Volume production in the consumer marketplace is numbered in the millions, tens of millions or hundreds of millions. In our small industry, that number is more likely in the thousands or tens of thousands. This means R&D costs have to be spread across a smaller universe of products, with perhaps a longer lifetime.

There are special cases where this volume production problem has a major effect on compression product development. The H.264 standard includes 4:2:2 and 4:2:0 coding modes. The latter provides higher compression ratios and thus is applicable to the consumer. Coding at 4:2:2 is ideal for backhaul of content intended for later processing, but unfortunately that capability is only needed for low-volume products intended for a niche market. Over time, the ability to code 4:2:2 streams will be included in newer silicon integrated circuit solutions, making high-quality MPEG-4 applications cheaper to implement. In the meantime, applications where 4:2:2 is required and 4:2:0 is out of the question might be stuck with MPEG-2, since MPEG-2 4:2:2 is readily available today.
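Part of the reason 4:2:0 compresses further is simply that it starts with fewer samples. The sketch below counts samples per frame for the two modes; the function name `samples_per_frame` is made up for the example, and the count covers only the raw sampling structure, not anything the encoder does afterward.

```python
def samples_per_frame(width, height, scheme):
    """Count luma plus chroma samples in one frame.

    4:2:2 halves chroma resolution horizontally only;
    4:2:0 halves it both horizontally and vertically.
    """
    luma = width * height
    if scheme == "4:2:2":
        chroma = 2 * (width // 2) * height           # two chroma planes
    elif scheme == "4:2:0":
        chroma = 2 * (width // 2) * (height // 2)    # two chroma planes
    else:
        raise ValueError(f"unsupported scheme: {scheme}")
    return luma + chroma

s422 = samples_per_frame(1920, 1080, "4:2:2")
s420 = samples_per_frame(1920, 1080, "4:2:0")
print(f"4:2:2: {s422} samples, 4:2:0: {s420} samples, ratio {s420 / s422:.2f}")
```

A 4:2:0 frame carries three quarters of the samples of a 4:2:2 frame, with all of the difference taken from vertical chroma resolution. That missing chroma detail is what matters when content will be processed again downstream.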

Compression technology, which has spawned an amazing array of applications not possible before, continues to march on. JPEG, originally developed for still images and adapted as motion JPEG to video, begat MPEG-2 with more complexity and a richer feature set nearly 20 years ago. Now a newer JPEG effort, JPEG2000, has been popularized for digital cinema and other applications where the highest quality, approaching mathematically lossless performance, is appropriate. Other coding schemes will be developed for other purposes. We already see consumer decode chipsets that do MPEG-2 and MPEG-4 (H.264) for both broadcast and DVD applications. It seems reasonable to assume that future implementations will embrace an ever wider array of compression tools.

John Luff is a broadcast technology consultant.
