Video encoders

It is often assumed that when someone speaks of encoders, they're referring to MPEG-2. But with the expansion of encoding platforms and video compression algorithms, it's important to note that while the MPEG-2 transport stream and its rich syntax are part of most compressed transmission systems, the video and audio encoding choices offer variety and performance to match widely varying applications.

There are reasons for the expansion of options. Manufacturers need to introduce new and improved technology to maintain their sales and profits. If every refrigerator were still in use today, we would have stopped making them decades ago. Obsolescence is a fact of life, however inconvenient. A client recently showed exasperation with an encoder manufacturer who didn't offer a clear upgrade path short of forklifting the old boxes and replacing them with new ones. In the end, the buyer benefits from better technology, even if the budget gets strained to make it happen.

Improvements

The improvements in encoders are dramatic. Early MPEG-2 struggled to get decent SD pictures through a 15Mb/s pipe. Now distribution to the home drops to about 10 percent of that early level, while HD content can be delivered at below 15Mb/s, depending on content, of course. At the same time, MPEG-4 (AVC, Part 10, H.264) delivers excellent HD at below 10Mb/s and holds the promise of rates approaching 5Mb/s in the future.
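To put those numbers in rough perspective, a quick back-of-the-envelope calculation shows how many channels each rate would allow in a fixed pipe. The 36Mb/s payload figure below is an assumption for illustration only, and the per-channel rates simply restate the approximations above.

```python
# Back-of-the-envelope only: how many channels fit in an assumed 36Mb/s payload
# at the approximate rates quoted above. All figures are content-dependent.
PAYLOAD_MBPS = 36.0

rates_mbps = {
    "early MPEG-2 SD": 15.0,       # early-generation MPEG-2 SD
    "current MPEG-2 SD": 1.5,      # roughly 10 percent of the early level
    "current MPEG-2 HD": 15.0,     # below 15Mb/s, content permitting
    "H.264 HD today": 10.0,        # below 10Mb/s
    "H.264 HD, future": 5.0,       # the hoped-for target
}

for label, rate in rates_mbps.items():
    print(f"{label:18s} at {rate:4.1f}Mb/s -> {int(PAYLOAD_MBPS // rate)} channel(s)")
```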

Coupled with changes in satellite modulation, including the 8PSK and DVB-S2 modulation standards, lower bit rates and less occupied bandwidth on the satellite equate to lower cost for backhaul and transmission to the home. I often hear that we standardized the encoding and emission standards for ATSC DTV too early, but that argument presumes a perfect vision of the future. This much, however, is clear: Encoding will continue to improve, and new algorithms will emerge that offer further gains in both quality and bandwidth performance.
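A simple link calculation illustrates why the newer modulation modes matter. The sketch below estimates occupied transponder bandwidth from payload bit rate, bits per symbol and FEC rate; it ignores framing and pilot overhead, and the payload, code rates and roll-off factors are assumptions chosen only to show the trend.

```python
# Hedged sketch: approximate occupied bandwidth for a given payload bit rate.
# Framing and pilot overhead are ignored, so treat results as rough comparisons.

def occupied_bandwidth_mhz(payload_mbps, bits_per_symbol, fec_rate, rolloff):
    """Symbol rate = payload / (bits per symbol * FEC); bandwidth = symbol rate * (1 + roll-off)."""
    symbol_rate_mbaud = payload_mbps / (bits_per_symbol * fec_rate)
    return symbol_rate_mbaud * (1.0 + rolloff)

payload = 8.0  # Mb/s, e.g. an H.264 HD contribution feed (assumed)

# DVB-S-style QPSK (2 bits/symbol), FEC 3/4, 35 percent roll-off
print(f"QPSK 3/4: {occupied_bandwidth_mhz(payload, 2, 3/4, 0.35):.1f} MHz")

# DVB-S2-style 8PSK (3 bits/symbol), FEC 3/4, 20 percent roll-off
print(f"8PSK 3/4: {occupied_bandwidth_mhz(payload, 3, 3/4, 0.20):.1f} MHz")
```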

There is a limit to the reach of the physics and mathematics involved. I won't attempt to predict a quantifiable (no pun intended) number. I remember a paper published a while ago in the EBU Review that purported to mathematically prove video couldn't be compressed below 34Mb/s. (In Europe, 34Mb/s is the magic number, as it is the E3 data transmission standard, the equivalent in concept of the North American DS3 standard.) That paper came out when MPEG-2 was in its infancy, but it missed the point that perceptual quality correlates with bit rate. The link is not hard and linear, but somewhat fuzzy, owing to the complex physiology of the human visual system and the way we see things in motion.

The complicated nature of the problem cannot be modeled solely with mathematics, so it becomes critical that any design include iterative testing by both expert and nonexpert viewers who can verify the results of any new technique. One interesting approach is two-pass encoding. The encoder makes a first pass to gather statistics on the images and the results it achieves with default settings. It then finds places where the content could be improved if bits were available and steals those bits from content that is less challenging, using the new parameters to re-encode the final output with optimized results. Multipass encoding is the norm in high-end DVD authoring, where quality must be maximized on a space-limited medium.
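A minimal sketch of that reallocation step, with made-up complexity scores standing in for the first-pass statistics, might look like this:

```python
# Minimal sketch of the two-pass idea described above: measure how hard each
# segment (e.g. a GOP) is to encode, then redistribute a fixed bit budget so
# harder segments borrow bits from easier ones. Real encoders work on far
# richer statistics; this only illustrates the reallocation step.

def two_pass_allocate(complexities, total_bits, floor_fraction=0.5):
    """complexities: per-segment difficulty scores from the first pass."""
    n = len(complexities)
    even_share = total_bits / n
    floor = even_share * floor_fraction          # no segment drops below this
    pool = total_bits - floor * n                # bits left to hand out by difficulty
    total_complexity = sum(complexities)
    return [floor + pool * c / total_complexity for c in complexities]

# First pass: hypothetical complexity scores (a sports scene vs. a static graphic, say)
complexities = [9.0, 2.0, 5.0, 1.0]
budget = 4_000_000  # bits available for these four segments

for i, bits in enumerate(two_pass_allocate(complexities, budget)):
    print(f"segment {i}: {bits / 1e6:.2f} Mb")
```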

With H.264, the options available to an encoder (the toolkit) are considerably expanded from MPEG-2. Block sizes are variable, multiple motion vectors can be associated with a block, and multiple reference frames can be used for predictions. CABAC and CAVLC entropy coding replace the simpler variable-length coding of MPEG-2, giving designers more options on an adaptive basis. H.264 supports both 4:2:0 and 4:2:2 sampling, as MPEG-2 does, but also supports higher bit depths, up to 14 bits per sample, for high-quality applications like master recording. More profiles have been standardized as well.
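As a concrete, if hypothetical, illustration of that expanded toolkit, the open-source x264 encoder (driven here through ffmpeg from Python) exposes several of the same knobs: the number of reference frames, B-frames, and the choice between CABAC and CAVLC entropy coding. The file names and parameter values are assumptions; a broadcast encoder presents similar controls through its own interface.

```python
# Hedged illustration only: exercising a few of the H.264 tools mentioned above
# with the open-source x264 encoder via ffmpeg. File names and values are
# hypothetical.
import subprocess

cmd = [
    "ffmpeg", "-i", "input.mxf",
    "-c:v", "libx264",
    "-profile:v", "high",            # one of the additional H.264 profiles
    "-x264-params",
    "ref=4:"                         # multiple reference frames for prediction
    "bframes=3:"                     # bidirectional prediction
    "cabac=1",                       # CABAC entropy coding (0 would select CAVLC)
    "-b:v", "8M",                    # target bit rate, content permitting
    "output.ts",
]
subprocess.run(cmd, check=True)
```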

Applications

It's tempting to think of all modern encoders as viable in all environments, but in reality, some implementations are more appropriate for certain business models. For example, latency, or encode/decode delay, affects some applications. In live two-way interviews, it is critical to minimize the latency between distant reporters and home studios interconnected by satellite. The round trip up to a geosynchronous satellite and back down takes 240ms to 279ms, depending on the geometry of the access. That is difficult enough by itself, though audiences are used to it. But unlike analog equipment, which adds little additional delay, encoders add anywhere from 50ms to several seconds. Why so large a range? Low-latency encoding uses the lowest-performance profile (SP@ML, i.e., Simple Profile @ Main Level), while HD content might be encoded with two-pass encoders to achieve the highest quality. Those are very different processes. Two-pass encoding must hold at least a couple of GOPs in memory to compare encoding statistics, which lengthens the latency and makes the process unsuitable for live interviews.
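The propagation numbers follow directly from geometry, and the sketch below adds assumed codec delays to show how quickly the budget grows. The codec figures are illustrative, not measurements of any particular product.

```python
# Rough delay budget for a live two-way over a geostationary satellite.
# Propagation delay follows from geometry; the codec figures are assumptions.
SPEED_OF_LIGHT_KM_S = 299_792

def one_hop_ms(slant_range_km):
    """Uplink plus downlink through the satellite, in milliseconds."""
    return 2 * slant_range_km / SPEED_OF_LIGHT_KM_S * 1000

# Slant range depends on where the earth stations sit relative to the satellite:
# about 35,786km directly beneath it, roughly 41,700km near the horizon.
print(f"best-case hop : {one_hop_ms(35_786):.0f} ms")   # ~239 ms
print(f"worst-case hop: {one_hop_ms(41_700):.0f} ms")   # ~278 ms

# Add hypothetical codec latencies on top of a mid-range hop.
for codec_ms in (50, 500, 2000):   # low-delay profile ... multipass/lookahead
    total = one_hop_ms(40_000) + codec_ms
    print(f"hop + {codec_ms:4d} ms codec delay = {total:.0f} ms one way")
```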

Encoders designed for a high degree of flexibility are often chosen for SNG operations. They can offer both SD and HD encoding, and they normally include the modulation for transmission as well. Conversely, encoders with satellite modulation modules would not be appropriate for an ATSC installation because a portion of the hardware would never be used.

For an IPTV installation, ASI outputs might not be appropriate, so an encoder with IP-over-Ethernet outputs might be a better choice. In such a case, an encoder that can simultaneously produce MPEG-4 and MPEG-2 streams might have an application, or perhaps one that can encode an SD stream and an HD stream at the same time.
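As a hedged software stand-in for that kind of dual-output encoder, ffmpeg can produce an H.264 HD stream and an MPEG-2 SD stream from the same source in one pass; the file names and bit rates here are hypothetical.

```python
# Hedged sketch: one source, two simultaneous outputs -- an H.264 HD stream and
# a downscaled MPEG-2 SD stream -- using ffmpeg's multiple-output capability.
# Names and rates are hypothetical; a dedicated encoder would do this in hardware.
import subprocess

cmd = [
    "ffmpeg", "-i", "studio_feed.mxf",
    # Output 1: H.264 HD transport stream
    "-map", "0:v", "-c:v", "libx264", "-b:v", "8M", "-f", "mpegts", "hd_h264.ts",
    # Output 2: MPEG-2 SD transport stream, downscaled
    "-map", "0:v", "-c:v", "mpeg2video", "-s", "720x480", "-b:v", "3M",
    "-f", "mpegts", "sd_mpeg2.ts",
]
subprocess.run(cmd, check=True)
```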

In systems where more than one encoder is needed, as in a terrestrial multicast, picking hardware that can perform statistical multiplexing offers a worthwhile advantage. Encoders in the multiplex handling less challenging content free up bits they don't need, while encoders facing more complex content use the additional capacity. The quality improvement can be dramatic.
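The arithmetic behind statistical multiplexing is essentially the two-pass reallocation described earlier, applied across channels instead of across time. A minimal sketch, with assumed complexity scores and the 19.39Mb/s ATSC payload as the shared pool, looks like this:

```python
# Minimal sketch of statistical multiplexing: a fixed pool of bits is shared
# across several encoders, with each channel's share following its current
# scene complexity. The complexity scores and floor are illustrative only.

def statmux(complexities, pool_mbps, floor_mbps):
    """Give every channel a guaranteed floor, then split the remaining pool
    in proportion to each channel's current complexity."""
    remaining = pool_mbps - floor_mbps * len(complexities)
    total = sum(complexities)
    return [floor_mbps + remaining * c / total for c in complexities]

# Four channels in a 19.39Mb/s ATSC multiplex: sports, news, a movie, a static slate
complexities = [8.0, 4.0, 5.0, 0.5]
for i, rate in enumerate(statmux(complexities, pool_mbps=19.39, floor_mbps=1.0)):
    print(f"channel {i}: {rate:.2f} Mb/s")
```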

Conclusion

At least one encoder manufacturer has tackled the quality loss that occurs when encoders are cascaded, as they most certainly are today. The approach is to identify telltale artifacts, such as block edges, and then attempt to match the current encode process to the previous one to minimize the potential quality loss. Nearly all content today has been through at least one concatenation process, so techniques like this can be valuable in preserving the quality of the final product.

John Luff is a broadcast technology consultant.

Send questions and comments to: john.luff@penton.com