Video encoders

In the July column, “A switcher's life cycle,” I spoke about product life cycle and product maturity. Nowhere is that concept more appropriate than when speaking about video encoders. When encoders come up, it is often assumed that one means MPEG-2 encoders. But with the expansion of encoding platforms and video compression algorithms, it is important to note that while the MPEG-2 transport stream and its rich syntax remain part of most compressed transmission systems, the video and audio encoding choices now offer variety and performance to match widely varying applications.

There are many reasons for the expansion of options. First, referring to the product life cycle point, manufacturers need to introduce new and improved technology to maintain their sales and profits. If every refrigerator were still in use today, we would have stopped making them decades ago. Obsolescence is a fact of life, however inconvenient. A client recently expressed exasperation with an encoder manufacturer who didn't offer a clear upgrade path short of forklifting out the old boxes and replacing them with new ones. In the end, the buyer benefits from better technology, even if the budget gets strained to make it happen.

Improvements

The improvements in encoders are dramatic. Early MPEG-2 struggled to get decent SD pictures through a 15Mb/s pipe. Now SD distribution to the home runs at about 10 percent of that early level, roughly 1.5Mb/s, while HD content can be delivered at below 15Mb/s, dependent on content, of course. At the same time, MPEG-4 Part 10 (AVC/H.264) delivers great HD at below 10Mb/s and holds the promise of rates approaching 5Mb/s in the future.

Coupled with changes in satellite modulation, including the 8PSK and DVB-S2 modulation standards, lower bit rates and less occupied bandwidth on the satellite equate to lower cost for backhaul and transmission to the home. This much is clear: Encoding will continue to improve, and new algorithms will emerge that offer further improvements in both quality and bandwidth performance.
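
To make the bandwidth arithmetic concrete, here is a rough, rule-of-thumb sketch in Python. It ignores DVB-S2 framing and pilot overhead, and the carrier parameters (bit rates, FEC rates, roll-off factors) are illustrative assumptions rather than figures from any particular service:

```python
# Rough, rule-of-thumb estimate of occupied satellite bandwidth.
# Ignores DVB-S2 framing and pilot overhead; all figures are illustrative.

def occupied_bandwidth_hz(bit_rate_bps, bits_per_symbol, fec_rate, roll_off):
    """Symbol rate = information rate / (bits per symbol * FEC rate);
    occupied bandwidth is roughly symbol rate * (1 + roll-off factor)."""
    symbol_rate = bit_rate_bps / (bits_per_symbol * fec_rate)
    return symbol_rate * (1 + roll_off)

# An older-style carrier: 15Mb/s, QPSK (2 bits/symbol), rate 3/4 FEC, 0.35 roll-off
legacy = occupied_bandwidth_hz(15e6, 2, 3 / 4, 0.35)
# A DVB-S2-style carrier: 10Mb/s, 8PSK (3 bits/symbol), rate 3/4 FEC, 0.20 roll-off
s2 = occupied_bandwidth_hz(10e6, 3, 3 / 4, 0.20)

print(f"QPSK carrier at 15Mb/s:        {legacy / 1e6:.1f} MHz")  # ~13.5 MHz
print(f"8PSK/DVB-S2 carrier at 10Mb/s: {s2 / 1e6:.1f} MHz")      # ~5.3 MHz
```

The point is simply that a lower bit rate and a more spectrally efficient modulation together shrink the occupied transponder bandwidth, which is where the cost savings come from.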

There is a limit to the reach of the physics and mathematics involved. I won't attempt to predict a quantifiable (no pun intended) number. I remember a paper published a while ago in the EBU Technical Review that purported to prove mathematically that video could not be compressed below 34Mb/s. That paper came out when MPEG-2 was in its infancy, and it missed the point that perceptual quality correlates with bit rate. The link, however, is not a hard linear one; it is fuzzier than that, owing to the complex physiology of the human visual system and especially the way we see things in motion.

The complicated nature of the problem cannot be modeled solely by mathematics, so it becomes critical that any design include iterative testing by both expert and nonexpert viewers who can verify the results of any new technique. One interesting approach is two-pass encoding. The encoder does a first pass to gather statistics on the images and the results it achieves with default settings. It then finds places where the content could be improved if bits were available, steals them from content that is less challenging, and uses the new parameters to re-encode the final output with optimized results. Multipass encoding is the norm in high-end DVD authoring, where quality must be maximized on a space-limited medium.
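
As a toy illustration of that bit shuffling, the Python sketch below redistributes a fixed budget in proportion to the complexity the first pass measured for each GOP. It is not any vendor's algorithm, just the principle reduced to a few lines, and the complexity numbers are made up for the example:

```python
# Toy sketch of two-pass rate allocation; not any vendor's actual algorithm.
# First pass: measure how "hard" each GOP is with default settings.
# Second pass: redistribute a fixed bit budget in proportion to that difficulty.

def two_pass_allocation(gop_complexities, total_bits):
    """Return a per-GOP bit budget that steals bits from easy GOPs for hard ones."""
    total_complexity = sum(gop_complexities)
    return [total_bits * c / total_complexity for c in gop_complexities]

# Suppose the first pass found GOPs 2 and 3 (fast motion, fine detail) hardest.
complexities = [1.0, 0.8, 2.5, 3.0, 1.2]
budget = two_pass_allocation(complexities, total_bits=50e6)

for i, (c, b) in enumerate(zip(complexities, budget)):
    print(f"GOP {i}: complexity {c:.1f} -> {b / 1e6:.1f} Mb")
```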

With H.264, the options available to an encoder (the toolkit) are considerably expanded from MPEG-2. Block sizes are variable, multiple motion vectors can be associated with a block, and multiple reference frames can be used for predictions. Entropy coding can use either CAVLC or the more efficient CABAC, giving designers more options on an adaptive basis. H.264 supports 4:2:0 and 4:2:2 sampling as MPEG-2 does, but it also supports higher bit depths, up to 14 bits per sample, for high-quality applications like master recording. More profiles have been standardized as well. It is a complex standard that makes great bedtime reading.
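
Two of those tools, multiple reference frames and variable block sizes, can be hinted at with a deliberately naive Python sketch. Real encoders use far more sophisticated searches and rate-distortion decisions; the function names, frame sizes and search range here are my own illustrative inventions:

```python
# Deliberately naive illustration of two H.264 tools: multiple reference frames
# and selectable block (partition) sizes. Real encoders use far smarter searches;
# the function names and search range here are hypothetical.
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def find_best_prediction(current, top, left, size, references, search=4):
    """Brute-force search of every candidate block within +/-search pixels in
    every reference frame; return (reference index, motion vector, SAD cost)."""
    block = current[top:top + size, left:left + size]
    best = None
    for ref_idx, ref in enumerate(references):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if 0 <= y and 0 <= x and y + size <= ref.shape[0] and x + size <= ref.shape[1]:
                    cost = sad(block, ref[y:y + size, x:x + size])
                    if best is None or cost < best[2]:
                        best = (ref_idx, (dy, dx), cost)
    return best

# An encoder can repeat the search for 16x16, 8x8 and 4x4 partitions and keep
# whichever reference frame, vector and partition size costs the fewest bits.
rng = np.random.default_rng(0)
refs = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(3)]
current = refs[1].copy()  # pretend the current frame closely resembles reference 1
print(find_best_prediction(current, top=16, left=16, size=8, references=refs))
```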

Applications

It is tempting to think of all modern encoders as viable in all environments, but in reality, there are characteristics that make some implementations more appropriate to certain business models than others. For example, latency, or encode/decode delay, has a huge effect on some applications. In live two-way interviews, it is critical to minimize the latency between distant reporters and home studios interconnected by satellite.

To begin with, the round trip to a geosynchronous satellite is 240ms-279ms, depending on the geometry of the access. That is difficult enough by itself, though audiences are used to it. But unlike analog equipment, which adds little additional delay, encoders add from 50ms to several seconds! Why so large a range? Low-latency encoding uses the lowest performance profile (SP@ML, i.e., Simple Profile @ Main Level), while HD content might be encoded with two-pass encoders to achieve the highest quality. Those are very different processes. Two-pass encoding must hold at least a couple of GOPs in memory to compare encoding statistics, leading to much longer latency and making the process unsuitable for live interviews.
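
A back-of-the-envelope budget shows where those numbers come from. The satellite hop is simple geometry and the speed of light; the encoder delays in the sketch are figures I have assumed purely for illustration:

```python
# Back-of-the-envelope latency budget. The satellite hop is simple physics;
# the encode/decode delays are assumed figures chosen only for illustration.
GEO_ALTITUDE_KM = 35_786        # geostationary altitude above the equator
SPEED_OF_LIGHT_KM_S = 299_792

def satellite_hop_ms(slant_range_km=GEO_ALTITUDE_KM):
    """Up to the satellite and back down (one hop); the slant range grows as
    the ground stations sit farther from the sub-satellite point."""
    return 2 * slant_range_km / SPEED_OF_LIGHT_KM_S * 1000

print(f"Hop, satellite overhead: {satellite_hop_ms():.0f} ms")        # ~239 ms
print(f"Hop, low elevation path: {satellite_hop_ms(41_700):.0f} ms")  # ~278 ms

# Add an assumed codec delay at each end of the chain.
for label, codec_ms in [("low-delay contribution encoder", 80),
                        ("long-GOP, two-pass encoder", 2000)]:
    print(f"With {label}: {satellite_hop_ms(41_700) + codec_ms:.0f} ms total")
```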

Encoders intended for a high degree of flexibility are often chosen for SNG operations. They can offer both SD and HD encoding, and they normally include the modulation for transmission as well. For an IPTV installation, ASI outputs might not be appropriate, so an encoder with IP over Ethernet might be a better choice. In such a case, an encoder that can simultaneously produce MPEG-4 and MPEG-2 streams might fit, or perhaps one that can encode an SD stream and an HD stream at the same time.

In systems where more than one encoder is needed, as in a terrestrial multicast, picking hardware that can perform statistical multiplexing offers a worthwhile advantage. It allows encoders in the multiplex handling less challenging content to give up bits they don't need, while encoders facing more complex content use the freed capacity. The quality improvement can be dramatic.
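
The principle reduces to a few lines of Python. This is only a sketch of the allocation idea; real statistical multiplexers add per-service minimum and maximum rates, look-ahead and smoothing, and the channel mix, complexity numbers and pool rate below are assumptions for the example:

```python
# Minimal sketch of the statistical-multiplexing idea: a fixed pool of bits is
# shared among encoders in proportion to the momentary complexity each reports.

def statmux_allocate(complexities, pool_bps, min_bps=1.0e6):
    """Share pool_bps across services by complexity, never dropping below min_bps."""
    spare = pool_bps - min_bps * len(complexities)
    total = sum(complexities)
    return [min_bps + spare * c / total for c in complexities]

# Four services sharing a 19.39Mb/s ATSC multiplex; the sports feed is "busy".
channels = {"news": 1.0, "weather": 0.6, "sports": 3.5, "movie": 1.4}
rates = statmux_allocate(list(channels.values()), pool_bps=19.39e6)
for name, rate in zip(channels, rates):
    print(f"{name:8s} {rate / 1e6:5.2f} Mb/s")
```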

Conclusion

Lastly, at least one encoder manufacturer has tackled the quality that is lost when encoders are cascaded, as they most certainly are today. The approach is to identify artifacts that signal things like block edges and then attempt to match the current encode process to the previous encoder's decisions to minimize the potential quality loss. Nearly all content today has been through at least one concatenation process, so techniques like this can be valuable in preserving the quality of the final product.

John Luff is a broadcast technology consultant.

Send questions and comments to: john.luff@penton.com