How Effective Is AI For Encoding Video?

ONTARIO—Artificial intelligence (AI), meaning software capable of performing self-directed tasks that historically required human intelligence, tends to conjure up images of Skynet, the evil AI in the Terminator movie franchise.

That’s fiction. But in real life, AI is finding its way into the workplace, and that includes the encoding, transcoding and decoding of compressed video. Using the power of AI in their programs, vendors such as Bitmovin, Cobalt Digital, MediaKind, Telestream and V-Nova are speeding up their encoding programs while reducing bandwidth demands, resulting in faster, more affordable products for their customers.

“AI is starting to play an important role in encoding, where it has enormous potential to dramatically improve workflows,” said Stefan Lederer, CEO and co-founder of Bitmovin, a developer of cloud-based media streaming technology. “With the emergence of new codecs, new video file formats and delivery methods, the TV and media industry needs solutions that improve encoding in the automated, immediate and highly efficient way that AI offers.”

That said, there is a debate underway among vendors as to the limits of AI in the encoding process. Certainly AI, often referred to in this context as “machine learning” (ML), can accelerate the encoding process. But can it do everything that a human observer can to detect and remedy artifacts in compressed video? No one really knows.

It’s important to remember that the improvement in speed comes from removing humans from the review stage after an encode has happened, according to Paul Turner of Turner Media Consulting. “Encode parameters can be set to a predefined set of values, but you still have to look at the result and assess if the encoded output is of sufficient quality,” he said. “If not, you have to iterate on the settings.”

(For the record, ML is a narrower form of AI, in which the software is charged with making decisions about specific data within predefined parameters, rather than becoming self-aware and choosing to annihilate the human race. In this article, we will use the two terms interchangeably, as the people who were interviewed tended to.)

HOW AI CAN IMPROVE ENCODING

Today’s video codecs use algorithms to analyze video imagery, to determine which bits can be removed to reduce file size without degrading the subjective image quality perceived by the viewer.

Injecting AI into the encoding process takes this a step further. AI allows the software to proactively assess the quality of the compressed video before transmission. This lets the encoding system detect and remedy any artifacts inadvertently generated by the codec. As the AI does this, it “learns” from its actions, and uses this knowledge to improve its performance over successive applications.

The result: “By using AI, encoding solutions can make smart decisions about the compression settings and visual parameters of each frame, speeding up processing and improving encoding efficiency,” said Lederer. “Trained AI models can even predict the optimal encoding settings, as well as preprocessing tools, for every given source asset.”
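To make the idea of predicting settings from the source concrete, here is a minimal sketch in Python. It is not Bitmovin’s method or any vendor’s model: the “complexity score,” the training pairs and the function names are all hypothetical, and a straight-line fit stands in for a trained AI model.

```python
# Hypothetical training data, invented for illustration only:
# (scene complexity score from 0 to 1, bitrate in kbps that a reviewer accepted).
training = [(0.1, 1500), (0.3, 2500), (0.5, 3500), (0.7, 4500), (0.9, 5500)]

def fit_line(samples):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(training)

def predict_bitrate_kbps(complexity):
    """Suggest a starting bitrate for a new asset from its complexity score."""
    return slope * complexity + intercept

# A busier source gets a higher suggested bitrate than a static one.
print(round(predict_bitrate_kbps(0.2)), round(predict_bitrate_kbps(0.8)))
```

In practice a production system would presumably learn from many more features than a single score, but the shape is the same: past encodes train the model, and the model proposes settings for the next asset.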

There are other ways that AI can be used in encoding, according to Guido Meardi, CEO and co-founder of V-Nova, a U.K.-based developer of codecs. One of the most common is to improve an existing codec’s predictive capabilities, for deciding which bits can be safely removed.

“The better you predict the image, the less residuals you have to eventually encode,” Meardi said, “therefore the less you have to send down the pipe without compromising quality.”
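Meardi’s point about residuals can be illustrated with a toy example in Python. The pixel values and both predictors below are made up for illustration, and the sum of absolute residuals is only a crude stand-in for the bits a real codec would spend.

```python
def residuals(pixels, predictions):
    """Difference between each actual sample and the predictor's guess."""
    return [p - q for p, q in zip(pixels, predictions)]

def cost(res):
    """Crude proxy for bits to encode: sum of absolute residuals."""
    return sum(abs(r) for r in res)

# Hypothetical row of 8-bit luma samples forming a smooth gradient.
row = [100, 102, 104, 107, 109, 112, 114, 117]

# Weak predictor: always guess mid-gray (128).
weak = [128] * len(row)

# Better predictor: guess the previous sample (first guess is mid-gray).
better = [128] + row[:-1]

weak_cost = cost(residuals(row, weak))
better_cost = cost(residuals(row, better))
print(weak_cost, better_cost)  # prints: 159 45
```

The better predictor leaves far less to encode for the same row of pixels. An AI-assisted predictor aims at the same effect, only with a learned guess instead of a fixed rule.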

THE LIMITS OF AI

In each of these examples, AI is trying to improve the video production process by automating quality control. This is meant to reduce the need for much slower (and more costly) human intervention to perform the same tasks.

“What you’re fundamentally trying to do is emulate human assessment,” said Shawn Carnahan, CTO for Telestream in Grass Valley, Calif. “You’re trying to use machine learning to emulate how the viewer perceives the quality of the content, and use that to decide questions such as ‘can I drive the bitrate even further, or do I need to increase the bitrate to maintain subjective quality?’”
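The decision Carnahan describes can be sketched as a simple search, shown here in Python. This is not Telestream’s implementation: predicted_quality is a hypothetical stand-in for a trained perceptual model, and its formula and the bitrate range are assumptions chosen only so the example runs.

```python
def predicted_quality(bitrate_kbps):
    """Hypothetical stand-in for a trained perceptual-quality model.

    Returns a score from 0 to 100 that rises with bitrate and saturates.
    A real system would score the actual encoded frames instead.
    """
    return 100.0 * bitrate_kbps / (bitrate_kbps + 1500.0)

def lowest_acceptable_bitrate(target_score, low_kbps=200, high_kbps=20000,
                              step_kbps=50):
    """Binary-search for the lowest bitrate whose predicted score meets the target."""
    if predicted_quality(high_kbps) < target_score:
        raise ValueError("target quality not reachable within the bitrate range")
    while high_kbps - low_kbps > step_kbps:
        mid = (low_kbps + high_kbps) // 2
        if predicted_quality(mid) >= target_score:
            high_kbps = mid  # quality holds: try driving the bitrate lower
        else:
            low_kbps = mid   # quality dropped: the bitrate must go back up
    return high_kbps

chosen = lowest_acceptable_bitrate(target_score=80)
print(chosen, round(predicted_quality(chosen), 1))
```

The search is only as good as the score it trusts, which is why the accuracy of the quality model matters so much.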

If this sounds like a daunting task, it is. AI software is literally trained “to look for things in the image that human viewers would find objectionable,” Carnahan said. “You are training a machine to spot things that shouldn’t be there.”

This is where the limits of AI-enabled video encoding come into play. “Mapping or trying to best represent the human visual system using software is near impossible,” said Carl Furgusson, vice president of Portfolio Management for MediaKind (formerly Ericsson Media Solutions). “People have been trying to do it for 20 or 30-plus years without success, and I don’t think anyone will ever really get to an accurate mapping of the human visual system.”

The problem is the subjective nature of human viewing, compared to a metrics-based AI viewing model. “You will always get different results between what people think in reality is better picture quality, and what the measurement tools will give you,” Furgusson said, no matter how sophisticated the AI viewing model may be.
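A small Python example shows how a measurement tool can miss a difference a viewer might notice. It uses PSNR, a long-established objective metric chosen here purely for illustration (the article’s sources do not name a specific tool), and the pixel values are invented.

```python
import math

def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Hypothetical flat gray 8x8 block, flattened to 64 samples.
reference = [120] * 64

# Version A: every pixel is off by 2, a faint uniform shift.
spread = [122] * 64

# Version B: four pixels are off by 8 and the rest are untouched,
# which concentrates the same total squared error in one spot.
clustered = [128] * 4 + [120] * 60

print(round(psnr(reference, spread), 2), round(psnr(reference, clustered), 2))
```

Both versions have the same mean squared error, so the metric reports the same score for each, even though a viewer may well judge a uniform shift and a concentrated blemish quite differently.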

Quality assessment isn’t only about absolute picture quality; there are artifacts that non-trained human viewers will not notice, according to Turner.

“That has to be factored in to the AI training too,” he said.

Does this mean that AI is doomed to play a minor role in video compression? V-Nova’s Guido Meardi doesn’t think so. Even with its limits, he predicts that AI will become “an integral part of compression engines in the future.”

Still, until this technology can actually match the complexities and nuances of the human visual system, human intervention will remain a necessary element of high-quality video compression. At best, AI will continually drive down the percentage of instances when humans have to step in to fix the picture quality.

James Careless is an award-winning journalist who has written for TV Technology since the 1990s. He has covered HDTV from the days of the six competing HDTV formats that led to the 1993 Grand Alliance, and onwards through ATSC 3.0 and OTT. He also writes for Radio World, along with other publications in aerospace, defense, public safety, streaming media, plus the amusement park industry for something different.