Understanding the Latest Video Codec Standards


Adopting a new codec standard is a complicated task. Getting up to speed takes so long that by the time you’re there, the industry may well be looking at the next generation of codecs.

At last month’s RTE2021 conference, session moderator and Visionular VP of Video Marketing, Mark Donnigan, asked a panel to explain how it’s possible to move to an advanced codec standard without planning five years into the future.

Along with my fellow panelists (Debargha Mukherjee, Principal Engineer at Google; Tarek Amara, Engineering Manager at Twitch-Amazon; and Jan De Cock, Director, Codec Development at Synamedia), I helped answer Donnigan’s big question and explained what the industry (including AI) can do better to make the process smoother.

We quickly learned that codec fragmentation is accelerating, in part due to the development time necessary in the hardware-driven broadcast space. As it stands, we agreed VVC and AV1 are ready for prime time. Mukherjee pointed out that hardware support for AV1 has spread over the last year, particularly in desktops and smart TVs.

Getting ready for codec adoption
Amara takes a business mindset when comparing codecs. He emphasized encoder implementation: how you can deliver in real time, at low cost, to hundreds of thousands of channels at the same time. Even if you save 50% of the bits and a lot of costs, he said, it’s meaningless if you don’t have reach.
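To put rough numbers on Amara’s point about reach, here is a minimal sketch. The function name and the figures are made up for illustration, and it assumes that viewers who cannot decode the new codec keep receiving the legacy stream at its original bitrate, which the panel did not specify.

```python
def effective_bitrate_savings(codec_savings: float, reach: float) -> float:
    """Fleet-wide fraction of bits saved when only `reach` of viewers
    can decode the new codec.

    Assumption (hypothetical): everyone else stays on the legacy stream
    at its original bitrate, so they contribute no savings.
    """
    if not 0.0 <= codec_savings <= 1.0 or not 0.0 <= reach <= 1.0:
        raise ValueError("codec_savings and reach must be fractions in [0, 1]")
    return codec_savings * reach


# A codec that halves the bitrate but reaches only 10% of devices
# trims overall delivery by about 5%.
print(f"{effective_bitrate_savings(0.5, 0.10):.0%}")
```

The same 50% saving at 90% reach would cut delivery by roughly 45%, which is why reach can matter as much as raw compression efficiency.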

The ecosystem needs to be ready for the codec you adopt. You also need to account for network congestion and detect the available network bandwidth. And if the codec is simple enough to run on mobile, you “build in scalability into the encoding implementation.”
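One way to picture bandwidth detection feeding a scalable encoder is the sketch below. It is not any panelist’s implementation: the layer ladder, the smoothing factor, and the 80% headroom are all hypothetical values chosen for illustration, and a production system would rely on a proper congestion-control estimate rather than a simple moving average.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    bitrate_kbps: int  # cumulative bitrate needed to receive this layer


# Hypothetical scalability ladder, ordered from lowest to highest bitrate.
LADDER = [
    Layer("base_180p", 150),
    Layer("mid_360p", 500),
    Layer("high_720p", 1500),
]


class BandwidthEstimator:
    """Exponentially weighted moving average of throughput samples."""

    def __init__(self, alpha: float = 0.25):
        self.alpha = alpha
        self.estimate_kbps: Optional[float] = None

    def update(self, sample_kbps: float) -> float:
        if self.estimate_kbps is None:
            self.estimate_kbps = sample_kbps
        else:
            self.estimate_kbps = (
                self.alpha * sample_kbps + (1 - self.alpha) * self.estimate_kbps
            )
        return self.estimate_kbps


def pick_layer(estimate_kbps: float, headroom: float = 0.8) -> Layer:
    """Return the highest layer that fits within a safety fraction of the estimate."""
    budget = estimate_kbps * headroom
    chosen = LADDER[0]  # always send at least the base layer
    for layer in LADDER:
        if layer.bitrate_kbps <= budget:
            chosen = layer
    return chosen


# Usage: feed throughput samples as congestion sets in and watch the layer drop.
estimator = BandwidthEstimator()
for sample in (2000, 1800, 600, 400):
    print(pick_layer(estimator.update(sample)).name)
```

Smoothing the samples keeps the sender from switching layers on every momentary dip, at the cost of reacting a little later to real congestion.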

“Adopting a codec is as much about the application as it is about the performance or benefits of the codec,” Donnigan observed. Mukherjee agreed, saying a more efficient format still needs to be deployed everywhere, and fragmentation complicates that.

It’s also important to consider the differences between, say, casual everyday-use applications and a real-time conference application. The search space is too large in a real-time application for AV1, VVC or even HEVC to be effective, Mukherjee said. The benefits of a new codec depend on “the smartness you can put in on the encoder side.”

Optimizing the encoder
Amara said he’d been seeing more and more hardware providers in the standardization groups because there are often challenges in decoder implementation after an encoder release. That is further complicated because encoder and decoder engineers are often on different teams. Essentially, it’s easier to build in scalability if you make decisions based on signals from both the encoder and decoder sides.

Mukherjee also noted that when you get close to real time, how you optimize the encoder matters more than bitstream syntax. I’ve found H.264 is most feasible at a very high speed, and “the median is way too complex.” But I also see applications for AV1 when you reach four or five times the complexity of H.264 at high speed, and getting to two or three times the complexity allows for a lot of RTE/RTC applications. To Amara, there’s still a cost consideration as well.

The role of AI
When it comes to the role of AI and machine learning with a codec, De Cock advised that applying neural networks for real-time encoding is complicated, and the costs could spiral. He advised looking for algorithms that limit the number of multiplications and quantize the depth of those multiplications. He added that, for machine learning to be useful, encoders could also be “more flexible and content-aware” and better able to predict bitrate.
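De Cock did not name a specific technique, but one common reading of quantizing the depth of multiplications is reduced-precision integer arithmetic. The sketch below assumes symmetric 8-bit quantization, and its function names are illustrative. It swaps floating-point multiplies for integer multiply-accumulates and rescales once at the end.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization to signed integers of the given bit depth.

    Returns the integer values plus the scale needed to map them back.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    return [round(v / scale) for v in values], scale


def quantized_dot(weights, activations, bits=8):
    """Dot product using integer multiplies and a single float rescale."""
    qw, scale_w = quantize(weights, bits)
    qa, scale_a = quantize(activations, bits)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer multiply-accumulate
    return acc * scale_w * scale_a


weights = [0.5, -1.0, 0.25]
activations = [1.0, 2.0, -4.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = quantized_dot(weights, activations)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The result is close to the floating-point answer but not identical. That small error is the price of cheaper arithmetic, and whether it is acceptable depends on the model and the encoding decision it drives.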

It can take two to three years to research and develop a codec, and hardware encoders tend to evolve three to four years after a codec’s bitstream is finalized. The industry needs to reduce the time to adoption for hardware and software applications. As new applications emerge, the codec standards will need to be there to serve users at scale.