Where is AV1 Heading?

The rise of online video has impacted media distribution at unprecedented levels. Providing great video quality is no longer a competitive advantage, but rather a “must have”—and the crux of this lies in how video is delivered. As the amount of data we need to get content delivered increases, new approaches to video compression are an absolute necessity.

Enter AV1, the open-source, next-generation, royalty-free alternative technology for delivering video over the internet.


Launched in late March 2018 by the Alliance for Open Media (AOMedia), this new codec marked a significant shift in the balance of power in online entertainment by challenging traditional business models and enabling innovation in social video, mobile video, VR and online TV—all without having to pay historical IP owners. Backed by the biggest tech leaders—from Apple to Google, Microsoft, Facebook, Intel, Cisco and more recently Vimeo—AV1 is in solid standing to become a widespread standard in the future.

But what makes this new codec stand out?

These are five key encoding and decoding techniques that make AV1 an interesting choice to use in video streaming.

FILM GRAIN SYNTHESIS

Film grain occurs commonly in photographic film—most noticeably in over-enlarged pictures—but can also be applied digitally for artistic effect. During digital video compression, film grain creates massive problems as it is hard to recognize and creates a lot of traffic in the bitstream—leading to high bitrate requirements for transmitting very little information. Since the information is of little value for the perceived quality, finding a way to not transfer the information with the bitstream—but rather re-apply it later—is a desirable solution.

This idea forms the base for AV1’s film grain synthesis. The goal is to de-noise the initial content before encoding it and then re-add the noise or grain effect before output during the decoding process. This way, the unnecessary information does not have to be transmitted at all and the overall load of data can be reduced substantially. The potential in bandwidth savings for content providers using this technology is enormous.
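The de-noise, transmit, re-grain idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the moving-average denoiser, the single Gaussian "grain strength" parameter and all function names are hypothetical stand-ins, and AV1's actual grain model is considerably more elaborate.

```python
import random
import statistics

def box_denoise(samples, radius=2):
    """Smooth a 1-D row of pixels with a moving average (stand-in for a real denoiser)."""
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - radius), min(len(samples), i + radius + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

def encode(samples):
    """Return the denoised row plus one grain parameter (std dev of the removed noise)."""
    clean = box_denoise(samples)
    residual = [s - c for s, c in zip(samples, clean)]
    return clean, statistics.pstdev(residual)

def decode(clean, grain_sigma, seed=1234):
    """Re-add synthetic grain after decoding; the fixed seed keeps output reproducible."""
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, grain_sigma) for c in clean]

# Hypothetical source: a flat grey row of pixels with grain on top
source_rng = random.Random(0)
source = [128 + source_rng.gauss(0.0, 4.0) for _ in range(2000)]

clean, sigma = encode(source)   # only `clean` and `sigma` would be transmitted
output = decode(clean, sigma)   # grain is regenerated on the viewer's device
```

The synthesized grain is not the original grain sample for sample; it only has a similar strength, which is the point: the viewer perceives the same texture while the encoder compresses a much smoother picture.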

CONSTRAINED DIRECTIONAL ENHANCEMENT FILTER

Filtering is an essential process in every video codec, drastically increasing the perceived quality of an encoded video as well as the overall coding performance. AV1 contains various sets of filters, most of which are derived from existing codecs.

The Constrained Directional Enhancement Filter (CDEF) is among the most impactful additions to the range of filters. CDEF merges two existing filters: a directional de-ringing filter and the constrained low-pass filter (CLPF). CLPF is applied to filter out artifacts that stem from quantization errors and have not been corrected through the preceding application of a de-blocking filter. The directional de-ringing filter works by recognizing edges within each block and identifying their orientation. It then conditionally applies a directional low-pass filter along those edges, resulting in a smoother picture.

CDEF merges the two filters and works by analyzing the contents of each block, smoothing out artifacts along edges and de-blocking the picture. The search for the filtering parameters (direction and variance) runs on the decoder's end, on the picture that has already been decoded. The encoder performs the same filtering in order to get the correct reference frames. Because the decoder derives these parameters itself on the consumer hardware, they do not have to be transmitted, which reduces the required network bandwidth and, with it, the traffic load.
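A rough sketch of what a per-block direction search can look like is shown here. It is a simplification under stated assumptions: it tests only four directions, and the function name and scoring are hypothetical rather than taken from the AV1 specification. For each candidate direction it groups the block's pixels into lines and picks the direction along which the lines are most uniform.

```python
def find_direction(block):
    """Return (direction_name, error) for the direction whose lines are most uniform."""
    n = len(block)
    # Pixels that share a key value lie on the same line of that direction.
    line_key = {
        "horizontal": lambda x, y: y,
        "vertical": lambda x, y: x,
        "diagonal_down": lambda x, y: x - y,  # top-left to bottom-right
        "diagonal_up": lambda x, y: x + y,    # bottom-left to top-right
    }
    best = None
    for name, key in line_key.items():
        lines = {}
        for y in range(n):
            for x in range(n):
                lines.setdefault(key(x, y), []).append(block[y][x])
        # Squared deviation of each pixel from the mean of its line
        error = 0.0
        for pixels in lines.values():
            mean = sum(pixels) / len(pixels)
            error += sum((p - mean) ** 2 for p in pixels)
        if best is None or error < best[1]:
            best = (name, error)
    return best

# Hypothetical 8x8 test blocks: vertical stripes and diagonal stripes
stripes = [[200 if x % 4 < 2 else 50 for x in range(8)] for y in range(8)]
diagonal = [[200 if (x - y) % 4 < 2 else 50 for x in range(8)] for y in range(8)]

print(find_direction(stripes)[0])
print(find_direction(diagonal)[0])
```

Once the dominant direction is known, a low-pass filter can be applied along it, so that the edge itself is preserved while ringing around it is smoothed. Since the search only looks at decoded pixels, encoder and decoder arrive at the same answer independently.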

WARPED MOTION AND GLOBAL MOTION COMPENSATION

Predicting and compensating motion is an important principle in video compression. It removes redundant information that would otherwise be part of the bitstream and increase the amount of data being transmitted. Warped motion compensation is a particularly interesting technique that predicts spatial movement trajectories within videos. Based on calculated predictions, redundant information is identified and omitted in the coding process, resulting in a significant reduction in the required data load.

Global motion compensation predicts motion for an entire frame and uses this analysis to limit the amount of information transmitted in the bitstream. Motion-compensation algorithms have been used and studied for some time, but only on a two-dimensional level. AV1 marks the first time that non-planar motion compensation has been implemented into a video codec. Particularly suited to predicting large-area movements, like background motion or camera movements, this technique can also handle consistent backgrounds and color schemes very effectively, which is one of the reasons why animated videos tend to deliver great encoding results, even with very high levels of compression.
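To make the idea of one motion description for the whole frame concrete, here is a minimal sketch that assumes a six-parameter affine model. The function names and the example pan are hypothetical, and a real encoder would estimate the parameters from the video rather than set them by hand.

```python
import math

def global_motion_model(zoom, angle_deg, shift_x, shift_y):
    """Build six affine parameters for a zoom, rotation and shift of the whole frame."""
    rad = math.radians(angle_deg)
    cos_z, sin_z = zoom * math.cos(rad), zoom * math.sin(rad)
    return (cos_z, -sin_z, shift_x, sin_z, cos_z, shift_y)

def warp(model, x, y):
    """Map a pixel position in the current frame to its position in the reference frame."""
    a, b, tx, c, d, ty = model
    return (a * x + b * y + tx, c * x + d * y + ty)

def predict_frame_positions(model, width, height, step):
    """Reference-frame lookup position for every sampled pixel of the current frame."""
    return {(x, y): warp(model, x, y)
            for y in range(0, height, step)
            for x in range(0, width, step)}

# Hypothetical camera move: pan 3 pixels right and 2 pixels up, with a slight zoom
pan_zoom = global_motion_model(zoom=1.01, angle_deg=0.0, shift_x=3.0, shift_y=-2.0)
positions = predict_frame_positions(pan_zoom, width=64, height=64, step=16)
```

The six numbers in `pan_zoom` describe the motion of every pixel in the frame, whereas purely block-based prediction would need a separate motion vector for each block to describe the same camera move.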

INCREASED CODING UNIT SIZE UP TO 128 x 128

As video resolutions keep getting larger, an increase in block size is an effective way to scale the compression process along with high-resolution content. Each frame is partitioned into individual coding units or blocks, which are then processed individually during the coding procedure. Smaller resolutions like 1280×720 (720p) can be divided into blocks of 64×64 quite easily, whereas the same block size is less practical for large resolutions such as 7680×4320 (8K UHD), where it produces a very large number of blocks. AV1 therefore raises the maximum coding unit size to 128×128.
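A quick back-of-the-envelope calculation shows why the block size matters. The helper below is a hypothetical illustration that simply counts how many blocks of a given size are needed to cover a frame, rounding partial blocks up.

```python
import math

def block_count(width, height, block_size):
    """Number of square blocks needed to cover a frame, rounding partial blocks up."""
    return math.ceil(width / block_size) * math.ceil(height / block_size)

resolutions = {"720p": (1280, 720), "8K UHD": (7680, 4320)}
for name, (w, h) in resolutions.items():
    for size in (64, 128):
        print(f"{name}: {block_count(w, h, size)} blocks of {size}x{size}")
```

A 720p frame needs 240 blocks of 64×64, while an 8K UHD frame needs 8,160 of them. Moving to 128×128 cuts the 8K figure to 2,040.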

NON-BINARY ARITHMETIC CODING

This technique is an interesting change from other codecs like HEVC or AVC. For those, every symbol that enters the arithmetic coding engine has to be binary. With AV1, these symbols can also be non-binary, meaning that they can have up to 16 possible values instead of just two. The symbols are then processed by the arithmetic coding engine, which produces a binary bitstream as output. Both ends, encoder and decoder, maintain the same probability estimates, which determine how many output bits a given symbol will produce.
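The principle can be illustrated with a toy arithmetic coder that works on a non-binary alphabet. This is a sketch under stated assumptions, not AV1's entropy coder: the four-symbol alphabet, its fixed probabilities and the function names are hypothetical, and exact fractions replace the fixed-precision integer arithmetic and adaptive probabilities a real codec would use.

```python
from fractions import Fraction

# Hypothetical 4-symbol alphabet with fixed probabilities
PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 4),
         "c": Fraction(1, 8), "d": Fraction(1, 8)}

def cumulative(probs):
    """Give each symbol its own [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode(message, probs=PROBS):
    """Narrow the interval once per symbol; any value inside it identifies the message."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ranges[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, width

def decode(value, length, probs=PROBS):
    """Recover `length` symbols by repeatedly finding the slice that contains value."""
    ranges = cumulative(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # zoom into that slice
                break
    return "".join(out)

low, width = encode("abacad")
print(decode(low, 6), width)
```

Each symbol is handled in a single narrowing step regardless of how many values the alphabet has. The final interval width here is 1/2048, so roughly 11 bits are enough to single out this six-symbol message.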

Theoretically, any given input symbol could therefore produce multiple bits or even just a fraction of a bit. Although combining multiple values into a single symbol makes each coding step more complex, the overall process is still less complex than coding one bit per symbol. One major benefit lies in the ability to process more information per clock cycle. As the coding steps have to be performed serially, non-binary coding achieves improvements by letting a single serial step handle what would otherwise take several binary symbols.
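Both points can be checked with a little arithmetic. The sketch below uses the standard information-theoretic cost of a symbol, the negative base-2 logarithm of its probability, and compares the number of serial coding steps. The helper names are hypothetical, and the step count assumes a simple fixed-length binarization, which real binary coders do not necessarily use.

```python
import math

def ideal_bits(probability):
    """Information content of a symbol, in bits, under an ideal arithmetic coder."""
    return -math.log2(probability)

def binary_steps(alphabet_size):
    """Serial binary decisions needed to select one of alphabet_size values."""
    return max(1, (alphabet_size - 1).bit_length())

print(round(ideal_bits(0.9), 3))  # a very likely symbol costs a fraction of a bit
print(ideal_bits(1 / 16))         # a rare symbol costs several bits

for size in (2, 4, 8):
    print(f"{size} values: {binary_steps(size)} binary steps vs. 1 multi-symbol step")
```

A symbol with probability 0.9 costs about 0.152 bits, while one with probability 1/16 costs 4 bits. On the throughput side, an eight-valued symbol takes three serial binary steps under this assumption but only one multi-symbol step.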

Over the last decade, video delivery has evolved rapidly. Up until now, a lot of the focus has been on video quality and improving the user experience, but the attention is now on video delivery, and especially on AV1. This codec is going to have a massive impact on the world of video streaming in the near future. User demand for high-quality video streaming is already high, and the coming generation of high-resolution mobile devices and VR-enabled devices is about to push its way into mainstream availability. Incorporating AV1, and the newer enhancements that come with it, will enable content providers to dynamically serve AV1-encoded content to users whose systems are compatible and capable enough to decode it.

Christopher Mueller is the CTO and co-founder of Bitmovin.
