Disney Research, UCI Create AI Video Compression Model

IRVINE, Calif.—Disney Research and University of California, Irvine, computer scientists have developed an artificial intelligence-enhanced video compression model that they say is capable of competing against established video compression technology.

Researchers working on the project presented an early version of their AI video compression model at the Conference on Neural Information Processing Systems in December 2019. On specialized video content, the model yielded less distortion and lower bits-per-pixel rates than classic coding-decoding algorithms such as H.265. On downscaled, publicly available YouTube videos, its results were comparable.

The compression model created by Disney Research and UCI first reduces the dimensionality of the video using a “variational autoencoder,” a neural network that processes each video frame through a sequence of operations to produce a condensed array of numbers. The autoencoder is also trained to undo this operation, which ensures that the array contains enough information to restore the video frame.

Next, the algorithm uses a deep generative model to predict the compressed version of the next frame from the frames that came before. It then encodes the frame content by rounding the autoencoder’s real-valued array to integers, which are easier to store.
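The rounding step can be illustrated with a minimal Python sketch. It is not the researchers' code: the array values and the function names `quantize` and `max_rounding_error` are hypothetical, chosen only to show that rounding to the nearest integer changes each value by at most 0.5.

```python
def quantize(latent):
    """Round each real-valued entry to the nearest integer."""
    return [round(x) for x in latent]


def max_rounding_error(latent, quantized):
    """Largest absolute difference introduced by rounding."""
    return max(abs(x - q) for x, q in zip(latent, quantized))


# Hypothetical stand-in for the autoencoder's output for one frame.
latent = [0.37, -1.82, 2.49, 0.04]

codes = quantize(latent)
print(codes)                              # [0, -2, 2, 0]
print(max_rounding_error(latent, codes))  # never more than 0.5
```

Rounding is the lossy part of such a pipeline: the small differences discarded here cannot be recovered later, while the integers themselves can be stored exactly.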

The final step applies lossless compression to the integer array, allowing its exact restoration. For greater efficiency, the neural network tells the lossless compressor which video frame to expect next, per Disney Research and UCI.
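Why a good prediction saves storage can be shown with a rough Python sketch. It assumes an idealized lossless coder that spends about -log2(p) bits on a value the model assigned probability p, which is a standard result from information theory rather than a detail from the researchers. The eight-value alphabet, the probabilities 0.65 and 0.05, and all names below are hypothetical.

```python
import math


def ideal_bits(probabilities):
    """Total bits an ideal lossless coder needs, given the probability
    the model assigned to each value that actually occurred."""
    return sum(-math.log2(p) for p in probabilities)


# Hypothetical integer codes for one frame, and a model's guess for them.
codes = [0, -2, 2, 0]
predicted = [0, -2, 1, 0]

# No prediction: each of 8 possible values is treated as equally likely.
uniform_probs = [1 / 8 for _ in codes]

# With prediction (assumed numbers): 0.65 on the guessed value and
# 0.05 on each of the other 7 values, so the probabilities sum to 1.
predictive_probs = [0.65 if c == g else 0.05 for c, g in zip(codes, predicted)]

uniform_bits = ideal_bits(uniform_probs)        # 12.0 bits
predictive_bits = ideal_bits(predictive_probs)  # roughly 6.2 bits

print(uniform_bits, predictive_bits)
```

In this toy case three of the four guesses are right, and the total cost drops by about half even though the one wrong guess is more expensive to store than it would be without prediction.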

“Intuitively, the better a compression algorithm is at predicting the next frame of a video—given what happened in the previous frames—the less it has to memorize,” said Stephan Mandt, UCI assistant professor of computer science, who first worked on this project while employed by Disney Research. “If you see a person walking in a particular direction, you can predict how that video will continue in the future, which means you have less to remember and less to store.”

Mandt continued: “The real contribution here was to combine this neural network-based deep generative video prediction model with everything else that belongs to compression algorithms, such as rounding and model-based lossless compression.”

Work continues toward a practical version of the video compressor, with Mandt indicating that the team may need to compress the neural network itself, along with the video.
