How Multi-layer Coding Standard MPEG-5 LCEVC Can Enable High-quality Metaverse/XR Applications

We’ve all heard the hype and excitement around the "Metaverse" and Extended Reality (XR), but unlike previous “Next Big Things,” they are here to stay in one form or another, bringing major changes to how we work, shop, learn, interact and play online.

First, we need to understand what we mean by Metaverse (or meta-universe). John Riccitiello, CEO of Unity Software, recently gave a good definition: “It is the next generation of the internet, always real time, mostly 3D, mostly interactive, mostly social and mostly persistent.”

In practice, it is a new type of internet user interface for interacting with other parties and accessing data as a seamless and intuitive augmentation of the actual world in front of us, inspired by the online 3D worlds already familiar to users of multiplayer video games. Think of Tom Cruise in Minority Report using hand gestures to control and manipulate data beamed in front of his face; today, he would be wearing an XR headset or glasses, or looking into an auto-stereoscopic head-tracking screen.

While gamers have been enjoying 3D worlds with 6 Degrees of Freedom (6DoF) for over 20 years, there have been, and remain, some technical barriers to mass adoption of immersive XR for non-gaming applications. Most notably, it has taken headset devices a long time to become fit for purpose in terms of acceptable size and weight. Methods of intuitive control also need to vary with the use case: gesture control is cool, but for some applications, such as e-commerce or productivity tools, a keyboard, tablet or haptic controller is actually preferable.

These XR elements continue to be refined, and many solutions are already on the market. However, it is becoming clear that one of the last barriers to overcome is the huge volume of data and processing required to enable immersive experiences at high enough quality.

Building Better Quality Experiences
Quality of experience is non-negotiable: end-users expect visually stunning, realistic, smooth and immersive experiences. This is of little surprise when video gamers now take for granted the exquisite detail of near-film-like images, even in game play.

While traditional 2D user interfaces may tolerate some imperfections, the point of the Metaverse is precisely the illusion of "presence," which necessitates impeccable audio-visual quality, high-quality 3D objects, realistic lighting, high resolutions, high frame rates and low-latency real-time delivery. If users have a bad experience with sketchy graphics and lagging pictures, they are unlikely to come back for more. Unsurprisingly, there is a high correlation between realism of the experience and frequency of use, pushing companies to build better quality experiences to bring users back time and again.

Lightweight XR devices do not and will not have enough processing power to render sufficiently realistic experiences. The disparities in users’ devices also need to be taken into consideration if the Metaverse is to be available to a mass audience; the display quality and the processing power of a gaming PC vs. a standard laptop are significantly different, yet the quality of experience must be comparable despite the huge volumes of data required.

"Split computing" helps to solve this issue, performing 3D rendering on a separate device, possibly in the cloud, while display is handled on the XR device. However, the resulting rendered graphics frames must be streamed to the device as high-resolution high-framerate ultra-low-latency stereoscopic video—and there are many video coding constraints associated with XR casting, especially when wireless connectivity is involved. Luckily, there is a standard solution that makes it possible within realistic constraints.

The Benefits of Multi-layer Coding
Data compression and manipulation that fit within the quality, bandwidth, processing and latency limits of even a solid latest-generation network connection are imperative. There are three main technical challenges to overcome:

Suitable compression and streaming of volumetric objects to the rendering device, as well as across rendering devices, requiring new coding approaches;
Ultra-low-latency video encoding at 4K 72 fps and beyond, at a bandwidth lower than 30 Mbps to cope with realistic Wi-Fi / 5G sustained rates; and
A strong network backbone, CDNs and wireless links in place between the rendering device and the XR display device, with sufficient cloud rendering resources available.
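
To give a feel for the second challenge, the short calculation below estimates how much compression a 30 Mbps link implies. It is a back-of-the-envelope sketch, not a measurement: the 3840x2160 frame size, the 12 bits per pixel (8-bit 4:2:0 sampling) and the function names are assumptions made for illustration, and the article does not say whether "4K" is per eye or for both eyes combined.

```python
def raw_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Uncompressed video bitrate in megabits per second (1 Mbps = 1e6 bit/s)."""
    return width * height * fps * bits_per_pixel / 1e6


def required_compression_ratio(raw_mbps, target_mbps):
    """How many times smaller the encoded stream must be than the raw one."""
    return raw_mbps / target_mbps


# Assumed figures: 3840x2160 frame, 72 fps, 8-bit 4:2:0 (12 bits per pixel).
RAW_4K72 = raw_bitrate_mbps(3840, 2160, 72, 12)

# Target from the text: stay below 30 Mbps.
RATIO = required_compression_ratio(RAW_4K72, 30)

print(f"raw: {RAW_4K72:.0f} Mbps, required compression: about {RATIO:.0f}:1")
```

Under these assumptions the raw signal is a little over 7 Gbps, so the encoder has to shrink it by a factor in the low hundreds while still meeting the latency target.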

Layered encoding, i.e., structuring data in hierarchical layers that can be accessed and/or decoded as required, makes sense. After some false starts, the latest attempts at making layered coding work, in the form of the MPEG-5 LCEVC (Low Complexity Enhancement Video Coding) and SMPTE VC-6 standards, deliver many of the benefits that layered coding promises.

LCEVC is ISO-MPEG’s new hybrid multi-layer coding enhancement standard. It is codec-agnostic: it combines a lower-resolution base layer encoded with any traditional codec (e.g., H.264, HEVC, VP9, AV1 or VVC) with a layer of residual data that reconstructs the full resolution. The LCEVC coding tools are particularly well suited to compressing fine detail efficiently, in terms of both processing and compression, while running the traditional codec at a lower resolution makes effective use of the hardware acceleration available for that codec and makes it more efficient.
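
The base-plus-residual idea can be illustrated with a toy example. The sketch below is not the LCEVC algorithm and uses none of its actual coding tools; it only shows the general principle of a lower-resolution base layer plus a residual layer that restores full resolution. The function names, the 2x downsampling, the nearest-neighbour upsampling and the coarse quantizer standing in for "any traditional codec" are all assumptions chosen for brevity, and the residuals are kept lossless here, whereas a real encoder would compress them.

```python
def downsample_2x(frame):
    """Average each 2x2 block into one sample (frame dimensions must be even)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def lossy_base_codec(base, step=8):
    """Hypothetical stand-in for a base codec: coarse quantization of samples."""
    return [[(v // step) * step for v in row] for row in base]


def upsample_2x(base):
    """Nearest-neighbour upsampling back to full resolution."""
    out = []
    for row in base:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residuals(frame, predicted):
    """Enhancement layer: what the upsampled base layer got wrong."""
    return [[f - p for f, p in zip(fr, pr)] for fr, pr in zip(frame, predicted)]


def reconstruct(predicted, res):
    """Decoder side: upsampled base plus residuals gives the full-resolution frame."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(predicted, res)]


# A tiny 4x4 "frame" of luma samples, invented for the example.
FRAME = [
    [10, 12, 200, 202],
    [11, 13, 201, 203],
    [50, 52, 90, 94],
    [51, 53, 91, 95],
]

BASE = lossy_base_codec(downsample_2x(FRAME))  # quarter the samples of FRAME
PREDICTED = upsample_2x(BASE)                  # decoder's first approximation
RESIDUALS = residuals(FRAME, PREDICTED)        # detail carried by the extra layer
RECONSTRUCTED = reconstruct(PREDICTED, RESIDUALS)

print("base layer size:", len(BASE), "x", len(BASE[0]))
print("exact reconstruction:", RECONSTRUCTED == FRAME)
```

The point of the structure is that the expensive traditional codec only ever sees a quarter of the samples, while the residual layer carries the fine detail that the smaller picture cannot.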

Enhancing streams with LCEVC enables the delivery of UHD streams at bandwidths typically used for HD video, offering higher quality video with up to 40% bitrate savings and up to 3x faster transcoding. It also enables 10-bit HDR over 8-bit codecs such as H.264 or VP9, and offers a unique reduction of latency jitter thanks to the multi-layer structure inherent in LCEVC. All these benefits can be achieved while keeping average system latency to a minimum. LCEVC’s low-complexity design also enables higher resolutions at sustainable battery consumption, aiding sustainability considerations.

Many of the tech giants, including Meta, Google, Microsoft, Apple, Sony and NVIDIA, are investing tens of billions of dollars in Metaverse technology, and industry predictions see XR functionality of some kind on most internet destinations and apps within the next 5-10 years. The ability to tackle the data volume challenges will be one of the key factors driving the speed with which the XR meta-universe is brought to fruition; the development and adoption of multi-layer coding standards will materially accelerate the development of high-quality, interoperable Metaverse destinations to make it a reality for all.

Guido Meardi is CEO of V-Nova. Guido is a former senior Partner at McKinsey, where he was head of the Organization and Operations Practices of the Mediterranean Complex. He has a breadth of business experience and access to senior executives in a variety of industries and geographies, with well-established experience in telecoms, technology, healthcare, insurance, aerospace and defense. He led transformational projects in all continents and was instrumental in setting up some of McKinsey’s own innovation-related business-building activities. Guido is also a serial entrepreneur and angel investor, with many previous business ventures and half a dozen exits. Having retained his engineering expertise, he contributed to the foundational development work for V-Nova’s core technology and is a joint inventor of a number of essential aspects of the technology as well as of several of its latest developments, with over 200 patents co-authored and filed. Guido holds an MBA from MIT Sloan, where he was a Siebel Scholar, and an M.Sc. in Computer Engineering from Politecnico di Milano, where he was an Intel scholar, and the University of Texas at Austin.