Can Video Compression Tame the Internet?

LOS ANGELES—NEWS FLASH: There is a huge shift in viewing of media (TV and movies) from traditional distribution outlets (cable, satellite, broadcast) to the internet, including both fixed and wireless as well as mobile devices…OK, we all know this, just wanted to get your attention.

While the established modes of media distribution are well understood and engineered, governed by open standards (ATSC A/53, SCTE 23 (DOCSIS), DVB-T, DVB-S, MPEG), the internet is an amalgam of “open standards” (IETF) and quasi-proprietary approaches (Apple HLS), as well as a variety of software protocols and applications (web browsers, media players, etc.). Further, there is a loose form of registration (ICANN) and a set of operational principles (peering) that govern traffic and routing on the internet.

This looseness, with a minimum of standardization, has encouraged the development of new feature-rich applications that empower people to consume media content anywhere and anytime they have access to the internet. On the minus side, the “core internet,” which has evolved over time, was not designed to expand at the rate at which video consumption is growing. Unlike other forms of internet traffic, media streaming/downloading requires certain performance criteria:

1. Low latency

2. High end-to-end bandwidth

3. Low packet loss

4. High storage capacity

ONE TO ONE

One of the key differences between the internet and traditional video distribution is that the internet is based on datagram transmission protocols, not switched circuit technology nor linear channel distribution (cable, OTA, satellite). In datagram transmission protocols, the video content is put into sequential packets of limited size and then each packet is launched over an IP network. Each request for content requires unique packets be sent to that user, thus 1 million viewers means 1 million distinct video streams. Linear channels can support multiple receive points with one common video feed.

Along the way, these packets are buffered and routed to their ultimate destination. Because there is no centralized routing or path assignment, any two packets of the same video stream may or may not traverse the same set of routers and links, and so may experience different transit delays. The IP video receiver has to buffer the incoming packets, perhaps re-ordering them, prior to processing (decoding). This buffering introduces additional delay. In contrast, in a linear channel or circuit-switched transmission, the routing of video data is fixed, so the received packet order is fixed, minimizing buffering and delay. There may be an initial delay in setting up the circuit or tuning to the desired channel, but once streaming has begun, there is minimal delay between receiving the compressed video data and decoding it back to baseband image data.
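
To make the receiver's re-ordering job concrete, here is a minimal Python sketch. It assumes each packet carries a sequence number (the function name, the tuple layout and the sample data are hypothetical, not taken from any particular player) and it holds back early arrivals until the gap in front of them is filled:

```python
import heapq

def reorder(packets, first_seq=0):
    """Yield payloads in sequence order from (seq, payload) pairs
    that may arrive out of order. Illustrative only."""
    pending = []            # min-heap keyed on sequence number
    next_seq = first_seq    # next sequence number the decoder expects
    for seq, payload in packets:
        if seq < next_seq:
            continue        # late duplicate of something already delivered
        heapq.heappush(pending, (seq, payload))
        # Release every packet that is now contiguous with what was delivered.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1

# Hypothetical arrival order: packets 2 and 4 overtake 1 and 3 in transit.
arrivals = [(0, "a"), (2, "c"), (1, "b"), (4, "e"), (3, "d")]
ordered = list(reorder(arrivals))
```

Note that packet 2 sits in the buffer until packet 1 shows up; that waiting time is exactly the extra delay described above. A real receiver would also give up on a missing packet after a timeout, which this sketch leaves out.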

THREE METHODS OF SEGMENTATION

To overcome some of the challenges of data packet transmission, a technique known as segmentation has evolved. There are three competing approaches: Microsoft Smooth Streaming (used with Silverlight), Apple HLS and Adobe HDS. Basically, the media is encoded as a series of discrete segments, each holding a set duration of content (usually from milliseconds to several seconds), and each segment is then sent as a “package.” Once the first package is received and verified, decoding starts while the next package is being received. As long as the transmission delay for each package is less than the segment content length, the receive buffer is kept from emptying, which prevents the decoder output from stalling.
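
The buffering rule above can be checked with a toy model. The Python sketch below is a simplification under stated assumptions: segments are fetched back to back, one at a time, playback starts as soon as the first segment arrives, and the function name and timings are made up for illustration:

```python
def count_stalls(segment_seconds, download_seconds):
    """Count playback stalls given how long each segment took to download.

    Toy model: one download at a time, playback begins when the first
    segment has fully arrived.
    """
    stalls = 0
    clock = 0.0          # time at which the current download finishes
    buffered_until = None  # time at which already-received content runs out
    for took in download_seconds:
        clock += took
        if buffered_until is None:
            buffered_until = clock   # first segment: playback starts now
        elif clock > buffered_until:
            stalls += 1              # buffer emptied before this segment arrived
            buffered_until = clock   # playback resumes on arrival
        buffered_until += segment_seconds
    return stalls

# 4-second segments that each take 3 seconds to arrive: no stalls.
smooth = count_stalls(4.0, [3.0, 3.0, 3.0, 3.0])
# Two segments take 5 seconds each: the decoder runs dry twice.
choppy = count_stalls(4.0, [3.0, 5.0, 5.0, 3.0])
```

In this model a single slow segment is survivable only if earlier segments arrived early enough to bank some slack, which is why players typically hold more than one segment in the buffer.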

However, this only solves the delay/re-buffer problem. If the end-to-end IP bandwidth is not stable and packets arrive more slowly than the decoder pulls them out, buffer underflow will occur, causing freeze frames or black frame output. An advantage of the internet is its inherent bi-directional connection; the receiver can report statistics on packet recovery back to the video server. If the packet receive rate is too slow for the current encoding bitrate, the server can “switch” to a lower bitrate. For the segmented protocols, this usually means swapping bitrates at segment boundaries. Likewise, if there is more than sufficient bandwidth, the server can raise the bitrate and improve the overall video quality. With circuit-switched or linear channel-based delivery, by contrast, fixed bandwidth is guaranteed along with a fixed transmission path.

From a practical point of view, given that encoding is processor-intensive and each bitrate rendition requires more storage at the server, video servers have a fixed number of encoded bitrates available, typically four. For each segment, the server selects the rendition whose encoded bitrate is the highest one below the receive bitrate reported by the viewer. A further refinement of this approach, called CAE (Content Aware Encoding), optimizes the bitrate per segment, with multiple tiers of quality performance. For a given quality, the bitrate will fluctuate between segments; if the available bitrate is insufficient, the server will select a tier (or ladder) of lower quality and lower bitrate.
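
The selection rule is simple enough to sketch in a few lines of Python. The four-rung ladder below is hypothetical (real services pick their own rungs), and the fallback to the lowest rung when nothing fits is an assumption of this sketch rather than something any of the protocols mandates:

```python
# Hypothetical four-rendition ladder, in kb/s.
LADDER_KBPS = [400, 1200, 2500, 4500]

def pick_rendition(measured_kbps, ladder=LADDER_KBPS):
    """Return the highest rendition bitrate that fits within the
    measured receive rate, or the lowest rung if none fits."""
    fitting = [rate for rate in ladder if rate <= measured_kbps]
    return max(fitting) if fitting else min(ladder)

# A viewer reporting about 3 Mb/s gets the 2.5 Mb/s rendition.
choice = pick_rendition(3000)
```

Running this once per segment boundary, with a fresh throughput report each time, gives the step-up and step-down behavior described above.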

Other techniques to manage video delivery over the internet include transcoding at the edge (where the origin video is re-encoded to fit down local IP connections), multicast (similar to a linear channel approach), shared caching (where receivers share data streams) and peer-to-peer delivery (similar to BitTorrent).

HANDLING THE CROWDS

The reality of internet delivery of media content is a complex topic. There are physical constraints, such as bandwidth, storage and processing, as well as electrical power (it is estimated that in the U.K., 16 percent of all power generated is used by internet data centers). While there are traffic flow models for the internet, media streaming does not fit well into these models, so we have to use empirical measures of how well the internet is handling video. We know many “peak event” viewings of live content (season premieres, live sporting events, etc.) cause either slowdowns or disruptions to the video feeds. The causes are manifold: the origin server may be over-subscribed, client-side connectivity may degrade, the internet core routing may be over-taxed, or the ISP gateway may be overloaded. CDNs, in part, mitigate these peak flows and attempt to route around bottlenecks. But the root cause is larger audience sizes coupled with the increasing bandwidth requirements of video content (HD, 4K, high frame rate). Since each viewer receives a unique media stream, the total internet bandwidth required grows as the viewership grows.

The current estimate is that around 70 percent of all internet traffic is due to media consumption. This means that everything else (email, financial transactions, web browsing, etc.) accounts for 30 percent. However, it is interesting to note that the 70 percent represents many duplicative feeds: if 1 million people are interested in viewing the World Champion Darts competition finals, then that represents 1 million streams at some bitrate (typically 1 Mb/s to 4 Mb/s). At the same time, there can be many other programs being streamed, with audiences ranging from a single viewer to over 1 billion viewers. While CDNs can scale and create multiple delivery pipes, and multiple origin servers can be made available, thus spreading the load over a broader set of servers and data connections, there is ultimately a finite resource of bandwidth and processing nodes.
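
For a sense of scale, the darts example works out as follows. This is back-of-the-envelope arithmetic in Python using only the figures quoted above; the function name is made up, and protocol overhead is ignored:

```python
def aggregate_tbps(viewers, stream_mbps):
    """Total unicast bandwidth in Tb/s when every viewer gets their own
    stream (1 Tb/s = 1,000,000 Mb/s). Ignores protocol overhead."""
    return viewers * stream_mbps / 1_000_000

low = aggregate_tbps(1_000_000, 1)    # 1.0 Tb/s at 1 Mb/s per viewer
high = aggregate_tbps(1_000_000, 4)   # 4.0 Tb/s at 4 Mb/s per viewer
```

A linear channel would carry the same event as a single feed of 1 Mb/s to 4 Mb/s no matter how many people tuned in.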

THE BOTTLENECK

Much like the traffic on the 405 in Los Angeles, there are finite resources (lanes) that can carry cars. We can carpool or reduce the size of the lanes (compression) but this solution has a linear effect on the problem while the growth of internet video traffic is exponential.

Another way to look at this is Compound Annual Growth Rate (CAGR). Video streaming is growing at a CAGR of 30-35 percent; video compression efficiency over the last 25 years has improved at a CAGR of around 7 percent (roughly halving the video bitrate every 10 years). Unless video compression has a major breakthrough (a 15:1 improvement in compression efficiency), compression alone cannot solve the internet video bottleneck. (Note that the 15:1 compression improvement does not take into account the increase in bitrate due to increasing video format size or frame rate.) More and more content delivered over the internet is HD and above (4K, eventually 8K), which needs more bandwidth than the current mix of internet video streams.
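
The two growth rates can be compared directly. The short Python sketch below uses only the percentages quoted above; the ten-year horizon is an assumption chosen for illustration, not a figure stated in the column:

```python
import math

def years_to_halve(cagr):
    """Years for bitrate to halve if efficiency compounds at `cagr` per year."""
    return math.log(2) / math.log(1 + cagr)

def growth_factor(cagr, years):
    """Total multiplier after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years

halving = years_to_halve(0.07)           # a little over 10 years
compression_10y = growth_factor(0.07, 10)  # just under 2x
streaming_10y = growth_factor(0.30, 10)    # nearly 14x at the low end
streaming_10y_high = growth_factor(0.35, 10)  # about 20x at the high end
```

Over a decade, then, streaming demand grows by a factor in the mid-teens or more while compression gains only about a factor of two, which is the gap a breakthrough would have to close.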

The solution has to be found in alternate protocols for distribution of video over IP. The 1:1 relationship between the viewer and a unique video stream is unsupportable given the growth of video consumption over the internet. The challenge is the desire for personalized delivery, one that can be controlled by the viewer using many different devices (mobile, fixed, PC), connection types (wired and wireless) and viewing conditions. Some proposed solutions are variants of multicast protocols, but multicast means “appointment TV” viewing or having to record each multicast feed locally in an appliance. However, this does not support “impulse viewing” if the viewer had not previously subscribed to that multicast feed. Perhaps the answer is a hybrid approach, with a “linear feed” multicast to millions and 1:1 individual streams for those who either come late to the party or want a unique viewing experience (such as a “cut down” version of a live event, minus all the game breaks). Maybe.

Well, let’s hope that 2018 will be the year that someone figures out a solution to the bottleneck.