Web video scalability

Scaling Internet video creates a seamless experience for the end user.

Today's broadband connections allow straightforward delivery of streaming video and audio to consumers at increasingly high resolutions. DSL and cable are available to the home, with speeds ranging from 1Mb/s up to 6Mb/s-7Mb/s. These connections are often shared within a location and are not foolproof. The expanding market for streaming to mobile devices means encoding should scale to fit the available bandwidth of wireless as well. The Holy Grail for producers would be one source that could scale not only to 100kb/s-200kb/s for wireless, but also to 3Mb/s and higher for HD delivery. This article discusses the options, deployed and in development, that may soon make this a reality for broadcasters.

Encoding multiple bit streams

Let's start with the longest running case of scalability: simply delivering video and audio over a single, variable bandwidth connection. In the past, separate files were encoded at specific bit rates, and users would choose one based on their Internet connection. Platforms, such as RealVideo and Microsoft, evolved so that multiple bit rates were encoded into each file, and servers could determine network conditions and dynamically switch the data rate delivered accordingly.
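The server-side choice described above can be sketched in a few lines of Python. This is only an illustration, not how any particular streaming server is implemented: the function name, the example bit rates and the 80 percent headroom factor are all assumptions made for the example.

```python
def pick_bit_rate(available_kbps, encoded_kbps, headroom=0.8):
    """Return the highest encoded rate that fits the measured bandwidth.

    available_kbps: measured throughput to the client (hypothetical input).
    encoded_kbps:   the bit rates encoded into the file.
    headroom:       assumed safety margin; only this fraction of the
                    measured bandwidth is treated as usable.
    """
    usable = available_kbps * headroom
    candidates = [rate for rate in sorted(encoded_kbps) if rate <= usable]
    # If nothing fits, fall back to the lowest rate rather than stopping.
    return candidates[-1] if candidates else min(encoded_kbps)


rates = [150, 300, 500]  # illustrative bit rates in kb/s
for measured in (700, 400, 100):
    print(measured, "kb/s available ->", pick_bit_rate(measured, rates), "kb/s stream")
```

Calling the function again each time the measured bandwidth changes gives the dynamic switching behavior; a real server would also smooth its measurements to avoid switching too often.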

Today's major delivery formats, including Adobe's Flash video and Microsoft's Silverlight, still adhere to this principle. H.264/AVC also supports scalability through its Scalable Video Coding (SVC) extension, which can be delivered through servers from both these companies as well as in QuickTime. Move Networks, whose system is based on On2's VP6 codec, offers adaptive streaming technology with a proprietary plug-in required on the user side.

Figures 1 and 2 illustrate how the newer methods create a seamless experience for the end user. In earlier versions, shown in Figure 1, the user might choose a higher bandwidth file than they are capable of receiving and would experience a tremendous wait time before playback could begin. Even if an appropriate bit rate is chosen, when network conditions change, the stream would need to buffer, causing pauses in the playback on the user's side. Multiple bit rate streams offer a built-in buffer so that when the network conditions warrant, playback continues while the server adjusts the bit rate being sent to the client. As shown in Figure 2, if the detection and switching interval is not longer than the buffer time, continuous playback is achieved.
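The relationship between buffer time and switching interval can be expressed as a simple check. The sketch below is a simplified model, assuming the network rate drops to a constant value and stays there until the server switches; the function and parameter names are invented for the illustration.

```python
def playback_stalls(buffer_s, switch_interval_s, stream_kbps, network_kbps):
    """Return True if the client buffer empties before the server switches.

    buffer_s:          seconds of media buffered when the network degrades.
    switch_interval_s: time the server needs to detect the change and switch.
    stream_kbps:       bit rate of the stream currently being sent.
    network_kbps:      throughput the degraded network can still carry.
    """
    # Fraction of real-time playback the network can still supply.
    fill_ratio = min(network_kbps / stream_kbps, 1.0)
    # Seconds of buffered media consumed while waiting for the switch.
    drained_s = switch_interval_s * (1.0 - fill_ratio)
    return drained_s > buffer_s


# Worst case: the network delivers nothing until the switch completes.
print(playback_stalls(buffer_s=5, switch_interval_s=3, stream_kbps=500, network_kbps=0))
print(playback_stalls(buffer_s=2, switch_interval_s=3, stream_kbps=500, network_kbps=0))
```

In the worst case, where no data arrives during the interval, the check reduces to the rule stated above: playback is continuous as long as the switching interval is not longer than the buffer time.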

There are a few caveats to the encoding of multiple bit rate streams. One is that the video dimensions must remain constant. For example, a 320 × 240 frame might be encoded at 150kb/s, 300kb/s and 500kb/s. The sophistication of codecs today generally allows this without noticeable degradation, although it is more of a consideration if streaming high-motion, high-detail content such as sporting events.
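One way to see why content type matters is to work out how many bits each pixel receives at each rate. The short calculation below uses the 320 × 240 example and assumes 30fps; the helper name is made up for the illustration, and bits per pixel is only a rough indicator, not a quality metric.

```python
def bits_per_pixel(bit_rate_kbps, width, height, fps):
    """Average number of encoded bits available per pixel per frame."""
    pixels_per_second = width * height * fps
    return (bit_rate_kbps * 1000) / pixels_per_second


for rate in (150, 300, 500):
    bpp = bits_per_pixel(rate, width=320, height=240, fps=30)
    print(f"{rate}kb/s -> {bpp:.3f} bits per pixel")
```

The lowest rate leaves roughly 0.065 bits per pixel against about 0.217 at the highest, which is why high-motion, high-detail material is the first to show degradation when the frame size is held constant.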

Another caveat is that, to simplify the process, one audio stream may be used for all the video bit rates. Again, in most cases this is appropriate because users notice differences in audio quality more readily than differences in picture quality. If video detail is a higher priority, however, a variable audio rate may be justified as well.

Another choice that the encoder needs to make is whether motion or picture detail is more important. Most often, frames are dropped first to lower bandwidth requirements. Because codecs can achieve 30fps at moderate broadband speeds, the switch to half that rate could be achieved first by dropping every other frame. Picture quality within the frames would also be compromised somewhat because key frames (or I-frames in MPEG jargon) that use proportionately more bits in the encoded stream would not be dropped. If more than just half the frames need to be dropped in order to achieve an acceptable bit rate for delivery, motion rendition could become jerky and more noticeable to the end user.
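The frame-dropping idea can be illustrated with a toy model. The sketch below treats a stream as a list of frame-type labels and keeps every key frame plus every other remaining frame. It is a deliberate simplification: it ignores the dependencies between predicted frames, so a real encoder would re-encode at the lower frame rate rather than discard frames from an existing stream. The function name and the sample frame pattern are assumptions.

```python
def drop_alternate_frames(frame_types):
    """Keep all key frames ("I") and every other remaining frame.

    frame_types: sequence of labels such as "I" and "P", in display order.
    Returns a list of (original_index, frame_type) for the frames kept.
    """
    kept = []
    for index, frame_type in enumerate(frame_types):
        # Key frames are never dropped; other frames survive only
        # at even positions, which halves the frame rate.
        if frame_type == "I" or index % 2 == 0:
            kept.append((index, frame_type))
    return kept


frames = list("IPPPPPIPPPPP")  # hypothetical pattern, key frame every 6 frames
kept = drop_alternate_frames(frames)
print(len(frames), "frames in,", len(kept), "frames out")
print([index for index, _ in kept])
```

Because the key frames all survive, they account for a larger share of the bits in the reduced stream, leaving proportionately less for the frames in between.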

Frame size, frame rate and audio fidelity all come into play even more when the goal is to achieve multiplatform delivery. Bit rates for mobile delivery are still much lower than DSL and cable modem, and also tend to be much more variable. More sacrifices in picture quality may be necessary, but thankfully, they are much less noticeable on mobile devices' smaller screens, with typical resolutions of 176 × 144. This is also a case where audio fidelity may play a much larger role in offering the end user a quality experience.

Simultaneous conversion

The push today is for simultaneous conversion from source to multiple streams that deliver a wide range of video and audio quality depending on the consumers' choice of platform and network connectivity. Current technology can offer HD video delivery at 2Mb/s-3Mb/s all the way down to 100kb/s-200kb/s streams to mobile devices, and everything in between. The process in a studio or encoder farm is much more complicated than producing a single bit rate stream. (See Figure 3.) HD sources can be encoded for network delivery, while incorporating a multiple bit rate strategy, as described above, for delivery over high throughput connections to different devices in the home. CPU overload can become an issue when streaming HD, but home networks can also deliver HD and SD content to set-top boxes. After first downconverting the HD video to SD, separate multiple bit rate streams would be encoded to lower bit rates for DSL, cable and wireless delivery.

In the production facility, careful preparation of source video is always a top priority in order to achieve the highest quality compressed streams. Proper downconversion is critical when considering compression for network delivery. Artifacts from 3:2 pulldown, deinterlacing and spatial subsampling will be accentuated at lower data rates. Even if done without noticeable artifacts, these processes can create a more difficult signal to compress due to higher local and frame-to-frame differences in the video. This means bits are allocated to “artificial” picture details, which can compromise quality in the final decoded image.

In order to achieve full implementation of scalable video over networks, the infrastructure must be in place across the entire production and delivery path: broadcast facilities, content distribution networks and local connections to receiving devices. Any format choice, whether it's from Microsoft, Move Networks, Adobe's Flash or the standard MPEG-4 SVC, will require that servers pushing streams from encoding facilities to the CDN and CDN to client have the appropriate technology in place. In order to reach the widest audience, multiple formats can be produced, but this doubles or triples the infrastructure for network delivery.

Client devices must also be equipped with the proper player platform. This is typically not an issue for PCs because players are implemented in software, but STBs and wireless devices generally need the decoding capability built into hardware.

H.264/AVC may seem like the candidate of choice for scalable video delivery, but proprietary formats and implementations will continue to push the technology envelope. Even so, we do not have to anticipate another format war in the marketplace. Broadcasters can support all formats, either internally or through third-party providers, so that consumers can make the ultimate choice of how they wish to view online content.

Barb Roeder is a consultant and president of BarbWired.

Send questions and comments to: don.markley@penton.com