Scaling Up for Scalable Video Coding
April 5, 2011
Delivering high-quality video over a low-quality IP network can be challenging. As described previously in this column, adding Forward Error Correction (FEC) data can allow a receiver to reconstruct lost packets using extra information sent along with the stream, but it can be costly in terms of the extra bandwidth consumed. What's needed is another way to construct a stream that can better withstand data losses and still produce a video output that is acceptable to viewers.
Scalable Video Coding (SVC), defined in Annex G of H.264, allows an encoder to create a single output stream that contains multiple layers of compressed image data. These layers are constructed to allow one or more of the layers to be removed in transit (due to error or bandwidth reduction) and still create an acceptable image. Since its formal release in 2007, this standard has been used to create videoconferencing and security video products. Due to the increasing need to transmit broadcast video signals over public wireless and Internet networks, SVC should be a strong candidate for broadcast markets.
Fig. 1: SVC Layers
As shown in Fig. 1, an SVC stream consists of multiple layers of compressed video data. The "base layer" contains the most basic, scaled-down image data needed to reconstruct the video. The remaining compressed video data is grouped into "enhancement layers," which contain information that the decoder can use to scale up and produce a higher-quality image. If only the base layer is received, a decoder can produce a video output, although the quality will fall short of the image that could be produced with the addition of enhancement layers.
H.264 SVC can scale a video stream along three different dimensions: Temporal, Spatial and SNR/Quality. Each of these scaling dimensions impacts the compressed video stream in a different way:
Temporal scalability refers to removing some of the video frames from the stream. This could, for example, reduce the video signal image rate from 30 fps to 15 fps, or even lower, to create the base layer.
Spatial scalability refers to reduction in image size. This could, for example, take an HD video signal and reduce it to SD to create the base layer.
SNR/Quality scalability refers to sending less detail for each part of the image. This could result in the base layer video having less well-defined edges between picture elements.
Scaling back along these dimensions can substantially reduce the bit-rate required to transport the base layer as compared to the bandwidth needed for the entire compressed signal.
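To make the idea of scaling back concrete, here is a minimal Python sketch of how a network device might thin an SVC stream down to a chosen operating point. The field names temporal_id, dependency_id and quality_id echo identifiers carried in the SVC packet header, but the Packet class, the extract_substream function and all of the byte counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    temporal_id: int    # 0 = lowest frame rate
    dependency_id: int  # 0 = lowest resolution (spatial layer)
    quality_id: int     # 0 = lowest SNR/quality layer
    size_bytes: int     # made-up payload size for illustration


def extract_substream(packets, max_temporal, max_spatial, max_quality):
    """Keep only the packets at or below the requested operating point."""
    return [
        p for p in packets
        if p.temporal_id <= max_temporal
        and p.dependency_id <= max_spatial
        and p.quality_id <= max_quality
    ]


def total_bytes(packets):
    """Sum the payload sizes of a list of packets."""
    return sum(p.size_bytes for p in packets)


# A toy stream: one packet per layer combination (sizes are invented).
stream = [
    Packet(0, 0, 0, 400),   # base layer
    Packet(1, 0, 0, 200),   # temporal enhancement
    Packet(0, 1, 0, 1200),  # spatial enhancement
    Packet(1, 1, 0, 600),
    Packet(0, 1, 1, 900),   # quality enhancement
    Packet(1, 1, 1, 500),
]

base_only = extract_substream(stream, 0, 0, 0)
print(total_bytes(stream))     # 3800
print(total_bytes(base_only))  # 400
```

In this invented example the base layer needs only a fraction of the bytes of the full stream, which is the effect described above. Real bit-rate ratios depend entirely on the content and on the encoder settings.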
When the SVC layers are sent over a network, special emphasis can be placed on ensuring that the base layer is delivered to the decoder. Two of the most popular ways of doing this are through the use of FEC or through Automatic Repeat reQuest (ARQ). When FEC is used, the packets that contain the base layer are given more robust FEC patterns that protect against longer burst losses and/or against higher amounts of random packet loss. ARQ operates by allowing the decoder to request retransmission of packets that it has not received. When the sending device gets a request to re-send data, it can make sure that base layer packets are given priority over enhancement layer packets.
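The ARQ prioritization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's implementation: the RetransmitQueue class, its method names and the layer numbering (0 for the base layer, larger numbers for enhancement layers) are all assumptions made for this example.

```python
import heapq
import itertools


class RetransmitQueue:
    """Hypothetical sender-side queue that resends base layer packets first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so requests for the same layer are served in arrival order.
        self._arrival = itertools.count()

    def request(self, seq, layer):
        """Record a retransmission request; layer 0 is the base layer."""
        heapq.heappush(self._heap, (layer, next(self._arrival), seq))

    def next_to_send(self):
        """Return (seq, layer) for the most important pending packet, or None."""
        if not self._heap:
            return None
        layer, _, seq = heapq.heappop(self._heap)
        return seq, layer


queue = RetransmitQueue()
queue.request(seq=101, layer=2)  # enhancement layer
queue.request(seq=102, layer=0)  # base layer
queue.request(seq=103, layer=1)  # enhancement layer
queue.request(seq=104, layer=0)  # base layer

order = []
while (item := queue.next_to_send()) is not None:
    order.append(item)
print(order)  # [(102, 0), (104, 0), (103, 1), (101, 2)]
```

Both base layer packets are resent before either enhancement packet, even though an enhancement packet was requested first. A practical sender would probably also discard requests that can no longer arrive in time to be displayed.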
For broadcast TV and many other applications, the decoder's output must match the frame rate and resolution of the source video signal. If the base and enhancement streams are successfully delivered, then a decoder can easily reconstruct the original signal. If, on the other hand, only the base layer stream is delivered, then a decoder must still generate a full-size, full-frame-rate image by expanding the available data to conceal any missing picture elements. This can sometimes result in image artifacts, but overall the impact on viewers can be minimized.
When all the enhancement streams are delivered, the quality of a decoder's output is essentially the same for an H.264 SVC signal as it is for a standard H.264 AVC signal. Due to the extra information required in the SVC format (such as control data indicating which elements belong to the base versus the enhancement layers), an SVC stream can consume up to 10 percent more total bandwidth than a comparable AVC stream.
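As a rough illustration of what expanding the available data can mean, the Python sketch below restores the frame rate by repeating frames and restores the image size by replicating pixels. These are the simplest possible techniques, chosen here for clarity; real decoders typically use more sophisticated interpolation. The function names and the tiny 2x2 frames are hypothetical.

```python
def repeat_frames(frames, factor):
    """Restore the frame rate by showing each received frame `factor` times."""
    return [frame for frame in frames for _ in range(factor)]


def upscale_nearest(frame, factor):
    """Enlarge a frame (a list of pixel rows) by replicating each pixel."""
    enlarged = []
    for row in frame:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Emit `factor` independent copies of the widened row.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# Two tiny base layer frames, each 2x2 pixels.
base_frames = [
    [[1, 2],
     [3, 4]],
    [[5, 6],
     [7, 8]],
]

# Double the size of every frame, then double the frame rate.
output = repeat_frames([upscale_nearest(f, 2) for f in base_frames], 2)

print(len(output))  # 4
print(output[0])    # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The blocky result of pixel replication is one example of the image artifacts mentioned above.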
Overall, SVC really shines when errors are present in the link. With AVC compression, uncorrected error rates above 0.1 percent can create detectable visual artifacts, and uncorrected error rates above 1 percent are generally catastrophic. With SVC signals, error rates below 1 percent have essentially no impact on the video output quality, and error rates up to 10 percent and higher can be accommodated with little apparent effect on the delivered signal. As wireless links become more crowded, and Internet traffic continues its inexorable growth, H.264 SVC could become a valuable tool.
Thanks to Jill Boyce, director of Algorithms, and Jim O'Brien, general manager with Vidyo Inc., for their assistance with this article. To see SVC in action, please visit www.vidyo.com/resources/videos.php.
Wes Simpson is an industry consultant and author of "Video Over IP, Second Edition," from Focal Press. Your comments are welcome to email@example.com.