Work on the video compression standards that would ultimately result in MPEG began in 1988. The Moving Picture Experts Group, in cooperation with the International Organization for Standardization (ISO), created multiple standards for video compression, now known as MPEG-1, MPEG-2 and MPEG-4/AVC. MPEG was devised as a means of compressing broadband video into a small bit stream that could fit in extremely narrow broadcast or satellite transmission channels. As such, it is well-suited for distribution purposes, but not for applications that require internal switching and processing.
Because MPEG is a motion image compression technology, it works on a sequence of video frames, known as a group of pictures (GOP). (See Figure 1.) A processor examines several frames of video and assigns one frame as the reference frame for that group (the I frame). The GOP also has several predictive frames, or P frames, which use information from the I frame and previous P frames to construct images. Finally, there are bidirectional predictive frames, or B frames, which look at preceding and following I and P frames. Motion is analyzed, and the motion vectors, which describe the offset between a block in the current frame and its best match in the reference frame, are estimated down to a quarter pixel. The motion vectors and the difference between the predicted and actual images are used to reconstruct video that looks good at low bit rates.
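The frame dependencies inside a GOP can be made concrete with a short sketch. The Python below is illustrative only: it assumes the classic arrangement in which a P frame predicts from the nearest preceding I or P frame and a B frame predicts from the nearest I or P frame on either side. MPEG-4/AVC permits more flexible referencing than this. The function names gop_references and decode_order are hypothetical and do not come from any codec library.

```python
def gop_references(pattern):
    """Return {display_index: [indices of the I/P frames it predicts from]}.

    `pattern` is a GOP in display order, e.g. "IBBPBBP".
    Assumption: each B frame uses only the nearest I/P frame on each side.
    """
    if not pattern or pattern[0] != "I":
        raise ValueError("this sketch expects a GOP that starts with an I frame")
    if set(pattern) - set("IPB"):
        raise ValueError("pattern may contain only I, P and B")
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(pattern):
        before = max((a for a in anchors if a < i), default=None)
        after = min((a for a in anchors if a > i), default=None)
        if kind == "I":
            refs[i] = []            # intra coded, needs no other frame
        elif kind == "P":
            refs[i] = [before]      # forward prediction only
        else:
            refs[i] = [a for a in (before, after) if a is not None]
    return refs


def decode_order(pattern):
    """Order in which frames can be decoded: a frame is ready once its references are."""
    refs = gop_references(pattern)
    done, order = set(), []
    pending = list(range(len(pattern)))
    while pending:
        ready = next(i for i in pending if all(r in done for r in refs[i]))
        pending.remove(ready)
        done.add(ready)
        order.append(ready)
    return order


print(gop_references("IBBPBBP"))
print(decode_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```

Note that the B frames at display positions 1 and 2 cannot be decoded until the P frame at position 3 has arrived, which is one reason predictive coding adds delay.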
Problems with MPEG encoding arise when there is a scene change or where a large percentage of the image changes at once. The processor can’t predict the movement, and the system creates a distortion known as macroblocking. (See Figure 2.) Raising the bit rate adds more detail in each 8 x 8 block of pixels, which can reduce the number of P frames. But there will always be P frames — and the potential for macroblocking — in all implementations of MPEG.
Another major failing of MPEG compression is that it is difficult to edit and switch cleanly. With P and B frames, editing systems and switchers have trouble finding a clean frame on which to edit. MPEG also doesn’t react well to being repeatedly encoded and decoded. It works best when the video is encoded once and decoded at the viewing location.
Meanwhile, JPEG — short for Joint Photographic Experts Group — was originally developed as a compression standard for still images. With modern processors, however, it is now possible to take a video signal and save each frame as a separate JPEG file, creating a video format known as Motion-JPEG.
Several years after the JPEG team created its first compression scheme, it reconvened to develop an even more efficient and flexible system. The result, released in 2000, was called JPEG 2000. The original JPEG uses the discrete cosine transform (DCT), the same technique used in MPEG, applied to blocks of 8 x 8 pixels. JPEG 2000 uses the discrete wavelet transform (DWT), which performs simultaneous multiresolution image analysis.
The resulting wavelet coefficients are gathered in sub-bands, each of which is partitioned into small “code blocks” and independently coded by an adaptive binary arithmetic encoder. Finally, the output of the arithmetic encoder is organized as a compressed bit stream that offers a significant degree of flexibility.
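As a rough illustration of how a wavelet transform splits a signal into sub-bands, the Python sketch below applies one level of the reversible 5/3 lifting filter, the integer wavelet that JPEG 2000 specifies for its lossless path, to a single row of samples. It is a simplified, one-dimensional example under stated assumptions: the input has an even number of samples, and only one decomposition level is shown. A real codec applies the filter across rows and columns, repeats it over several levels, and then performs the code-block and arithmetic coding steps described above, none of which is shown here. The function names are hypothetical.

```python
def _mirror(i, n):
    """Reflect an out-of-range index back into 0..n-1 (symmetric extension)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on a list of integers.

    Returns (low, high): a half-resolution approximation and the detail band.
    Assumption for this sketch: len(x) is even and at least 2.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even number of samples")
    half = n // 2
    # Predict step: each odd sample minus the average of its even neighbours.
    high = [x[2 * k + 1] - (x[2 * k] + x[_mirror(2 * k + 2, n)]) // 2
            for k in range(half)]
    # Update step: each even sample adjusted by the neighbouring details.
    low = [x[2 * k] + (high[max(k - 1, 0)] + high[k] + 2) // 4
           for k in range(half)]
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by reversing the two lifting steps."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for k in range(half):
        x[2 * k] = low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4
    for k in range(half):
        x[2 * k + 1] = high[k] + (x[2 * k] + x[_mirror(2 * k + 2, n)]) // 2
    return x


row = [10, 12, 14, 200, 202, 204, 50, 52]
low, high = dwt53_forward(row)
print(low, high)
print(dwt53_inverse(low, high) == row)  # True: the integer transform is reversible
```

Flat regions produce detail coefficients at or near zero, which is what makes the high band cheap to code. The low band is a half-resolution version of the row, which is where the multiresolution property comes from.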
Wavelet compression provides the same processing for all pixels, and with each video frame being compressed individually, there is no error propagation from one picture to the next. As a result, there is no macroblocking at low bit rates; instead, the images blur. JPEG 2000 works on a single image, stripping away redundant data and encoding each video frame independently for consistent high-quality images. Because each video frame contains its own picture information, editing may be performed on any frame in a sequence of images.
With no dependency on other images, JPEG 2000 has low latency (less than 1.5 frames for encoding or decoding) and maintains sync between the video and audio. In contrast, MPEG and other compression schemes that rely on predictive frames and motion estimation algorithms have high latency.
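To put the 1.5-frame figure in more familiar units, the following Python sketch converts a latency expressed in frames to milliseconds. The frame rates used (25 and 30000/1001 frames per second) are assumptions chosen for illustration, since no frame rate is stated here, and the helper name frames_to_ms is hypothetical.

```python
def frames_to_ms(frames, fps):
    """Convert a latency given in frames to milliseconds at a given frame rate."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return frames * 1000.0 / fps


# Assumed frame rates; the 1.5-frame bound is the figure quoted in the text.
for fps in (25, 30000 / 1001):
    print(f"1.5 frames at {fps:.2f} frames/s = {frames_to_ms(1.5, fps):.1f} ms")
```

At these assumed rates, 1.5 frames is 50 to 60 milliseconds per stage.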
JPEG 2000 and its underlying wavelet compression can allocate 10 bits or even 12 bits at 4:4:4 quality, a level in line with the demands of digital cinema post production. In fact, there are several systems that use wavelet compression techniques similar to JPEG 2000. REDCODE and CineForm, for example, are used successfully for digital cinema production, which demonstrates the quality and robustness of wavelet compression.
There are significant differences between MPEG-4 and JPEG 2000 for broadcast contribution. (See Table 1.) Further development of video and still image compression standards continues, and there will no doubt be future advancements for both standards. For example, High Efficiency Video Coding (HEVC), unofficially called H.265, promises improved video quality and data compression, as well as support for higher resolutions.
Broadcast contribution links seek to have the highest video quality because the signal may have to go through post-production editing. The output video quality from the contribution stage is propagated to the distribution link. As a consequence, the quality and efficiency of the distribution link are highly dependent upon the level of quality achieved during contribution. Broadcast contribution signals should be lightly compressed, almost equal to the original uncompressed signal, so there is minimal quality loss under multiple generations of encoding.
JPEG 2000 can compress HD video to a significantly lower bit rate, and then extract the video with no loss of information. When peak signal-to-noise ratio (PSNR) is used to compare original footage with video compressed with JPEG 2000, the result is close to visually lossless and deviation from the original is low. The PSNR of footage encoded with MPEG, however, can vary greatly from picture to picture because different picture types use different methods of prediction.
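For readers unfamiliar with the metric, PSNR is derived from the mean squared error between the original and decoded samples: PSNR = 10 log10(peak^2 / MSE), where peak is the largest value a sample can take. The Python sketch below is a minimal illustration that treats a picture as a flat list of sample values. The function name and the default 8-bit depth are assumptions for the example, not a description of how the tests discussed here were run.

```python
import math


def psnr(original, decoded, bit_depth=8):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    peak = (1 << bit_depth) - 1          # 255 for 8-bit, 1023 for 10-bit
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf                  # identical pictures: no noise at all
    return 10 * math.log10(peak * peak / mse)


reference = [16, 64, 128, 235]
off_by_one = [17, 63, 129, 234]          # every sample wrong by one code value
print(round(psnr(reference, off_by_one), 2))  # 48.13
```

Because the scale is logarithmic, a loss of a couple of decibels across generations represents a substantial increase in error energy.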
Recently, the performance of JPEG 2000 and MPEG-4/AVC under multigeneration encoding was evaluated using a single vendor’s encoders and decoders. (See Figure 3.) Video quality was measured in terms of PSNR for different HD interlaced sequences at the operating points of the encoders. The streams were encoded at 90Mb/s for JPEG 2000 and 45Mb/s for MPEG-4/AVC. These bit rates were used throughout the testing because they are common operating points.
MPEG-4/AVC provided high-quality video at 45Mb/s. Under multigeneration encoding and decoding, however, the PSNR reduction was sharp: roughly 2.2dB after the fourth generation. (See Figure 4.) JPEG 2000 also delivered high-quality video (at 90Mb/s), but it had less than 0.5dB loss after the fourth generation of encoding and decoding, which was less than one quarter of the reduction experienced by MPEG-4/AVC. In addition, the predictive coding of MPEG resulted in an end-to-end latency of approximately 1 second, roughly 10 times the latency experienced with JPEG 2000.
For another round of tests, designed to evaluate the influence of contribution encoding on end-user quality, a distribution encoder was used to encode the incoming signal using 4:2:0 8-bit at 6Mb/s, 8Mb/s and 10Mb/s (common settings for IPTV applications). The tests showed that because of JPEG 2000’s high video quality, it is possible to increase the compression ratio for the final compression to broadcast without quality loss, saving from 5 percent to 15 percent of the total bit rate, depending on the final bit rate used for distribution. By spending more money on contribution, it is possible to save on broadcasting to the home, because more content can be placed in the available bandwidth.
To illustrate the bandwidth savings that can be achieved when employing JPEG 2000 video compression, consider an uncompressed HD-SDI signal, which has a bit rate of about 1.5Gb/s. (See Table 2.) A JPEG 2000 encoder/decoder can maintain the quality of the original signal with a bit stream of 500Mb/s to 800Mb/s, depending on the content. At 600Mb/s, virtually any type of high-quality HD content can be transported by a JPEG 2000 system with no mathematical loss. Reducing the bit rate from there increases the possibility of mathematical loss, but any errors will be visually imperceptible even at far lower bit rates.
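The bit rates quoted in this article can be expressed as compression ratios with a few lines of arithmetic. The Python sketch below assumes the nominal HD-SDI serial rate of 1,485Mb/s as the uncompressed reference. That rate includes blanking as well as active picture, so the ratios are a rough guide, not a measure of how much active video is discarded. The names are hypothetical.

```python
HD_SDI_MBPS = 1485.0  # nominal HD-SDI serial rate, "about 1.5Gb/s"


def compression_ratio(compressed_mbps, source_mbps=HD_SDI_MBPS):
    """How many times smaller the compressed stream is than the source."""
    if compressed_mbps <= 0:
        raise ValueError("bit rate must be positive")
    return source_mbps / compressed_mbps


# Operating points mentioned in the article.
for label, mbps in [("JPEG 2000, mathematically lossless", 600),
                    ("JPEG 2000, contribution", 90),
                    ("MPEG-4/AVC, contribution", 45)]:
    print(f"{label}: {mbps}Mb/s is about {compression_ratio(mbps):.1f}:1")
```

On this basis the 600Mb/s lossless operating point is roughly 2.5:1, while the 90Mb/s and 45Mb/s contribution settings are 16.5:1 and 33:1.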
Bandwidth-intensive formats like 3-D TV raise the bar even higher for quality and precision. The creation of 3-D video and particularly the need for the best possible picture quality has ramifications for how content is captured, produced and broadcast. In the contribution phase, left- and right-eye information must be transported along with other metadata required to generate the highest quality 3-D images and must be compatible with all 3-D compression and display systems. To ensure a high-quality viewing experience, it is critical that all the data required to display the left- and right-eye images correctly is conveyed throughout the transport chain with minimal visual impairment.
JPEG 2000 ensures that the horizontal resolution, critical to the reconstruction of 3-D images in our brains, is preserved. A JPEG 2000 encoder can transport two 2K or HD-SDI channels, or the two synchronous left-eye/right-eye video streams that comprise stereo 3-D. Even after the JPEG 2000 video is compressed or decompressed, and throughout real-time transport over Metro Ethernet (metropolitan-area Ethernet), IP-MPLS (multiprotocol label switching) or SDH/SONET (synchronous digital hierarchy or synchronous optical network), the video quality stays true to the original imagery.
JPEG 2000 video compression is ideal for venues where high-quality programming regularly originates, such as stadiums, arenas and remote studios. While prices vary between service providers, commonly available IP links that provide high bit rates are universally less expensive than legacy systems, which translates to significant long-term operational cost savings. Before connecting an encoder, however, it is critical that the service provider understands the quality of service (QoS) requirements of high-quality video and can ensure sufficient bandwidth.
JPEG 2000 requires a high-bit-rate IP network and usually works best with both a primary and redundant link. If you need to do a remote shoot from a non-fixed location, e.g., in front of a burning house, then you probably need to use a microwave or satellite truck with MPEG compression.
Final signal distribution to viewers is a perfect application for MPEG. JPEG 2000 excels as a contribution technology, but is not intended as a distribution technology. That said, stadiums, city halls, remote studios, OB trucks and fixed locations where you need the highest quality video are good candidates for using JPEG 2000 to maintain optimal video quality. As the cost of dedicated IP networks comes down and bit rates increase, JPEG 2000 video will help control costs when compared to traditional high-end video transport systems. With careful consideration of your outside broadcast requirements, access to high-speed IP networks and JPEG 2000 capabilities, you gain another choice in your video transport toolkit.
—Dr. Helge Stephansen is chief technology officer at T-VIPS.