JPEG 2000 image compression

Figure 1. JPEG 2000 applications.

The JPEG 2000 standard, finalized in 2001, defines a new image-coding scheme using compression techniques based on wavelet technology. Its architecture is useful for many diverse applications, including Internet image distribution, security systems, digital photography and medical imaging.

A lot of confusion exists as to what JPEG 2000 is and how it compares with other compression standards, such as MPEG-2, MPEG-4 and JPEG. With brief comparisons to other compression standards, this article highlights some of the often misunderstood and rarely mentioned potential benefits of JPEG 2000.

Limited-bandwidth applications

Figure 2. An image encoded with JPEG 2000 uses the wavelet transform to break the image into subbands and offer multiple resolutions.

When transmitting or storing picture information, compression must be employed to maintain picture resolution while making the best use of limited channel bandwidth. Compression is defined as lossless if full recovery of the original is available from the channel without any loss of information; otherwise, it is lossy.

Standards are required to ensure interoperability. JPEG 2000 provides for both lossless and lossy compression. As such, it lends itself to applications that require high-quality images despite limitations on storage or transmission bandwidths. (See Figure 1.)

An important feature of systems based on JPEG 2000 is the ability to extract a variety of resolutions, components, areas of interest and compression ratios from a single JPEG 2000 code stream. This is not possible with the other compression standards discussed here, because with them the image size, bit rate and quality must be specified on the encode side and cannot be determined or changed on the decode side.

Figure 3. One JPEG 2000 stream can be received by several decoders and extracted at different resolutions.

For example, a closed circuit TV (CCTV) security system can make use of this feature by sending a single JPEG 2000 code stream over a low bandwidth network. High-res images can be stored on a hard disk drive, while several low-res images are displayed on monitors. The operator on the receive side can decide what information to extract from the single code stream that is sent.

JPEG 2000 is frame accurate: every frame of the input is contained in the compressed format. MPEG systems, on the other hand, reduce the amount of data through temporal compression. Because temporal compression does not encode each frame as a complete image, MPEG compression is not frame accurate. For this reason, legal issues restrict the use of MPEG compression in some security applications. To get around this problem, security system and equipment providers have had to develop their own compression schemes, or use the highly inefficient motion JPEG (M-JPEG) compression standard, in order to provide a compressed stream that contains every single field of the original. They can now use JPEG 2000 for new designs.

Internet applications

Progressive coding, another feature of the JPEG 2000 standard, means that the bit stream can be coded in such a way as to contain less-detailed information at the beginning of the stream and more detailed information as the stream progresses. This makes it ideal for Internet and network applications — especially with large images and low bandwidths — as the image can be seen instantly on the decoding side, even with low-speed networks or image databases. The lower subbands are shown first, and more detail is added as time progresses. Therefore, the picture becomes sharper and more detailed over time, and the entire image does not have to be downloaded before it can be seen.

Figure 4. Input signal 1 containing frequencies A, B and C.

With the low-quality image instantly available, the user at the receiving end can decide whether to view the picture in its fully decoded version or to pass it by and scan the next picture. Clients can view images at different resolutions or quality levels (compression rates), making them suitable for any transmission bandwidth, connection speed or display device. In addition, JPEG 2000 coding provides the option to zoom in or out on a particular area of the image or to display a particular region of the image at a different resolution or compression rate.

HD applications

At extreme compression levels, JPEG 2000 video starts to blur but is still viewable. MPEG or JPEG artifacts are much more disturbing to the eye, with the picture visibly broken down into small blocks at high compression ratios. High image quality at medium-to-high bit rates, even for content that contains a lot of motion, together with the lack of block artifacts and high efficiency, makes JPEG 2000 ideal for such HD applications as digital cinema, HD recording systems and HD camera equipment.

Many applications require exact bit-rate control, which only JPEG 2000 can provide. Exact bit-rate control is possible because an entire frame or field is transformed at once. It is then broken down into bit streams or code blocks that can be processed independently. This is in contrast to the alternative of breaking the frame or field into 8 × 8 pixel blocks prior to transform (as is done with DCT systems), thus making exact bit-rate control impossible. The rate-control algorithm used in JPEG 2000 truncates each bit stream to meet a specific target bit rate, adjusting the truncation and requantization of each code block's data as required.
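
The truncation idea can be illustrated with a small sketch. It is not the rate-control algorithm of the standard or of any particular encoder; it is a simplified greedy stand-in. It assumes each code block offers a list of truncation points, each given as (cumulative bytes, remaining distortion), and it keeps extending whichever block buys the largest distortion reduction per byte until the byte budget is used up. The function name, the data layout and the numbers are all hypothetical.

```python
def allocate(blocks, budget_bytes):
    """Pick one truncation point per code block without exceeding the budget.

    blocks maps a block name to a list of (cumulative_bytes, distortion)
    pairs, starting with (0, distortion_if_nothing_is_sent).
    Returns ({name: index_of_chosen_point}, bytes_used).
    """
    chosen = {name: 0 for name in blocks}  # every block starts fully truncated
    used = 0
    while True:
        best = None
        for name, points in blocks.items():
            i = chosen[name]
            if i + 1 >= len(points):
                continue  # this block is already sent in full
            extra = points[i + 1][0] - points[i][0]
            gain = points[i][1] - points[i + 1][1]
            if extra <= 0 or used + extra > budget_bytes:
                continue  # next segment does not fit in the remaining budget
            slope = gain / extra  # distortion removed per byte spent
            if best is None or slope > best[0]:
                best = (slope, name, extra)
        if best is None:
            break
        _, name, extra = best
        chosen[name] += 1
        used += extra
    return chosen, used


# Hypothetical code blocks: a low-frequency block and a high-frequency block.
example_blocks = {
    "LL2": [(0, 1000), (40, 200), (80, 50)],
    "HH1": [(0, 300), (60, 100), (120, 20)],
}
choice, used_bytes = allocate(example_blocks, budget_bytes=150)
```

With a 150-byte budget, this example sends the low-frequency block in full and cuts the high-frequency block at its first truncation point, using 140 bytes. The budget is never exceeded, which is the property the text relies on.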

Figure 5. Input signal 2 containing frequencies A, B and C.

In addition to programming the target bit rate, the standard allows the user to specify a particular quality metric. In this case, the bit rate will vary to meet the specified quality factor, as long as the performance does not fall below a specific peak signal-to-noise ratio (PSNR). The PSNR is an objective measure of picture quality that serves as an approximation of perceived picture quality.
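
For reference, PSNR is conventionally computed from the mean squared error between the original and the decoded samples. The sketch below uses that textbook definition; the function name and the flat-list input format are assumptions made for illustration, not part of the standard.

```python
import math


def psnr(original, decoded, bit_depth=8):
    """PSNR in dB between two equally long sequences of sample values."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    peak = (1 << bit_depth) - 1  # 255 for 8-bit samples
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical samples, as in lossless coding
    return 10 * math.log10(peak ** 2 / mse)


# Every sample is off by one level, so the mean squared error is 1.
example_db = psnr([100, 100, 100, 100], [101, 99, 101, 99])
```

For 8-bit samples with a mean squared error of 1, this gives roughly 48.13 dB. A lossless round trip returns infinity.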

JPEG 2000 code stream

A given input image or part of the image (tile) is sent to a set of wavelet filters that transform the pixel information into wavelet coefficients, which are then grouped into several subbands. (See “Web links.”) Each subband contains wavelet coefficients that describe a specific horizontal and vertical spatial frequency range of the entire original image. Because each further transform level filters only the low-pass output of the level before it, the most detailed, highest-frequency information is contained in the first transform level, and lower-frequency, less-detailed information is contained in the higher transform levels.

For simplicity, only two levels of transform will be used for this example. The first transform level results in subbands LH1, HH1, HL1 and LL1. Only subband LL1 is passed on for further filtering, generating the next transform level and creating subbands LH2, HH2, HL2 and LL2.
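
The two-level split described above can be sketched in a few lines. The sketch uses the Haar filter (pairwise averages and differences) only because it is the simplest wavelet; JPEG 2000 itself specifies other filters (a reversible 5/3 and an irreversible 9/7 filter). The function names are hypothetical, and the image is assumed to be a list of rows whose dimensions are divisible by four.

```python
def haar_1d(values):
    """One analysis step: pairwise averages (low-pass) and differences (high-pass)."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def split_columns(half_image):
    """Filter every column; return (vertical low-pass, vertical high-pass)."""
    lows, highs = [], []
    for column in transpose(half_image):
        low, high = haar_1d(column)
        lows.append(low)
        highs.append(high)
    return transpose(lows), transpose(highs)


def haar_2d(image):
    """One transform level: filter the rows, then the columns."""
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = split_columns(row_low)   # horizontal low-pass half
    hl, hh = split_columns(row_high)  # horizontal high-pass half
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


def decompose(image, levels=2):
    """Return subbands HL1, LH1, HH1, ..., plus the final LL band."""
    subbands = {}
    current = image
    for level in range(1, levels + 1):
        bands = haar_2d(current)
        for name in ("HL", "LH", "HH"):
            subbands[f"{name}{level}"] = bands[name]
        current = bands["LL"]  # only the LL band is filtered again
    subbands[f"LL{levels}"] = current
    return subbands


# An 8 x 8 test ramp: the value grows from left to right and top to bottom.
ramp = [[r * 8 + c for c in range(8)] for r in range(8)]
ramp_subbands = decompose(ramp, levels=2)
```

For the 8 × 8 ramp, the level-1 subbands are 4 × 4 and the level-2 subbands are 2 × 2. With this averaging filter, LL2 is simply a quarter-resolution version of the picture, which is why a decoder can show a small image from the low subbands alone.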

Figure 6. Wavelet transform of signal 1.

Equally sized code blocks, which are essentially bit streams of data, are generated within each subband. This breakdown is necessary for coefficient modeling and coding, which is done on a code-block-by-code-block basis. In essence, the actual compression is achieved by entropy coding the bit streams contained in each code block and then truncating or requantizing them. The truncation points are chosen by a process known as rate-distortion optimization (RDO).

Code blocks can be accessed independently. Their bit streams are coded one bit plane at a time, with three coding passes per bit plane. This process, called context modeling, is used to assign information about the importance of each individual coefficient. The code blocks can then be grouped according to their significance. On the decoding side, it is then possible to extract information according to its significance, allowing the most significant information to be seen first.

JPEG 2000 can contain up to 16 layers, which are defined by RDO and context modeling. Each layer stands for a particular compression rate, where the compression rate is achieved from the quantization, rate distortion and context modeling processes.

Layer 0, for example, contains bit streams that were not truncated and contain no coding passes, and thus provides the lowest compression rate and the highest quality. Layer 16 contains bit streams from the lossy wavelet transform (WT) that have been requantized and ordered according to code-block significance, with the most significant information coming first, and provides the highest compression rate and the lowest quality.

Figure 7. Wavelet transform of signal 2.

Tiles or images are further partitioned into precincts. Precincts contain a number of code blocks and are used to facilitate access to a specific area within an image in order to process this area in a different way or to decode only a specific area of an image. Arranging code blocks or precincts into an array of packets with the lower subbands coming first generates the JPEG 2000 bit stream.

The JPEG 2000 stream starts with a main header containing such information as uncompressed image size, tile size, number of components, bit depth of components, coding style, transform levels, progression order, number of layers, code block size, wavelet filter type and quantization level. The entire image data, grouped in code blocks of LL, HL, LH and HH subbands, follows the header. Data is not contained in the header information. This format, or table of contents, can be stored on the encode side and allows a decoder to call up a certain resolution on demand, without first having to decode or download the entire JPEG 2000 code stream. (See Figures 2 and 3.)
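
As an illustration of how a decoder can read these parameters without touching the image data, the sketch below parses the first marker segment of a raw code stream (.j2c). It assumes the layout defined in ISO/IEC 15444-1: a start-of-codestream marker (0xFF4F) followed immediately by the SIZ segment (0xFF51), which carries the image size, tile size, component count and bit depths. It does not handle the .jp2 file wrapper, and the function name and returned dictionary keys are hypothetical.

```python
import struct

SOC = 0xFF4F  # start-of-codestream marker
SIZ = 0xFF51  # image-and-tile-size marker segment


def parse_siz(codestream):
    """Return basic image parameters from the start of a raw code stream."""
    soc, siz, lsiz = struct.unpack_from(">HHH", codestream, 0)
    if soc != SOC or siz != SIZ:
        raise ValueError("SOC and SIZ markers not found at start of code stream")
    # Rsiz, Xsiz, Ysiz, XOsiz, YOsiz, XTsiz, YTsiz, XTOsiz, YTOsiz, Csiz
    (_rsiz, xsiz, ysiz, xosiz, yosiz,
     xtsiz, ytsiz, _xtosiz, _ytosiz, csiz) = struct.unpack_from(">H8IH", codestream, 6)
    if lsiz != 38 + 3 * csiz:
        raise ValueError("SIZ segment length does not match component count")
    bit_depths = []
    for c in range(csiz):
        ssiz = codestream[42 + 3 * c]         # one Ssiz byte per component
        bit_depths.append((ssiz & 0x7F) + 1)  # low 7 bits hold depth minus one
    return {
        "width": xsiz - xosiz,
        "height": ysiz - yosiz,
        "tile_size": (xtsiz, ytsiz),
        "components": csiz,
        "bit_depths": bit_depths,
    }


# A synthetic header for a 1920 x 1080, three-component, 8-bit image.
example_header = struct.pack(">HHH", SOC, SIZ, 38 + 3 * 3)
example_header += struct.pack(">H8IH", 0, 1920, 1080, 0, 0, 1920, 1080, 0, 0, 3)
example_header += bytes([7, 1, 1]) * 3
example_info = parse_siz(example_header)
```

Only the first 51 bytes are needed here to learn the picture dimensions and bit depth, long before any code-block data arrives.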

Efficiency

One major advantage of JPEG 2000 is that it significantly reduces the processing power and memory required for the compression and decompression processes, thus making it suitable for HD applications. JPEG 2000 uses the WT to reduce the amount of information contained in a picture, while MPEG and JPEG systems use the discrete cosine transform (DCT).

Figure 8. Fourier transform of signal 1 with four frequency components.

It is true that the WT requires more processing power than the DCT, but MPEG systems require more than just the DCT. The DCT, or any type of Fourier transform, expresses the signal in terms of frequency and amplitude, but it does not show when each frequency occurs within the span being transformed. The WT expresses a signal in terms of frequency and amplitude over time and is therefore more efficient. Figures 4 through 9 demonstrate this.
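
A small numerical experiment makes the difference in time localization concrete. The two signals below are assumptions chosen for illustration, not the signals in the figures: the first starts with a low frequency and ends with a high one, and the second is the same signal played backward. The Fourier magnitudes of the two are identical, whereas a one-level Haar wavelet (used here as the simplest possible wavelet) shows in which half of each signal the high-frequency detail sits.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of each discrete Fourier transform bin (direct O(n^2) sum)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def haar_detail_energy(signal):
    """Energy of the one-level Haar detail coefficients in each half of the signal."""
    details = [(signal[i] - signal[i + 1]) / math.sqrt(2)
               for i in range(0, len(signal), 2)]
    half = len(details) // 2
    return (sum(d * d for d in details[:half]),
            sum(d * d for d in details[half:]))


N = 64
# Low frequency (2 cycles per 64 samples) first, then high frequency (16 cycles).
signal_1 = [math.sin(2 * math.pi * (2 if t < N // 2 else 16) * t / N)
            for t in range(N)]
signal_2 = signal_1[::-1]  # same content, opposite order in time

spectrum_gap = max(abs(a - b) for a, b in
                   zip(dft_magnitudes(signal_1), dft_magnitudes(signal_2)))
energy_1 = haar_detail_energy(signal_1)  # detail concentrated in the second half
energy_2 = haar_detail_energy(signal_2)  # detail concentrated in the first half
```

Here spectrum_gap is zero to within rounding error, so the magnitude spectra cannot tell the two signals apart. The wavelet detail energy, by contrast, moves from the second half of the first signal to the first half of the second.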

To obtain the same amount of information as with one WT pass, the DCT must be used for every frequency. Each of these frequencies must be transformed at each time instant for each 8 × 8 pixel block. In addition, MPEG systems use interframe compression (motion estimation) to reduce the amount of data further. This requires storage of at least two entire fields in external memory. The computation-intensive motion estimation process requires a powerful processor. Temporal compression can be used in JPEG 2000 systems, but it is not inherent in the JPEG 2000 standard.

The advantages

All MPEG standards are complex and computation-intensive. This translates into extensive processing latency and memory requirements in SD applications. These factors become even more of a problem when HD formats are considered — and JPEG 2000 becomes even more desirable.

Figure 9. Fourier transform of signal 2 with four frequency components.

Another strength of JPEG 2000 is the standard itself, which allows immense flexibility and control in many different applications. There is also much versatility regarding formats: It supports anything from 8 bits per sample to 14 bits per sample, whereas MPEG only supports 8-bit data.

JPEG 2000 continues to gain popularity, even though MPEG-2 is the established standard for DVD and broadcast applications. JPEG 2000 is also popular in HD applications that require high-quality storage or transmission of HD images over wireless or other links.

New silicon

One JPEG chip manufacturer, Analog Devices (ADI), has invested heavily in wavelet-compression R&D. ADI's newest wavelet codec, the ADV202, is thus far the only dedicated JPEG 2000 IC on the market. (See Figure 10.) A complete single-chip JPEG 2000 compression/decompression IC, the ADV202 works with both HD and SD video and still images. It supports all features of the ISO/IEC 15444-1 (JPEG 2000) image-compression standard except Maxshift ROI. Its patented SURF (spatial ultra-efficient recursive filtering) technology enables low-power, low-cost, wavelet-based compression.

Figure 10. A block diagram of Analog Devices’ ADV202, a JPEG 2000 compression/decompression IC.

Containing a dedicated WT engine, three entropy codecs, a RISC processor and on-board memory systems, the ADV202 provides a glueless interface to such common video standards as ITU-R BT.656, SMPTE 274M and SMPTE 296M. The result is that it can create a fully compliant JPEG 2000 code stream (.j2c, .jp2). It can also provide raw code-block and attribute data, allowing the host processor to have complete control over the generation and compression processes.

Even though digital signal processor (DSP) performance has improved significantly, a DSP would have to perform 20 billion instructions per second to match the performance of the ADV202 in an SD encode application.

The outlook for JPEG 2000

A major advantage of using a JPEG 2000 hardware solution is the low latency as compared with other compression schemes. Several major manufacturers of video and broadcast equipment are now implementing JPEG 2000 into their HD products, including real-time encoding and decoding systems and video servers.

The Digital Cinema Initiative recently announced that it will use JPEG 2000 as the compression method in the delivery of digital motion pictures. (See “Web links.”) Broadcasters can look for even more JPEG 2000-based equipment to appear. The standard's combination of low latency, multiple resolution decoding capabilities and high quality makes it a good fit for these professional applications.

Christine Bako is an applications engineer at Analog Devices in Austin, TX.

Web links

The use of wavelets in encoding was first explained in Analog Dialogue 30-2 (1996); www.analog.com/library/analogdialogue/archives/30-2/wavelet.html

The Digital Cinema Initiative will use JPEG 2000 in the delivery of digital motion pictures; www.itscj.ipsj.or.jp/sc29/29w02901.pdf