The marriage of broadcast and IT: Compression algorithms

If it weren’t for the requirement that HDTV be a terrestrial service that would fit in a 6MHz channel, the transition to digital might never have begun. Early analog, NTSC-compatible HD systems using an augmentation channel might have made it through the FCC testing process, and over-the-air HD broadcasts might have started by 1990. Cable, unwilling to give up bandwidth, might then have missed the HD boat and lost viewers.

Delays caused by political maneuvering among U.S. proponents in the HDTV standardization process opened the door for General Instrument’s (GI’s) compression breakthrough, and a simulcast, all-digital system became the new FCC decree. Still, after the first round of testing at the Advanced Television Test Center (ATTC) ended in a draw, the Grand Alliance’s technical architecture might not have been MPEG-based. The Advanced Television Research Consortium (ATRC) system touted MPEG, and Philips insisted that MPEG be a system requirement before it would join the Grand Alliance.

Selecting an audio system came down to a shootout between Musicam and AC-3 during the first round of ATTC testing. Musicam’s test results were questioned because the codec had technical problems. AC-3 was adopted, with Musicam as the backup, in spite of ATRC attempts to lobby for retesting.

If you don’t see it or hear it…

CD technology and computer displays have heightened audiences’ sensitivity to audio and video quality.

Decades of studying human perception have taught researchers what is perceived and what is not. For example, at a given viewing distance our eyes can resolve only so much detail, according to well-known mathematical relationships. Similarly, our ears can distinguish two tones only if their frequencies differ by a certain minimum amount. Compression algorithms exploit many other known characteristics of aural and visual perception.
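The article doesn’t spell out the visual-acuity relationship, so here is a minimal sketch under a common rule-of-thumb assumption: a viewer with normal vision resolves detail about 1 arcminute apart. Dividing the vertical angle a screen subtends by that acuity gives a rough ceiling on useful scan lines. The 1-arcminute figure and the helper name `resolvable_lines` are assumptions for illustration, not an official formula.

```python
import math

ACUITY_ARCMIN = 1.0  # assumed: normal (20/20) vision resolves about 1 arcminute


def resolvable_lines(distance_in_picture_heights):
    """Rough count of lines the eye can separate vertically when viewing a
    screen from the given distance, measured in picture heights."""
    half_angle = math.atan(0.5 / distance_in_picture_heights)  # radians
    vertical_arcmin = math.degrees(2 * half_angle) * 60
    return vertical_arcmin / ACUITY_ARCMIN


print(round(resolvable_lines(7)))  # 490
print(round(resolvable_lines(3)))  # 1135
```

At about 7 picture heights, a commonly cited SD viewing distance, the ceiling is just under 500 lines; at about 3 picture heights, often cited for HD, it is in the neighborhood of 1,100 lines. Detail beyond that ceiling is, in the article’s terms, information the viewer never perceives.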

Of primary importance to perceptual compression algorithms is what we cannot hear or see. If you don’t hear it or see it, it isn’t there, so it need not be present in the audio or video signal for the presentation to be convincing. Unperceived information can be removed, and when the sound and image are reconstructed, they will still stimulate our senses as a convincing reality.

Interlaced scanning, audio bandwidth limiting and vestigial sideband modulation were all forms of compression used to fit the NTSC system into a 6MHz channel. For 40 years, this level of perceptual fidelity was good enough to satisfy audiences.
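Interlace’s saving shows up in a little arithmetic. Sending alternate lines in each field halves the line rate (and, for a given horizontal resolution, the video bandwidth) compared with sending every line at the same 59.94Hz refresh rate. A minimal sketch, assuming standard 525-line, 59.94 field/s NTSC timing:

```python
LINES_PER_FRAME = 525
FIELD_RATE = 60 / 1.001  # NTSC color field rate, about 59.94 Hz

# Interlaced: each field carries half the lines (262.5 of them).
interlaced_line_rate = LINES_PER_FRAME / 2 * FIELD_RATE
# Progressive at the same refresh rate: all 525 lines every field period.
progressive_line_rate = LINES_PER_FRAME * FIELD_RATE

print(f"{interlaced_line_rate:.2f} Hz vs {progressive_line_rate:.2f} Hz")
# 15734.27 Hz vs 31468.53 Hz
```

The familiar 15.734kHz NTSC horizontal frequency is the interlaced figure; refreshing every line at the field rate would have doubled it.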

Visual redundancy

MPEG-2 compression is described by two attributes: profiles and levels. Profiles define the set of coding tools used (and hence coding complexity), determine chroma sampling, and can be simple, main, 4:2:2, SNR, spatial and high. Levels set upper bounds on picture size in pixels, frame rate and bit rate, and are designated low, main, high-1440 and high. Not all levels are supported at all profiles. SD is referred to as MP@ML (main profile at main level). A 525-line MP@ML signal has a maximum bit rate of 15Mb/s with a picture format of 720x480 at 30 frames (60 fields) per second. HD at MP@HL has a maximum bit rate of 80Mb/s and a maximum picture size of 1920x1152.
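To make the level bounds concrete, the sketch below checks a few formats against main-profile limits. The luminance sample-rate ceilings (10,368,000 and 62,668,800 samples/s) and the frame-rate ceilings are values we believe ISO/IEC 13818-2 specifies, not figures from this article, so treat the table as an assumption and verify it against the standard. `LevelLimits` and `conforms` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    max_width: int           # samples per line
    max_height: int          # lines per frame
    max_frame_rate: float    # frames per second
    max_luma_rate: int       # luminance samples per second
    max_bitrate_mbps: float  # Mb/s


# Assumed main-profile bounds (our reading of ISO/IEC 13818-2; verify there).
MAIN_PROFILE = {
    "ML": LevelLimits(720, 576, 30, 10_368_000, 15),
    "HL": LevelLimits(1920, 1152, 60, 62_668_800, 80),
}


def conforms(level, width, height, frame_rate, bitrate_mbps):
    """Hypothetical check of a format against the MP@<level> bounds above.
    Interlaced formats count frames: 60 fields/s is 30 frames/s."""
    lim = MAIN_PROFILE[level]
    coded_height = -(-height // 16) * 16  # round up to whole 16-line macroblock rows
    return (width <= lim.max_width
            and coded_height <= lim.max_height
            and frame_rate <= lim.max_frame_rate
            and width * coded_height * frame_rate <= lim.max_luma_rate
            and bitrate_mbps <= lim.max_bitrate_mbps)


print(conforms("ML", 720, 480, 30, 15))       # True
print(conforms("ML", 720, 480, 60, 15))       # False: exceeds ML frame rate
print(conforms("HL", 1920, 1080, 30, 19.39))  # True
print(conforms("HL", 1920, 1080, 60, 19.39))  # False: too many samples/s
```

Under these assumed bounds, 720x480 fits MP@ML only at 30 frames (60 fields) per second, and 1920x1080 fits MP@HL at 30 frames per second but not at 60, consistent with the 1080-line ATSC formats topping out at 30 frames (60 fields) per second.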

A fundamental principle of MPEG video compression is that successive video frames or fields usually change very little. Once a complete intra-coded (I) frame has been coded, subsequent frames can be represented by their similarities to and differences from it, reducing the amount of information that must be sent.
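Encoders find those similarities with motion estimation. For each block of the current frame they search a reference frame for the best-matching block, then send a motion vector plus the (ideally small) difference. MPEG-2 leaves the search method to the encoder; the sketch below uses exhaustive block matching with a sum-of-absolute-differences (SAD) cost, one common approach. The 16x16 default block size matches MPEG-2 macroblocks, but the +/-7 pixel search range, the integer-pixel search (MPEG-2 also allows half-pixel vectors) and the function names are our assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=16, search=7):
    """Full-search block matching within +/- `search` pixels.
    Returns (sad, dx, dy) for the lowest-cost candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with a distinct value at every pixel; the current frame
# is the same picture shifted one pixel to the right.
ref = [[10 * x + y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]
print(best_motion_vector(cur, ref, bx=2, by=2, n=4, search=2))  # (0, -1, 0)
```

Because the block is an exact shifted copy, the best match has zero SAD: the encoder would send only the vector (-1, 0) and an all-zero residual instead of 16 new pixel values.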

Forward predictive (P) frames follow an I frame and refer back to the preceding I or P frame for reconstruction. Bidirectionally predictive (B) frames reference I and P frames in both forward and backward temporal directions, which increases compression ratios further. A Group of Pictures (GOP) defines the repeating sequence of I, P and B frames.
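Because a B frame needs the anchor (I or P) frame that follows it in display order, an encoder transmits frames out of display order. A minimal sketch with hypothetical helpers, using a GOP of N=12 frames with an anchor every M=3 frames, a common choice but not a mandated one:

```python
def gop_pattern(n=12, m=3):
    """Display-order frame labels for a GOP of length n with anchor spacing m."""
    return [("I" if i == 0 else "P" if i % m == 0 else "B") + str(i)
            for i in range(n)]


def coded_order(display):
    """Reorder display-order frames into transmission order: each anchor
    is sent before the B frames that precede it in display order."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)  # must wait for its forward anchor
        else:
            out.append(frame)        # send the anchor first...
            out.extend(pending_b)    # ...then the B frames that needed it
            pending_b = []
    # In an open GOP, trailing B frames wait for the next GOP's I frame;
    # this sketch simply flushes them.
    out.extend(pending_b)
    return out


print(gop_pattern(7, 3))               # ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
print(coded_order(gop_pattern(7, 3)))  # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder holds each anchor until the B frames that reference it have been decoded, then restores display order.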

Color-difference (chroma) subsampling, discrete cosine transform (DCT) conversion, quantization of the DCT coefficients, run length coding (RLC) and Huffman-style variable length coding (VLC) complete the compression process.
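The transform-and-entropy steps can be sketched on a single 8x8 block: a 2-D DCT concentrates energy in a few low-frequency coefficients, quantization zeroes most of the rest, a zigzag scan orders them from low to high frequency, and run length coding turns the zeros into compact (run, level) pairs. This is an illustration, not the MPEG-2 encoder: the flat quantizer step of 16 is an assumption (MPEG-2 uses weighting matrices scaled by a quantizer scale and codes intra DC separately), and the final Huffman-style VLC table lookup is omitted.

```python
import math

N = 8  # MPEG-2 transforms 8x8 blocks of samples


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (block[row][col])."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for r in range(N):
                for c in range(N):
                    total += (block[r][c]
                              * math.cos((2 * r + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * c + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization, the lossy step: small coefficients become zero."""
    return [[round(x / step) for x in row] for row in coeffs]


# Zigzag scan: walk the anti-diagonals in alternating directions so that
# low-frequency coefficients come first and the zeros bunch up at the end.
ZIGZAG = sorted(((r, c) for r in range(N) for c in range(N)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))


def run_length(levels):
    """(run-of-zeros, level) pairs, closed by an end-of-block marker."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs


def encode_block(block, step=16):
    q = quantize(dct_2d(block), step)
    return run_length([q[r][c] for r, c in ZIGZAG])


flat = [[100] * N for _ in range(N)]
print(encode_block(flat))  # [(0, 50), 'EOB']: 64 samples become one pair
```

A flat block collapses to a single DC value; a block with a vertical edge keeps only a handful of horizontal-frequency coefficients, with the remaining zeros absorbed into the runs.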

In a Broadcast Engineering article, “Video Compression,” Michael Robin explains the MPEG-2 compression process for an SD video source; the same methodology also applies to HD. ATSC Standard A/53 devotes sections to compression implementation, but the definitive reference is ISO/IEC 13818-2, also known as ITU-T H.262.

Audio adaptation

AC-3 audio compression is based on a psychoacoustic model of what the ear can and cannot hear. Audio components the ear cannot perceive are removed or ignored during encoding. An adaptive bit allocation technique rations bits, spending them where complex audio passages need them while using only the minimum bit resolution required for fidelity elsewhere.
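AC-3’s own bit allocation is a parametric routine defined in A/52 that both encoder and decoder run from the coded spectral data. The sketch below is a simpler, generic greedy allocator that illustrates the rationing idea; it is not the A/52 algorithm. It assumes per-band signal-to-mask ratios (SMR) are already known and that each extra bit lowers quantization noise by about 6.02 dB. The function name and the 15-bit cap are hypothetical.

```python
def allocate_bits(smr_db, pool, db_per_bit=6.02, max_bits=15):
    """Greedy perceptual bit allocation (generic sketch, not AC-3's routine).
    smr_db holds each band's signal-to-mask ratio in dB."""
    bits = [0] * len(smr_db)
    for _ in range(pool):
        # Noise-to-mask ratio: how far each band's noise still sits above the mask.
        nmr = [smr - b * db_per_bit for smr, b in zip(smr_db, bits)]
        audible = [i for i, r in enumerate(nmr) if r > 0 and bits[i] < max_bits]
        if not audible:
            break  # all quantization noise is masked: spend no more bits
        worst = max(audible, key=lambda i: nmr[i])
        bits[worst] += 1
    return bits


print(allocate_bits([20.0, 5.0, -3.0], pool=10))  # [4, 1, 0]
```

With SMRs of 20, 5 and -3 dB and a 10-bit pool, the loud band gets 4 bits, the quieter band 1, and the fully masked band none; the allocator stops with 5 bits unspent because no remaining noise would be audible.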

Much as MPEG video has profiles and levels, AC-3 defines two classes of service: main and associated. The combination of services determines the bit rate. Main services can carry up to 5.1 channels, while associated services, such as voice-over or service for the visually impaired, are meant to complement a main service.

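A frame is the basic unit of an AC-3 stream. It consists of sync information (SI), bit stream information (BSI), six audio blocks, auxiliary data and a CRC. Each frame carries 1536 audio samples per channel (six blocks of 256), so its length in bytes depends on the bit rate; at 48kHz and 384kb/s a frame is 1536 bytes long. That rate represents compression of 5.1-channel PCM at 5.184Mb/s (6 channels x 48kHz x 18 bits) to 384kb/s, a ratio of 13.5:1.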
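The frame and bit-rate figures can be checked with a few lines of arithmetic. This sketch takes the article’s 18-bit PCM word length as given; the six-blocks-of-256-samples framing comes from A/52.

```python
CHANNELS = 6                  # 5.1: five full-range channels plus LFE
SAMPLE_RATE = 48_000          # Hz
BITS_PER_SAMPLE = 18          # PCM word length used in the figure above
AC3_BITRATE = 384_000         # b/s
SAMPLES_PER_FRAME = 6 * 256   # six audio blocks of 256 new samples per channel

pcm_bitrate = CHANNELS * SAMPLE_RATE * BITS_PER_SAMPLE              # b/s
compression_ratio = pcm_bitrate / AC3_BITRATE
frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE                   # frame duration
frame_bytes = AC3_BITRATE * SAMPLES_PER_FRAME // (SAMPLE_RATE * 8)  # frame length

print(pcm_bitrate, compression_ratio, frame_ms, frame_bytes)
# 5184000 13.5 32.0 1536
```

Frame length scales with bit rate; the match between 1536 bytes and 1536 samples per channel holds only at 48kHz and 384kb/s.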

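Encoding begins with an analysis filter bank that converts the audio into frequency coefficients. The spectral envelope, coded as exponents, drives both the bit allocation and quantization procedures. Similarities between channels are also exploited: through channel coupling, high-frequency content common to several channels, such as the left and right surrounds, is combined and coded only once.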
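In AC-3 each frequency coefficient is sent in floating-point form, as an exponent and a mantissa. The exponents, taken across frequency, form the spectral envelope; the bit allocation derived from them decides how finely each mantissa is quantized. A minimal sketch of the split, assuming coefficients already scaled to magnitudes of 1 or less; A/52’s differential exponent coding strategies are not shown, and `split_coefficient` is a hypothetical helper.

```python
import math

MAX_EXPONENT = 24  # AC-3 exponents span 0..24


def split_coefficient(coeff):
    """Split a coefficient (assumed |coeff| <= 1) into an exponent and a
    mantissa such that coeff == mantissa * 2**-exponent."""
    if coeff == 0:
        return MAX_EXPONENT, 0.0
    m, e = math.frexp(coeff)                 # coeff == m * 2**e, 0.5 <= |m| < 1
    exponent = max(0, min(-e, MAX_EXPONENT))
    return exponent, coeff * 2 ** exponent   # mantissa (unnormalized if clamped)


coeffs = [0.3, -0.3, 0.01, 0.0]
envelope = [split_coefficient(c)[0] for c in coeffs]
print(envelope)  # [1, 1, 6, 24]
```

Larger exponents mark quieter spectral regions. Because the envelope is coarse and changes slowly across frequency, it is cheap to send, and the decoder can rerun the same bit allocation from it.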

A complete description of AC-3 compression can be found in chapter 7 of DTV Audio Encoding and Decoding. For a complete specification, refer to ATSC Standard A/52.

Artifacts

Removing this much unperceived information from a sound or an image rests on the premise that it will be compressed and then reconstructed for presentation only once. Only those with golden eyes and ears will notice any difference from the original. But with each additional encode/decode cycle, artifacts accumulate and become annoyingly audible or visible.

Information is also lost in analog-to-digital and digital-to-analog conversion. Rounding errors from conversion and from quantization accumulate over successive encoding and decoding generations until they become noticeable. Subjective testing has established when and how artifacts produce a just noticeable difference (JND).

Difficult-to-code images and sounds tax the compression engines. In the visual realm, high detail and rapid motion are compression-engine killers. For complex or corrupted audio, artifacts can include silence, clicks, distortion and echoes.

Bigger pipes and smaller packages

Ten years have elapsed since the ATSC prototype system specifications were demonstrated. As broadcasting evolves toward multi-platform media delivery, more efficient coding algorithms will enable new business models and enhanced aesthetic presentation. Beyond current delivery mechanisms, larger pipes will bring a wealth of media to a wider audience.

The ATSC is considering revising A/53 to add the AVC/H.264 and VC-1 video compression algorithms, and adding Annex G, “High Efficiency Audio System Characteristics,” to A/52.

Both of the original ATSC compression algorithms, audio and video, have served admirably in making HDTV a reality. Video and audio compression is crucial to producing the 19.39Mb/s ATSC data stream that fits into a 6MHz TV channel.
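The ATSC payload rate falls out of the 8-VSB framing. Each data segment carries 828 symbols at 2 information bits per symbol after the 2/3-rate trellis code, or 207 bytes: 187 transport-packet bytes plus 20 Reed-Solomon parity bytes. The packet’s sync byte is replaced by the segment sync, so every data segment delivers one whole 188-byte MPEG-2 transport packet. A sketch of the arithmetic, using A/53’s 8-VSB parameters as we understand them:

```python
# 8-VSB framing parameters (our reading of ATSC A/53).
SYMBOL_RATE = 4.5e6 / 286 * 684    # about 10.762 Msymbols/s, tied to the NTSC line rate
SYMBOLS_PER_SEGMENT = 832          # 828 data symbols + 4 segment-sync symbols
DATA_SEGMENTS_PER_FIELD = 312
SEGMENTS_PER_FIELD = 313           # 312 data segments + 1 field-sync segment
TS_PACKET_BITS = 188 * 8           # one transport packet per data segment

segments_per_second = SYMBOL_RATE / SYMBOLS_PER_SEGMENT
packets_per_second = (segments_per_second
                      * DATA_SEGMENTS_PER_FIELD / SEGMENTS_PER_FIELD)
payload_bps = packets_per_second * TS_PACKET_BITS

print(f"{payload_bps / 1e6:.2f} Mb/s")  # 19.39 Mb/s
```

Everything the compressed video, audio and data services carry has to fit inside that one number.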

Resources

Michael Robin, “Video Compression,” Broadcast Engineering, 2002. http://broadcastengineering.com/aps/infrastructure/broadcasting_video_compression/

Jerry Whitaker, DTV: The Revolution in Electronic Imaging, McGraw-Hill. ISBN 0-07-137170-2

Standards

ATSC Standard A/53C with Amendment No. 1 and Corrigendum No. 1: ATSC Digital Television Standard, Rev. C, 21 May 2004 (Amendment No. 1 dated 13 July 2004, Corrigendum No. 1 dated 23 March 2005). www.atsc.org/standards/a_53c_amend-1_corr-1.pdf

ISO/IEC 13818-2: Generic Coding of Moving Pictures and Associated Audio Information: Video (ITU-T Recommendation H.262). www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=31539

ATSC Standard A/52A: Digital Audio Compression (AC-3) Standard, Rev. A, 20 August 2001. www.atsc.org/standards/a_52a.pdf
