Content conversion terminology

Moving further down the content processing chain: after content has been delivered to a conversion device and properly buffered, processing for each delivery channel begins. Because many terms describing aspects of the conversion process are used interchangeably, it is necessary to define their meanings.

Format describes presentation. Video format pertains to the presentation of video content; in this discussion, it will refer to the combination of pixel grid dimensions, aspect ratio, refresh rate and scanning method. 1080i60 and 1080p60 are different video formats. Audio format describes the channel configuration, the number and placement of speakers: 5.1 surround or 2.0 stereo, for example.
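
To make the definition concrete, a video format can be modeled as the combination of those four properties. The following Python sketch (class and field names are illustrative, not drawn from any standard) shows why 1080i60 and 1080p60 count as distinct formats:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VideoFormat:
    width: int           # pixel grid dimensions
    height: int
    aspect_ratio: str    # display aspect ratio
    refresh_rate: float  # Hz
    scanning: str        # "interlaced" or "progressive"

# Same pixel grid, aspect ratio and rate, but different scanning
# methods, so these are two different video formats.
fmt_1080i60 = VideoFormat(1920, 1080, "16:9", 60.0, "interlaced")
fmt_1080p60 = VideoFormat(1920, 1080, "16:9", 60.0, "progressive")
print(fmt_1080i60 == fmt_1080p60)  # False
```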

Channel capacity is the bit rate that can be reliably delivered over a distribution technology. It is expressed in bits per second, denoted b/s. Generally, the rates are in kilo-, mega- and gigabits per second, expressed as kb/s, Mb/s and Gb/s, respectively; terabit (Tb/s) channel capacities loom on the technological horizon.

These two system characteristics, presentation format and channel capacity, directly influence the content conversion process. Most importantly, each is a major factor in determining the bit rate, and therefore the quality, of distributed content.
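
To see how format and channel capacity together drive the conversion process, consider a rough, illustrative calculation (assuming 8-bit 4:2:0 sampling, an average of 12 bits per pixel) of how far HD video must be compressed to fit the 19.39Mb/s terrestrial DTV channel discussed later in this article:

```python
# Rough, illustrative figures: 1080-line HD, 8-bit 4:2:0 sampling
# (12 bits per pixel on average), 30 frames per second.
width, height, fps = 1920, 1080, 30
bits_per_pixel = 12

uncompressed = width * height * bits_per_pixel * fps   # b/s
channel = 19.39e6                                      # ATSC payload, b/s

print(f"Uncompressed: {uncompressed / 1e6:.0f} Mb/s")            # ~746 Mb/s
print(f"Compression needed: {uncompressed / channel:.0f}:1")     # ~38:1
```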

Three-screen scenario

The displays on consumer content consumption devices can be grouped into three classes: passive, active and personal.

TV screens and multichannel audio, used mostly for entertainment purposes, are enjoyed from a passive, laid-back position — the traditional TV program consumption environment.

Internet delivery to a PC is an active, participatory user experience. These screens are usually smaller than televisions, and users consume content interactively, often while multitasking. Audio may be multichannel, but is usually of lesser quality than a home theater system.

Small personal communication devices, such as cell phones, mobile video devices and PDAs, have significantly smaller displays and reduced audio capabilities.

The challenge for multiplatform broadcasters is to produce and convert content so that it is compelling on each of these screens. Consumers want their content anywhere, at any time. To build brand loyalty, differentiated presentation on each platform is key to securing an audience large enough to support a profitable business model.

Conversion terminology

Confusion also abounds when describing content conversion and transformation.

Format conversion is any change in audio/visual presentation. For video, this is any change in the combination of display pixel grid dimensions, aspect ratio, scanning method or refresh rate. In a scenario where content was produced for presentation on a large screen in 720p with 5.1 audio, then downconverted to QVGA with stereo audio (320 x 240, 2.0) for Internet streaming and to CIF with mono audio (352 x 288, 1.0) for cell phones, the source audio and video formats would each be converted twice.
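
A quick calculation, using the pixel grids from the scenario above, shows how much picture and audio information each conversion step discards:

```python
# Pixel grids and audio channel counts for the three targets above.
formats = {
    "720p / 5.1": (1280, 720, 6),   # large screen: 5.1 = 6 channels
    "QVGA / 2.0": (320, 240, 2),    # Internet streaming
    "CIF / mono": (352, 288, 1),    # cell phones
}

src_pixels = 1280 * 720
for name, (w, h, ch) in formats.items():
    pixels = w * h
    print(f"{name}: {pixels:>8} pixels "
          f"({pixels / src_pixels:.1%} of source), {ch} audio channel(s)")
```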

Because media is transferred in digital form, it has an associated data rate in bits per second (b/s). Transrating is changing the bit rate while maintaining the same compression codec. For example, an SD program that is produced and edited as MPEG-2 @ 40Mb/s is transrated to MPEG-2 @ 25Mb/s for contribution, and then transrated again to 4Mb/s for multiplexing into an MPEG transport stream for a DTV broadcast.
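
The arithmetic behind those steps is simple; one hour of the program occupies a very different amount of data at each stage (rates taken from the example above, container overhead ignored):

```python
rates_mbps = {"production": 40, "contribution": 25, "DTV emission": 4}

seconds = 3600  # one hour of program
for stage, rate in rates_mbps.items():
    gigabytes = rate * 1e6 * seconds / 8 / 1e9
    print(f"{stage:>12}: {rate:>2} Mb/s -> {gigabytes:.2f} GB per hour")
```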

Although MPEG-2 compression is required for ATSC terrestrial DTV broadcasts, AVC, VC-1, AC-3, MP3 and other codecs are used at various stages of the content production and distribution chain. Transcoding is converting content from one compression methodology to another. MPEG-2 to MPEG-4 and AC-3 to MP3 are examples of transcoding processes.

It is important to clearly understand the meaning of each of these terms and to apply them properly and consistently. Some technologists will describe the process of converting HD to SD as a downconversion; others will use downconversion to describe a bit rate reduction, which should properly be called transrating.
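
These distinctions can be captured mechanically. The sketch below (the descriptor fields are illustrative, not a real API) names the operations implied by a source/destination pair, a useful sanity check when specifying a workflow:

```python
def name_conversion(src: dict, dst: dict) -> list[str]:
    """Classify the change between two illustrative stream descriptors,
    each a dict with 'format', 'codec' and 'bitrate' keys."""
    ops = []
    if src["format"] != dst["format"]:
        ops.append("format conversion")
    if src["codec"] != dst["codec"]:
        ops.append("transcoding")
    elif src["bitrate"] != dst["bitrate"]:
        ops.append("transrating")   # same codec, new bit rate
    return ops or ["no conversion"]

hd  = {"format": "1080i60", "codec": "MPEG-2", "bitrate": 40e6}
sd  = {"format": "480i60",  "codec": "MPEG-2", "bitrate": 4e6}
web = {"format": "QVGA",    "codec": "AVC",    "bitrate": 1e6}

print(name_conversion(hd, sd))   # ['format conversion', 'transrating']
print(name_conversion(sd, web))  # ['format conversion', 'transcoding']
```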

Quality vs. quantity

The quality of the video and audio produced by an encoding or conversion device depends on the number of bits produced per second. Scene complexity (more detail, more motion) and sonic complexity (more sound sources, random tonality) require more bits to compress so that information loss stays below the level of consumer perception. Using too few bits may produce perceptible artifacts.

In theory, an MPEG video encoder uses as many bits as necessary to compress a given frame: spatial complexity varies from scene to scene, the difference between successive frames varies as well, and hence the number of bits per frame varies. Audio, on the other hand, is often coded using a constant number of bits over a defined time period.
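
Audio makes the contrast concrete. An AC-3 frame, for instance, carries 1536 PCM samples per channel, so at a 48kHz sample rate and a fixed 384kb/s, every frame holds exactly the same number of bits:

```python
sample_rate = 48_000      # Hz
samples_per_frame = 1536  # AC-3 frame length in PCM samples
bitrate = 384_000         # b/s, a common 5.1 AC-3 rate

frame_duration = samples_per_frame / sample_rate   # 0.032 s
bits_per_frame = bitrate * frame_duration          # constant, every frame

print(f"Frame duration: {frame_duration * 1000:.0f} ms")  # 32 ms
print(f"Bits per frame: {bits_per_frame:.0f}")            # 12288
```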

Variable bit rate (VBR) encoders produce bursts of data at changing, scene-dependent rates. A complex scene needs a large number of bits; a simple scene, significantly fewer. There is no direct way to calculate where a frame boundary falls, and buffer design is complicated because data quantities vary over time. A target buffer model in the decoder places a constraint on the encoding device and manages buffer use. Used this way, VBR encoding produces the highest quality content when decoded.

The ATSC implementation of the MPEG transport stream specification constrains the data rate to a fixed 19.39Mb/s, a constant bit rate, yet there is confusion over how that phrase is used.

With constant bit rate (CBR) encoding, an encoder produces a bit stream at a constant data rate. This does not mean each video frame contains a constant number of bits: there is no simple mathematical relationship that reveals where a frame boundary occurs, so the bit stream must be parsed and analyzed to determine frame boundaries. One drawback of CBR is that complex content requiring a bit rate above the fixed rate will suffer quality loss. On the positive side, CBR simplifies buffer design.
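
A minimal simulation (toy figures, loosely modeled on an MPEG-style decoder buffer) makes the constraint visible: bits arrive at a constant channel rate while variable-size frames are drained at the frame rate, and the encoder must size its frames so the buffer neither underflows nor overflows:

```python
channel_rate = 19.39e6                   # b/s, constant delivery rate
fps = 30.0
bits_per_interval = channel_rate / fps   # bits arriving per frame period

buffer_capacity = 8e6                    # decoder buffer size, toy figure
buffer_bits = 4e6                        # pre-fill before playback starts
# Toy frame sizes: a large I-frame followed by smaller predicted frames.
frame_sizes = [2.0e6, 0.3e6, 0.3e6, 0.9e6, 0.3e6, 0.3e6] * 5

for i, size in enumerate(frame_sizes):
    buffer_bits += bits_per_interval     # constant-rate fill from channel
    if buffer_bits > buffer_capacity:
        print(f"frame {i}: overflow, bits arrived with nowhere to go")
        break
    if size > buffer_bits:
        print(f"frame {i}: underflow, the encoder broke the buffer model")
        break
    buffer_bits -= size                  # decoder drains one whole frame
else:
    print(f"ran clean; final occupancy {buffer_bits / 1e6:.2f} Mb")
```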

Another technique leverages characteristics of both VBR and CBR. Capped variable bit rate (CVBR) uses VBR encoding whenever possible, but when scene complexity requires more bits than are available, a maximum limit caps the bit rate. This may produce artifacts, yet the probability of needing more bits than are available is relatively low. CVBR can be used in multicast scenarios to ensure that an HD program attains full HD quality the majority of the time without impacting simulcast SD programs. Additionally, during scenes of low complexity, opportunistic data can be delivered for enhanced DTV services.
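
The capping decision itself is simple to sketch (the cap and the scene-dependent rate requests below are hypothetical; real encoders apply rate control at a much finer granularity):

```python
CAP = 12e6   # b/s, maximum rate the channel allocation allows (hypothetical)

def cvbr_rate(requested: float, cap: float = CAP) -> float:
    """VBR whenever possible; clamp to the cap when complexity demands more."""
    return min(requested, cap)

# Scene-dependent requests from a hypothetical complexity model.
for requested in [3e6, 8e6, 15e6, 5e6]:
    granted = cvbr_rate(requested)
    spare = CAP - granted   # headroom usable as opportunistic data
    flag = "capped" if granted < requested else "unconstrained"
    print(f"request {requested / 1e6:>4.1f} Mb/s -> {granted / 1e6:>4.1f} Mb/s "
          f"({flag}), {spare / 1e6:.1f} Mb/s free for other services")
```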

Conversion workflows

Converting source material from its original format to multiple formats requires careful system design. Each process (format conversion, transrating and transcoding) should occur in a sequence that produces the optimal workflow and the highest audio/visual quality for each distribution channel and consumption device.

All the conversions required to support multichannel delivery must be analyzed to determine the optimal infrastructure, workflow and process sequence. This is not a simple task: each delivery channel will have different compression and encoding requirements. In one scenario, a single conversion device that produces multiple formats may be the best solution; in another, the highest quality may be attained by parallel conversion workflows.
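
One way to frame the analysis is to enumerate each delivery channel's conversion steps and compare topologies. The toy sketch below (channel names, formats and steps are illustrative only) contrasts parallel derivation from the master, which minimizes generation loss, with a serial cascade, which needs fewer devices:

```python
master = "1080i60 MPEG-2 @ 40 Mb/s"

# Parallel workflow: every target is derived directly from the master,
# minimizing generation loss at the cost of more conversion capacity.
parallel = {
    "DTV broadcast": [master, "transrate to 4 Mb/s"],
    "Internet":      [master, "downconvert to QVGA", "transcode to AVC"],
    "Mobile":        [master, "downconvert to CIF", "transcode to AVC"],
}

# Serial workflow: each target is derived from the previous output,
# needing fewer devices but cascading losses down the chain.
serial_chain = [master, "downconvert to QVGA + AVC", "downconvert to CIF"]

for channel, steps in parallel.items():
    print(f"{channel}: " + " -> ".join(steps))
print("serial: " + " -> ".join(serial_chain))
```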

The next Transition to Digital will investigate how audio/visual presentation formats influence format conversion scenarios.