
Media wrappers

Broadcasting today relies on the ingest, storage and playout of content involving many different tape- and file-based media. With the migration toward digital media, numerous media container systems are now in use as well. An overview of these different media packaging standards will make repurposing of content to different fixed, mobile and Web-based devices a more manageable task.

Containers facilitate handling of multipurposed content

A media container is essentially a “wrapper” that contains video, audio and data elements, and can function as a file entity or as an encapsulation method for a live stream. Each of the various containers used today is based on a particular specification that describes how the media elements are defined within it. While containers usually do not describe how data or metadata is encoded, specific containers will often constrain the types of video and audio contained within, often excluding other types. Containers can be assembled offline and then stored as finite computer files. They can also be generated on the fly and transmitted to real-time receiving devices. A real-time container is thus an open-ended stream, not necessarily intended for storage. Receiving devices, however, can capture such a stream and store it as a file.

The most familiar container to digital broadcasters is the MPEG transport stream, which almost always contains an MPEG video stream and some form of audio stream, such as Dolby Digital in ATSC systems and MPEG audio in DVB systems. The MPEG transport stream is composed of various layers, enabling media players (e.g., TVs) to quickly parse the stream and select the desired elements. This enables decoders to easily separate out video and audio, as well as data elements such as the EPG. Transport streams are also assembled in such a way that multiple programs can be easily and separately accessed. Specific elements of the transport stream include packets, which carry the elementary data; packet IDs (PIDs), which identify the stream each packet belongs to, such as a particular elementary stream; programs; and program specific information (PSI). Null packets are also inserted into transport streams to satisfy bit rate requirements of the stream.
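As a rough illustration of how a receiver might sort packets by PID, the Python sketch below reads the four-byte header of each transport stream packet. The constants (188-byte packets, sync byte 0x47, PID 0x1FFF for null packets, PID 0 for the program association table) come from the MPEG-2 systems specification rather than from this article, and the function names, including the `make_packet` test helper, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

TS_PACKET_SIZE = 188  # fixed packet length in bytes
SYNC_BYTE = 0x47      # first byte of every packet
NULL_PID = 0x1FFF     # PID reserved for null (stuffing) packets
PAT_PID = 0x0000      # PID of the program association table (part of PSI)


@dataclass
class TsHeader:
    pid: int
    payload_unit_start: bool
    continuity_counter: int


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    return TsHeader(
        pid=pid,
        payload_unit_start=bool(packet[1] & 0x40),
        continuity_counter=packet[3] & 0x0F,
    )


def count_pids(stream: bytes) -> Counter:
    """Count packets per PID in a buffer of back-to-back packets."""
    counts = Counter()
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        header = parse_ts_header(stream[offset:offset + TS_PACKET_SIZE])
        counts[header.pid] += 1
    return counts


def make_packet(pid: int, cc: int = 0) -> bytes:
    """Hypothetical helper: build a dummy payload-only packet for testing."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes([0xFF]) * (TS_PACKET_SIZE - 4)
```

Dividing `count_pids(stream)[NULL_PID]` by the total packet count gives a quick estimate of how much of the multiplex is stuffing.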

Alternate containers emerge

When digital video became viable on consumer devices, the rapid availability of sizeable storage on PCs meant that media could be stored locally within files. Various container providers thus emerged in a competitive fashion, each with its own set of compatibilities (or not).

Microsoft Advanced Systems Format (ASF) is an extensible file format comprising one or more digital media streams; the most common file types contained within an ASF file are Windows Media Audio (WMA) and Windows Media Video (WMV). ASF files are logically composed of three types of top-level objects: a header object, a data object and the index object(s). The header object contains a recognizable start sequence identifying the file (or stream) and generally contains metadata about the file. The data object contains all of the digital media data, which can be defined as having a stream property or a file property. The index object can contain time- and/or frame-based content indices pointing to locations within the digital media. Although ASF files may be edited, ASF is specifically designed for streaming and/or local playback.
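The three-object layout can be sketched in a few lines of Python. This is a minimal sketch that assumes each top-level ASF object begins with a 16-byte GUID followed by a 64-bit little-endian size that includes those 24 header bytes. The GUID values shown are quoted from memory of the ASF specification and should be verified against it before use; the function name is hypothetical.

```python
import struct
import uuid

# GUIDs as commonly documented for ASF; verify against the specification.
ASF_HEADER_GUID = uuid.UUID("75B22630-668E-11CF-A6D9-00AA0062CE6C")
ASF_DATA_GUID = uuid.UUID("75B22636-668E-11CF-A6D9-00AA0062CE6C")

OBJECT_HEADER_SIZE = 24  # 16-byte GUID + 8-byte size


def walk_asf_objects(data: bytes):
    """Yield (guid, offset, size) for each top-level object in the buffer."""
    offset = 0
    while offset + OBJECT_HEADER_SIZE <= len(data):
        # GUIDs are stored with their first three fields little-endian.
        guid = uuid.UUID(bytes_le=data[offset:offset + 16])
        (size,) = struct.unpack_from("<Q", data, offset + 16)
        if size < OBJECT_HEADER_SIZE:
            raise ValueError("object size smaller than its own header")
        yield guid, offset, size
        offset += size


def describe(guid: uuid.UUID) -> str:
    """Map a top-level GUID to the object names used in the text."""
    if guid == ASF_HEADER_GUID:
        return "header"
    if guid == ASF_DATA_GUID:
        return "data"
    return "other (for example, an index object)"
```

A player only needs to confirm that the first GUID is the header object to recognize the file, which is the "recognizable start sequence" mentioned above.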

Apple QuickTime (QT or MOV) is a container format that encapsulates one or more tracks, each of which stores audio, video, effects or text. Each track contains either a media stream or a data reference to a media stream located in another file, and these tracks are arranged in a hierarchical data structure consisting of objects called “atoms.” One advantage of this track structure is that it enables in-place editing, so the media data does not have to be rewritten after each edit. The various video codecs that can be encapsulated in QuickTime include MPEG-4 Part 2, H.264 (MPEG-4 Part 10/AVC), DivX, 3ivx, H.263 and FLV1 (Sorenson H.263). The MPEG-4 Part 14 (MP4) multimedia container format is essentially an extension of the ISO base media file format (MPEG-4 Part 12), which was based on the QuickTime file format.
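The atom hierarchy lends itself to a short recursive walk. The Python sketch below assumes the usual atom header of a 32-bit big-endian size followed by a four-character type, with a size of 1 signaling a 64-bit extended size and a size of 0 meaning "to the end of the enclosing container." The function name is hypothetical, and the same layout applies to MP4 boxes.

```python
import struct


def walk_atoms(data: bytes, start: int = 0, end: int = None):
    """Yield (type, payload_offset, payload_size) for atoms in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        size, kind = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit extended size follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated extended-size atom")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The atom runs to the end of its container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("atom size out of range")
        yield kind.decode("latin-1"), offset + header, size - header
        offset += size
```

To descend into a container atom such as `moov`, call `walk_atoms(data, payload_offset, payload_offset + payload_size)` on the range it reported.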

Adobe Flash (FLV), which has become popular on the Internet, most often contains video encoded using Sorenson Spark, On2 Technologies' VP6 compression, or H.264. (On2 claims that VP6 offers better image quality and faster decoding performance than other codecs, including Windows Media Video, Real Media, H.264 and QuickTime MPEG-4.) A different version of the Flash container format is F4V, which is based on MPEG-4 Part 12, and supports H.264.

RealMedia (RM) carries the proprietary RealVideo and RealAudio streams. RealVideo was originally based on H.263, but is now a proprietary video codec. A RealMedia file consists of a series of chunks, each of which carries information on data type, size, version, and of course, the video and audio payload. Content description and metadata can be carried as well.
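The chunk layout described above might be read as follows. This is only a sketch: it assumes each chunk starts with a four-character type, a 32-bit big-endian size that counts the whole chunk, and a 16-bit version, which matches common descriptions of the format but should be checked against RealNetworks documentation. The function name is hypothetical.

```python
import struct

# Assumed chunk header: 4-byte type, 4-byte size, 2-byte version (big-endian).
CHUNK_HEADER = struct.Struct(">4sIH")


def walk_rm_chunks(data: bytes):
    """Yield (type, size, version, payload) for each chunk in the buffer."""
    offset = 0
    while offset + CHUNK_HEADER.size <= len(data):
        kind, size, version = CHUNK_HEADER.unpack_from(data, offset)
        if size < CHUNK_HEADER.size or offset + size > len(data):
            raise ValueError("chunk size out of range")
        payload = data[offset + CHUNK_HEADER.size:offset + size]
        yield kind.decode("latin-1"), size, version, payload
        offset += size
```

Because every chunk announces its own size, a reader can skip chunk types it does not understand and still reach the audio and video payload.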

The Material eXchange Format (MXF), defined by SMPTE 377M, is a file format that encapsulates video, audio, metadata and other bit streams (“essences”). MXF was initially targeted to production, as a middle-state format for content exchange and archiving; the container format is specifically called MXF_GC, for MXF Generic Container. MXF_GC is “fully streamable,” i.e., A/V content can be continuously decoded through mechanisms such as interleaving essence components with stream-based metadata. The benefits of MXF include interoperability, a high level of sophistication and the use of international standards.

Structurally, MXF comprises a file header, file body and file footer. The header contains file identification and metadata, the latter including timing and synchronization parameters. The file body incorporates the data essence, i.e., video, audio and other media. The footer closes out the MXF file.
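Although the article does not go into the byte-level encoding, the header, body and footer of an MXF file are all carried as key-length-value (KLV) triplets. The Python sketch below assumes 16-byte SMPTE universal label keys beginning with 06 0E 2B 34 and BER-encoded lengths, and it ignores any run-in bytes that may precede the first key. The function names are hypothetical.

```python
SMPTE_UL_PREFIX = bytes([0x06, 0x0E, 0x2B, 0x34])  # start of every SMPTE key
KEY_SIZE = 16


def read_ber_length(data: bytes, offset: int):
    """Decode a BER length; return (length, offset of the value)."""
    first = data[offset]
    if first < 0x80:
        return first, offset + 1  # short form: the byte is the length
    count = first & 0x7F          # long form: number of length bytes
    if count == 0 or offset + 1 + count > len(data):
        raise ValueError("unsupported or truncated BER length")
    length = int.from_bytes(data[offset + 1:offset + 1 + count], "big")
    return length, offset + 1 + count


def walk_klv(data: bytes):
    """Yield (key, value) for each KLV triplet in the buffer."""
    offset = 0
    while offset + KEY_SIZE + 1 <= len(data):
        key = data[offset:offset + KEY_SIZE]
        if key[:4] != SMPTE_UL_PREFIX:
            raise ValueError("expected a SMPTE universal label")
        length, value_offset = read_ber_length(data, offset + KEY_SIZE)
        if value_offset + length > len(data):
            raise ValueError("truncated value")
        yield key, data[value_offset:value_offset + length]
        offset = value_offset + length
```

A real reader would compare each key against the registered labels for partition packs, metadata sets and essence elements to tell header, body and footer apart; that lookup table is omitted here.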

MXF also shares a common object model with the Advanced Authoring Format (AAF), a sophisticated data model and software toolset that allows post-production devices to share essence data and metadata. MXF can therefore store completed works with metadata, allowing for a viable means of tape replacement. It can also package together a playlist of files and store the synchronization information, as well as store cuts-only EDLs and the material they affect. Another strength of MXF is that it can encapsulate audio and video of any compression format, making it truly universal.


With the continuing evolution of container formats and storage media, content archiving is increasingly susceptible to “data rot,” where an original version of material will eventually require migration to various (newer) storage means. As if efficient workflow weren't enough of a challenge, short- and long-term planning (and budgets) must consider the growing issue of content permanence.

Aldo Cugnini is a consultant in the digital television industry.
