Modern archives

The repository moves to the front of the workflow.

I can remember walking through the MGM film archives shortly after Turner acquired the library. The archive is in a warehouse not far from the La Brea Tar Pits in Los Angeles. It consists of row upon row of shelves with just about every film and tape format you can imagine (except nitrate, which had been removed for safety reasons). The different videotape formats read like an artifact listing from a museum: 2in quad, 1in Type B and Type C, 3/4in U-matic, various 1/2in formats, and so on.

As movie studios, broadcasters and post houses move from tape to digital facilities, the nature of their archives is changing. Modern archives house legacy film and videotape formats, but they are also repositories for new file-based content, much of which has never been copied onto tape. What is the meaning of an archive in a network-based facility? What is the role of an archive?

The traditional archive is an end-of-pipe process, meaning that it is a repository for finished content. The function of an archive has changed as new technologies have brought new possibilities.

In some facilities, the archive is the hub of a larger workflow. In others, it serves as the input source for subsequent processing. Both of these configurations depart dramatically from the traditional view of the archive as a repository.

Shared storage

Positioning the archive as the central point in the workflow transforms it into shared storage, where users can gain access to content concurrently. (See Figure 1.) While one person is editing content, another may be dubbing some of the same content to DVD. Some of these shared storage systems have long-term storage associated with them. In this configuration, these devices become an important archive for the organization.

Shared storage systems can turn into archives. We are rapidly approaching the point where disk storage is so inexpensive that it is not worth the time and effort to delete completed projects. As this shift takes place, the shared storage system naturally becomes an archive. Managing this stored content effectively is another issue.

Triage systems

Placing the archive at the start of the workflow allows incoming news feeds, for example, to go directly into the archive. Once the content hits the archive, one editor may begin working on a rough cut for a breaking news story. In parallel, another editor may be working on a longer version of the story for the evening news. Perhaps a little later, a third editor may access the same content for use in a magazine show that will air over the weekend. Again, content is available to several users at the same time, allowing innovative changes in workflow. Meanwhile, an archivist looks through the content, determining what should be retained and what should be discarded. In this way, the archive functions as a triage system. (See Figure 1.) It enables ubiquitous access to content while simultaneously allowing the organization to determine what to keep for the long term.

All of this is possible with conventional analog video or SDI. But these workflows are best suited to network-based technology because the centralized archive is closely related to the computer client-server model. In this model, multiple clients connect to a central server to access desired content on the server. Content is automatically routed to and from the central archive using the self-routing nature of packetized computer networks.

Deep archive

In Figure 1, the archive is shown as a single storage device. But this is because Figure 1 is a 2-D drawing. When you look at the archive in 3-D, you can see that it consists of several different layers. (See Figure 2.) These layers are divided according to storage cost and access time. There is almost always an inverse relationship between these two components.

RAM storage is the top layer. Clearly, if video is cached in RAM, access times are extremely short. But while memory costs have dropped precipitously over the last few years, RAM is still expensive. In systems that require fast access times and that have a small number of popular items, RAM cache can be the best way to achieve high system performance. RAM is always the first component in any archive system, even if it is not explicitly called out.

The second layer of the archive is hard disk. Several years ago, some industry pundits said that disk technology had reached the end of the road. Despite these predictions, performance continues to increase, and prices continue to fall. It is not uncommon to find disks spinning at more than 10,000rpm, which was considered a high speed several years ago. This high rotational rate allows the head to pass over the desired data much more quickly, decreasing access times. It also allows the data to come off the head much faster, allowing video to be read and written in real time.

The third layer in these systems usually consists of tape. There was a time when the difference in price between disk and tape was about 100:1. Disk was so much more expensive than tape that systems were designed with a minimum of disk and a large amount of tape. But that has changed, and disk now occupies a much larger portion of the overall archive. Generally, the archive requires removable media at some point, and that is almost always tape. Large systems frequently employ robots to transfer tapes to and from drives.

The fourth layer is what most people would think of as a traditional archive — removable media on shelves. The media is typically film, videotape or data tape. At this layer, cost savings is the primary concern. The offline archive should be stable, and it should have good storage density at a low cost.
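The four layers described above can be sketched as an ordered lookup, fastest and most expensive first. This is only an illustration: the tier names, the access times and the promote-on-read policy are assumptions made for the example, not figures or behavior taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    access_seconds: float  # illustrative figure only, not a measurement
    assets: set = field(default_factory=set)


def build_tiers():
    # Ordered fastest (and costliest) first, mirroring the four layers above.
    return [
        Tier("ram", 0.001),
        Tier("disk", 0.05),
        Tier("tape_robot", 60.0),
        Tier("offline_shelf", 3600.0),
    ]


def locate(tiers, asset_id):
    """Return the index of the fastest tier holding the asset, or None."""
    for index, tier in enumerate(tiers):
        if asset_id in tier.assets:
            return index
    return None


def fetch(tiers, asset_id):
    """Find an asset, then copy it one layer up (assumed policy)."""
    index = locate(tiers, asset_id)
    if index is None:
        raise KeyError(asset_id)
    if index > 0:
        # Promote a copy so that repeat requests are served faster.
        tiers[index - 1].assets.add(asset_id)
    return tiers[index].name, tiers[index].access_seconds
```

In this sketch, a clip first found on the tape robot would be served from disk on the next request, which is one simple way to trade storage cost against access time.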

Archiving in the networked environment

Networked archives require specific design considerations. The archive is accessed through a dedicated file server over high-speed network connections. This server is optimized to serve large files. File systems, bus structures and other internal components are all designed to maximize throughput. In the past, little processing would occur in the server. These days, it is not uncommon for the server to perform file conversion on the fly, flipping the content into the appropriate format for the destination device.

With the advent of AAF and MXF, there is the potential to greatly increase the intelligence of the content stored in these archives at all levels. The metadata associated with an AAF or MXF file can stay with the file throughout the archive process. Should the database for the archive be lost, the database can be at least partially recovered using the metadata stored with the content. Also, as AAF and MXF content arrives at one of these systems, the metadata can be read from the file and used to populate the database so that preliminary data is available on every asset in the system.
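The idea of populating, or partially recovering, the archive database from metadata carried with the content can be sketched as follows. Reading a real AAF or MXF file requires a dedicated parser, so the embedded metadata is modeled here as a plain dictionary; the field names (asset_id, embedded_metadata, title, duration_s, location) are hypothetical and chosen only for illustration.

```python
def ingest(catalog, asset):
    """Add one catalog entry using metadata carried inside the asset."""
    meta = asset["embedded_metadata"]  # stand-in for parsed AAF/MXF metadata
    catalog[asset["asset_id"]] = {
        "title": meta.get("title", ""),
        "duration_s": meta.get("duration_s"),
        "location": asset["location"],
    }


def rebuild_catalog(stored_assets):
    """Partially recover a lost database by re-reading every stored asset."""
    catalog = {}
    for asset in stored_assets:
        ingest(catalog, asset)
    return catalog
```

The same ingest step serves both cases in the text: it runs once as each file arrives, and it can be replayed over the whole store if the database is lost. Anything that lived only in the database, such as notes an archivist added later, would not come back, which is why the recovery is partial.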

The migration issue

One issue with any archive is how to migrate the content from one media type to another. The Hollywood crowd likes to point out, and rightfully so, that one answer is to eliminate obsolete media. An excellent archive format is 35mm film. However, analog videotape has presented serious challenges. Generation loss renders some material useless after only two or three migrations. The situation is better with data.

Digital storage, particularly computer data storage, allows the information on the media to be separated from the media itself. After all, when it comes to content, saving the media is not important.

Some time ago, a member of the Society of Motion Picture and Television Engineers, Dr. Juergen Heitmann, proposed systems that would automatically migrate content in archives from old media to new media. Dr. Heitmann recognized the disconnect between media and the content stored on the media. He proposed a data tape robot system that could accommodate different-sized tapes and different tape decks. The archive would copy content from old tapes to new ones in the background when the system was not busy retrieving content for users.
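The background migration idea can be sketched as a simple scheduling loop: user retrievals always go first, and a tape is copied to new media only when no request is waiting. This is not Dr. Heitmann's design, just a minimal model of the principle; the class name, the tick method and the dictionary used to stand in for tapes are all assumptions.

```python
from collections import deque


class MigratingArchive:
    def __init__(self, old_tapes):
        self.old_media = dict(old_tapes)  # tape label -> content
        self.new_media = {}
        self.requests = deque()

    def request(self, label):
        """Queue a user retrieval."""
        self.requests.append(label)

    def read(self, label):
        """Content is the same whichever generation of media holds it."""
        if label in self.new_media:
            return self.new_media[label]
        return self.old_media[label]

    def tick(self):
        """One scheduling step: serve a waiting user, else migrate one tape."""
        if self.requests:
            label = self.requests.popleft()
            return ("served", label, self.read(label))
        if self.old_media:
            label, content = next(iter(self.old_media.items()))
            self.new_media[label] = content  # a bit-for-bit copy, no generation loss
            del self.old_media[label]
            return ("migrated", label, content)
        return ("idle", None, None)
```

Because read looks in both generations of media, users never need to know whether a given item has been migrated yet.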

Archiving in the networked world has the power to change workflows, increase access to content and allow broadcasters to get the most out of their content.

Brad Gilmer is president of Gilmer & Associates, executive director of the Video Services Forum and executive director of the AAF Association.
