Archival systems

My favorite online dictionary defines archive in three relevant ways:

“A place or collection containing records, documents or other materials of historical interest.
A long-term storage area, often on magnetic tape, for backup copies of files or for files that are no longer in active use.
A file containing one or more files in compressed format for more efficient storage and transfer.”

Figure 1. The four layers of archival storage.

The first definition is interesting because it is simply a place. The concept is applicable to the idea of layered storage in that one only cares to know the information has been stored and can be retrieved. The second definition might also be applicable to media content in the context of long-term, perhaps offline, storage. The third definition is not so much a place as it is a methodology for reducing the quantity of digital storage needed.

In this context, it is interesting to think about how various methods of archiving media content fit into these definitions. It is popular today to think of four layers of storage that are relevant to the archiving process. (See Figure 1.)

Online storage

The top layer requires immediate and real-time access. When a command is issued to play or deliver the media, it is immediately executed. Due to cost constraints, the size of this layer is the most restricted. The storage is high in bandwidth and is fault-tolerant. Nothing stands in the way of delivery. An example is broadcast hardware, with a real-time operating system running it, which is equipped with control interfaces and software to facilitate the demands placed on it.

Near-line storage

The second layer is near-line in the sense that nothing can play directly from it to the release chain. Between the online and near-line layers is a software interface that controls the movement of content to the next layer in the hierarchy. That same software interface must exist between each pair of adjacent layers. This higher-capacity storage might come in the form of inexpensive disk arrays with good fault tolerance that may have high bandwidth but are constructed to minimize the cost of storage and maximize the ability to interface to other products.

In a broadcast plant, the transfer of content from suppliers might connect directly to this layer of the facility. This is where content conversion takes place and where ingest occurs from live or other sources. It is exclusively file-based and is not aware of the outside real-time world.

China Central TV’s online robotic archive operation consists of a StorageTek PowderHorn 9310 tape library with five T9840B and five T9940B tape drives.

Online robotic archive

The third layer is much cheaper storage and often is robotic. Examples include streaming tape, optical (DVD or volumetric storage) or other new technologies that are inexpensive and massive. It is slower than the layers above it, and it is simply a repository for content. It makes no attempt to convert content to a release format for delivery.

Deep offline storage

The last layer might be thought of as the removable media from the robotic archive stored on shelves in a deep mine somewhere for long-term preservation purposes. It is the least vulnerable because it is not connected online to anything, and it is clearly the slowest because requests might involve transportation by snail-speed systems. It is protected and secure from any intrusion and is the least expensive layer to maintain.
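As a rough illustration of the four layers, the sketch below models them as an ordered enumeration and lists the layer-to-layer moves needed to bring content up to the online layer. The names `Tier`, `can_play_directly` and `hops_to_online` are hypothetical, and the sketch assumes content always moves up one layer at a time, which a real archive manager need not do.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The four layers from Figure 1, ordered from fastest to slowest."""
    ONLINE = 0        # real-time playout storage
    NEAR_LINE = 1     # file-based disk arrays
    ROBOTIC = 2       # robotic tape or optical library
    DEEP_OFFLINE = 3  # removable media stored on a shelf


def can_play_directly(tier: Tier) -> bool:
    """Only the online layer feeds the release chain directly."""
    return tier is Tier.ONLINE


def hops_to_online(tier: Tier) -> list[tuple[Tier, Tier]]:
    """List the (source, destination) moves needed to reach the online layer.

    Assumption: content climbs one layer per move.
    """
    return [(Tier(level), Tier(level - 1)) for level in range(tier, 0, -1)]


if __name__ == "__main__":
    for src, dst in hops_to_online(Tier.ROBOTIC):
        print(f"move {src.name} -> {dst.name}")
```

Each tuple in the returned list stands for one pass through the software interface that sits between two adjacent layers.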

Archival applications

Not all archive implementations need more than two of these layers. The concept, remember, is to collect records and store them. Each of the layers does so with different capabilities, access speed and cost. A station with no news might implement video servers with only a near-line archive for the sole purpose of increasing the storage capacity without increasing cost dramatically.

With the DaletPlus News Library, video and log archive information is instantly accessible online anywhere in a facility.

The cost of online storage, however, is dropping rapidly. Thus, the line between near-line and large online libraries is a bit blurred. Viewed functionally, near-line can be thought of as part of an online system. Its content is likely to be available in almost real time. In some implementations, the near-line can transfer to the online system at speeds far above real time, so it is seamless to the operation.
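To put "far above real time" in concrete terms, the arithmetic can be sketched as follows. The function names are hypothetical, the bit rates in the usage example are illustrative rather than measured, and the calculation ignores protocol overhead and disk contention.

```python
def transfer_seconds(duration_s: float, video_mbps: float, link_mbps: float) -> float:
    """Estimate the time to copy a clip between layers.

    duration_s: running time of the clip in seconds
    video_mbps: encoded bit rate of the clip in megabits per second
    link_mbps:  sustained transfer rate between the layers in megabits per second
    """
    size_megabits = duration_s * video_mbps
    return size_megabits / link_mbps


def realtime_multiple(video_mbps: float, link_mbps: float) -> float:
    """How many times faster than real time the transfer runs."""
    return link_mbps / video_mbps


if __name__ == "__main__":
    # Illustrative numbers only: a 30-minute clip at 50 Mb/s over a 1000 Mb/s link.
    print(transfer_seconds(1800, 50, 1000))  # seconds needed for the copy
    print(realtime_multiple(50, 1000))       # multiple of real time
```

With those assumed figures, the half-hour clip moves in a minute and a half, which is why the operator never notices that it was not on the online layer to begin with.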

The goal of archive management software is to extend that seamless capability to deeper layers. The requesting application at the top — broadcast automation, for instance — doesn't care that the archive manager had to move the content to make it available. It simply wants to know that the content is present when it is cued for playback. In this sense, the archive software is, in reality, the archive itself. The user should not have to care how deep in the stack the archive manager had to go to find the content and how it had to move it.

Suffice it to say, a request for content that has been moved to a limestone mine had better happen with enough warning to load the truck up and deliver it. But in all other cases, the time between request and playout can be thought of as simply a pre-fetch command, meaning an advanced warning to move the content through the stack to the playout level by a certain time.
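The pre-fetch idea can be sketched as a simple deadline calculation: given where the content currently sits and when it must be ready for playout, work out the latest moment the move can start. Everything here is an assumption made for illustration, including the layer names, the `HOP_DELAY` figures and the function `latest_prefetch_start`; real delays depend on the installation.

```python
from datetime import datetime, timedelta

# Layers ordered from deepest to the playout level.
ORDER = ["deep_offline", "robotic", "near_line", "online"]

# Hypothetical worst-case time to move content up ONE layer (illustrative only).
HOP_DELAY = {
    "deep_offline": timedelta(days=2),   # load the truck and drive from the mine
    "robotic": timedelta(minutes=30),    # mount, seek and read from the library
    "near_line": timedelta(minutes=5),   # disk-to-disk copy to the playout server
}


def latest_prefetch_start(current_layer: str, needed_at: datetime) -> datetime:
    """Return the latest time the move can begin and still be ready by needed_at."""
    index = ORDER.index(current_layer)
    # Sum the delay of every hop between the current layer and the online layer.
    total = sum((HOP_DELAY[layer] for layer in ORDER[index:-1]), timedelta())
    return needed_at - total


if __name__ == "__main__":
    air_time = datetime(2024, 1, 10, 20, 0)
    print(latest_prefetch_start("robotic", air_time))
```

The requesting application only supplies the deadline; how many hops the content has to make is the archive manager's concern.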

Do not confuse archive management with media asset management software. An archive manager may not know anything about the content except its name (a unique material identifier, or UMID, perhaps) and location. Media asset management software will likely have a much richer database of information about the content. It is this rich relational database that makes searching and categorizing possible, something that the archive manager does not necessarily need to be able to do. The strength of archive management software is its certain knowledge of where the content is and the best way to get it to the final destination.
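The division of labor might be sketched like this: the asset manager answers "which material matches this description?" and the archive manager answers "where is that material?". All class names, fields and sample values below are hypothetical and are not drawn from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveEntry:
    """All an archive manager needs: an identifier and a location."""
    material_id: str
    location: str


@dataclass
class AssetRecord:
    """A media asset manager keeps descriptive metadata for searching."""
    material_id: str
    title: str
    keywords: set[str] = field(default_factory=set)


class ArchiveManager:
    def __init__(self) -> None:
        self._entries: dict[str, ArchiveEntry] = {}

    def register(self, entry: ArchiveEntry) -> None:
        self._entries[entry.material_id] = entry

    def locate(self, material_id: str) -> str:
        """Return where the content lives; no knowledge of what it is."""
        return self._entries[material_id].location


class AssetManager:
    def __init__(self) -> None:
        self._records: list[AssetRecord] = []

    def add(self, record: AssetRecord) -> None:
        self._records.append(record)

    def search(self, keyword: str) -> list[str]:
        """Return the identifiers of every record tagged with the keyword."""
        return [r.material_id for r in self._records if keyword in r.keywords]


if __name__ == "__main__":
    mam, archive = AssetManager(), ArchiveManager()
    mam.add(AssetRecord("ID-0001", "Harbor fire", {"news", "fire"}))
    archive.register(ArchiveEntry("ID-0001", "robotic/slot-42"))
    for material_id in mam.search("fire"):
        print(material_id, "is at", archive.locate(material_id))
```

The identifier is the only thing the two systems share, which is why the archive manager can stay ignorant of titles, keywords and categories.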

With this view, the hardware seems to be irrelevant, which clearly is not true. Each manufacturer's system has unique capabilities. Robotic systems can have hundreds of terabytes or petabytes available in a system, which looks to the outside world like it is online except for the speed of access. Volumetric optical storage (holographic storage) can be significantly faster in access times because there is no need to rewind tape or search in a linear fashion through a serial medium. DVD can be inexpensive for modest installations, but it is somewhat limited in write speed.

Thus, each technology is applicable in some applications. News might fit better with DVD, massive program libraries with streaming tape and systems with frequent random access needs with volumetric optical storage.

Deciding how to implement a system requires a thoughtful view of the needs of the system, including the throughput needed, access time demands, cost of storage, maintenance and other factors. Be wary of manufacturers who claim that their systems work in all applications. But get used to the concepts now; they are integral parts of many facilities today and will be in many more soon.

John Luff is the senior vice president of business development for AZCAR.

Send questions and comments to: john_luff@primediabusiness.com
