Technology in Transition: Data archive systems


Last month, this column focused on automation software. In the second installment of four related to automation and asset management, this month we consider data archive systems.

Before our industry began storing media in the computers we call video servers, archiving meant little more than keeping a second copy of a master on a shelf, or perhaps a backup of the station traffic computer on removable media. Today there is increasing interest in tapeless facilities, yet, ironically, we find ourselves backing up the video stored in a server onto another kind of tape.

The motivation to use archive systems comes from two issues. First, server storage cannot be expanded indefinitely without unlimited money, power and space. Second, keeping only one copy in a server carries risk, which leads to the natural impulse to make a backup copy. Hard disks are mechanical devices and eventually fail, potentially leaving priceless media exposed. Archive systems can address both issues.

Some of the first robotic data archive systems were versions of the digital video robotic libraries manufactured by Sony and Ampex. At the time, the only mass libraries available for video were those distinctly linear, video-centric machines. Ironically, though the media they used was linear for video, when backing up computer files these machines recorded files that could be accessed in a nonlinear fashion.

As robotic libraries reached a practical limit of utility, video servers were growing in capability. For a while, it looked as though the limits of robotic library utility would be bridged through servers used as output caches. Now the opposite seems to be the case. That is, robotic libraries are acting as a “storage cache” of sorts.

Modern archives are quite different from those relatively primitive machines that repurposed video technology. The volumetric storage density is higher, and instead of linear access, the media is formatted to allow random access to individual files. In addition to tape-based media, we now have recordable DVD being used as a record medium. Each physical medium comes with its own advantages and disadvantages, but first and foremost, one must consider the application and pick a technology appropriate to the task.

When making plans for an archive, it is important to consider the purpose first. The intent may be to minimize the size of a server by using the archive as a “near line” extension of the server's storage. Or it may be more important to have backup copies of content. If the media will be stored locally outside of the robot, you should consider whether the storage conditions need to be kept near the ideal temperature and humidity to sustain long-term stability of the recordings. Certainly, it is critical to ensure that any backups are on a medium that will be supported long into the future. Media should be cycled regularly and the bit error rate monitored. When the rate begins to climb, the archive should be capable of cloning the content to fresh media to ensure it will be available when needed.
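
The monitoring policy just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's method: the function name `needs_clone`, the error-rate threshold and the number of readings examined are all hypothetical values chosen for the example.

```python
def needs_clone(ber_history, threshold=1e-6, rising_samples=3):
    """Return True if a tape should be cloned to fresh media.

    ber_history: bit error rates from successive read cycles, oldest first.
    threshold and rising_samples are illustrative assumptions, not vendor figures.
    """
    if not ber_history:
        return False
    # Clone immediately if the latest reading exceeds the absolute limit.
    if ber_history[-1] > threshold:
        return True
    # Otherwise clone only if the last few readings climbed steadily.
    recent = ber_history[-rising_samples:]
    if len(recent) < rising_samples:
        return False
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A tape whose readings hold steady is left alone, while one whose last three readings each exceed the one before is flagged for cloning well before it becomes unreadable.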

Though the first thing one might think of is the physical medium and the hardware to record, play back and handle the media, it is equally critical to look hard at the software needed to make the process seamless. Does the automation system you have support connection to an archive? Indeed, does the brand of server you have or are considering support archive connections? Most major suppliers of automation and servers do, but not all. Also, vendors do not support all types of archives. If your interest is in a DVD archive, be sure to ask all concerned if they support that option.

The archive is normally controlled by a software package that handles the transactions between the server and automation software. When automation requests content from the server and the content is not available there, it must be found and moved from the archive to the server. The “archive management software” performs that task. It also keeps a database of the content and its location, which might be a physical bin in the robot or a shelf location. The archive management software is usually licensed annually, and the cost of the license in some cases depends not on the size of the robot controlled, but rather on the number of Gbytes of content under management. As the archive grows, so does the fee, even if the content sits on tapes on a shelf. Under those circumstances, you should purge unneeded material from both the archive and the archive manager as soon as you can to avoid unnecessary costs.
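
The request flow described here can be modeled with a toy example. The sketch below is an assumption-laden illustration rather than the interface of any real archive manager: the class `ArchiveManager`, the function `request_playout` and the location strings such as "robot:bin-17" are all hypothetical.

```python
class ArchiveManager:
    """Toy model of archive management software (all names hypothetical)."""

    def __init__(self):
        # Maps a content ID to its location, e.g. "robot:bin-17" or "shelf:A3".
        self.catalog = {}

    def register(self, content_id, location):
        self.catalog[content_id] = location

    def locate(self, content_id):
        return self.catalog.get(content_id)


def request_playout(content_id, server_contents, archive):
    """Describe how an automation request for content is satisfied."""
    if content_id in server_contents:
        return "on server"
    location = archive.locate(content_id)
    if location is None:
        return "not found"
    if location.startswith("shelf:"):
        # Shelved media must be loaded into the robot by an operator first.
        return f"operator must load {location}"
    # Media is in the robot: copy it to the server so it becomes available.
    server_contents.add(content_id)
    return f"restored from {location}"
```

The catalog is the piece that matters most: without the database of locations, neither the robot nor an operator knows where to look.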

Other software suppliers charge by the size of the robot, and the amount of media on the shelf does not matter. You could theoretically grow the archive forever, never releasing media that may some day be useful, however unlikely that might be, without changing annual support costs. In either case, the cost of archive management software is a considerable portion of the cost of the hardware. The media is not inexpensive either, but it is a one-time cost instead of the recurring cost of the license. Unfortunately, you cannot opt out of the license after a few years, as the fee is payable as long as the software is in use.
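
To see how the two licensing models diverge over time, a rough comparison can be worked out as below. Every figure and name here is hypothetical (the fees, the growth rate and the function `compare_license_costs` are assumptions for illustration only); real pricing varies by vendor.

```python
def compare_license_costs(start_gbytes, growth_gbytes_per_year, years,
                          fee_per_gbyte, robot_bins, fee_per_bin):
    """Return (capacity_total, robot_total) license cost over a number of years.

    Capacity model: annual fee scales with Gbytes under management.
    Robot model: annual fee scales with the number of bins and stays flat.
    All rates are made-up inputs, not real vendor prices.
    """
    capacity_total = 0
    managed = start_gbytes
    for _ in range(years):
        capacity_total += managed * fee_per_gbyte
        managed += growth_gbytes_per_year  # the archive keeps growing
    robot_total = robot_bins * fee_per_bin * years
    return capacity_total, robot_total
```

With a steadily growing archive, the capacity-based total rises faster each year while the robot-based total grows in a straight line, which is why purging matters under the first model and not the second.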

Robot size and speed are critical criteria. The size may be measured in centimeters but usually is measured in the number of bins. Most manufacturers offer varying sizes of cabinets, with anywhere from a couple of dozen to literally thousands of tapes available under robot control. The number of transports also is scalable.

Planning for the archive must also include questions of transfer rate. A request for a large file may take several times the running time of the program to transfer through the gateway. A single tape might hold from 20 Gbytes to 100 Gbytes. A single half-hour program at the ATSC coding rate consumes just short of 70 Gbytes without any overhead added. If the effective transfer rate is 10 Mbytes/s, you will tie up the archive for about two hours to transfer a single piece of media. Many archives are considerably slower than this, particularly DVD-based ones. While DVD playback speeds may be sufficient for many archive purposes, DVD writing speed is considerably slower and not suitable for some applications.
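
The arithmetic behind that two-hour figure is simple enough to check. The sketch below uses the file size and transfer rate quoted above and assumes decimal units (1 Gbyte = 1000 Mbytes); the function names are illustrative.

```python
def transfer_time_hours(size_gbytes, rate_mbytes_per_s):
    """Hours needed to move a file, assuming 1 Gbyte = 1000 Mbytes."""
    seconds = size_gbytes * 1000 / rate_mbytes_per_s
    return seconds / 3600


def realtime_multiple(size_gbytes, rate_mbytes_per_s, duration_minutes):
    """How many times longer than the program's running time the transfer takes."""
    seconds = size_gbytes * 1000 / rate_mbytes_per_s
    return seconds / (duration_minutes * 60)
```

For a 70 Gbyte file at 10 Mbytes/s, `transfer_time_hours(70, 10)` comes to a little under two hours, and `realtime_multiple(70, 10, 30)` shows the transfer running close to four times the length of the half-hour program.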

Access time is similarly important. You must carefully think through the bottlenecks in the system to ensure the combined effect of transfer rate and access time does not collide with the volume of expected transfers.
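
One rough way to test for such a collision is to estimate how busy the transports would be. The sketch below is a simple utilization ratio under assumed inputs; it ignores queueing effects and robot arm contention, and the function name and sample numbers are hypothetical.

```python
def transport_utilization(requests_per_hour, access_s, transfer_s, transports):
    """Fraction of available transport time consumed by expected requests.

    Each request occupies a transport for access time plus transfer time.
    A result at or above 1.0 means the archive cannot keep up; values
    approaching 1.0 suggest long waits in practice.
    """
    busy_seconds = requests_per_hour * (access_s + transfer_s)
    return busy_seconds / (transports * 3600)
```

For example, four requests an hour, each needing 90 seconds of access and 1200 seconds of transfer, load two transports to roughly 72 percent. Doubling the request volume pushes the ratio past 1.0, which means more transports or faster transfers are needed.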

One way to mitigate some of the transfer issues is to connect the archive to a library server. When a media request is made, the archive transfers the media to the library server. The media is then transferred from the library server to the air server at much higher speeds. This keeps the load on the air and backup servers lower, as their transactions happen over shorter time periods, though the technical complexity is somewhat higher.

Next month, we will consider newsroom automation systems. In May, we will conclude this series with media asset management issues.

John Luff is vice president of business development for AZCAR.