Audio-visual archives

The demise of videotape presents a challenge for broadcasters. It is no longer possible to send program videotape to the archive (a climate-controlled warehouse) and expect to check it out sometime in the following 25 years. Instead, programs now exist as data files, and new systems and working practices are required.

Virtually all new cameras shoot as files written to data cards or optical discs, and videotape as an acquisition format is becoming a memory. More finished content is being delivered to master control as MXF files or another similar format, so again, videotape has had its day.

There are two issues with tape: How long will the medium last, and how long will tape decks be around? Tape generally lasts 20 to 30 years in a climate-controlled environment. However, finding a working tape deck older than 20 years is nearly impossible. There are still working quad machines, but they are relatively easy to repair, with discrete electronics and large mechanical parts. The miniaturized parts of modern decks are difficult to manufacture without special tools, so sourcing parts decades ahead will be difficult. The same applies to the highly integrated electronics.

In the data storage world, this problem is solved through ongoing migration to the latest format. Data on SCSI drives is copied to SAS, LTO-2 to LTO-5, and so on. Migration is carried out before the media wears out or the drives become obsolete. Some tape libraries include lifecycle management software that monitors tape condition and migrates when appropriate. Archive management software similarly manages tapes, migrating when needed, and “defragging” partially erased tapes by compacting them to fresh media. (See Figure 1.)

Videotape can be similarly migrated, but each dub is a generation loss. For compressed digital videotape formats, losses from error concealment and correction can occur. For composite recordings, there are additional artifacts of decoding.

Data tapes and disks, by contrast, can be copied bit for bit, with no generation loss. Even so, an archive system should include a data integrity check to ensure the file hasn't been tampered with or corrupted while in storage.
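One common way to implement such an integrity check is to record a cryptographic digest of each file at ingest and recompute it later. The following is a minimal sketch, assuming SHA-256 as the digest and assuming the recorded value is kept somewhere separate from the file; the function names are illustrative and not taken from any particular archive product.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large media files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_digest):
    """Return True if the file's current digest matches the digest
    recorded at ingest (hypothetical workflow step)."""
    return sha256_of(path) == expected_digest.lower()
```

A digest mismatch only says that the file has changed; it cannot say whether the cause was corruption or deliberate editing, so the check is usually paired with a second copy to restore from.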

LTO

LTO is a popular format for archiving because media and drives are available from several manufacturers. The format also has a well-defined roadmap, with the current generation being LTO-5. Each drive generation is backwards-compatible across two or three generations of media, which eases the transition.

LTO-5 also introduced support for the Linear Tape File System (LTFS). This is useful for small archives where a single tape may be used to store projects, jobs or program series. It allows direct access to tape from Linux, Mac and Windows operating systems without the need for tape management software.

With a large robotic library, a DAM system and an archive manager are used to catalog and index data tapes, but LTFS offers a low-cost alternative for ad-hoc archiving.

LTO-5 cartridges store 1.5TB of data. The projected LTO-8, three generations ahead, is planned to store 12.8TB. Support for write-once, read-many (WORM) operation was added in LTO-3, and hardware encryption was added in generation 4. Both have potential applications in media archives.
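To put the cartridge capacity in program terms, the arithmetic can be sketched as below. This is a rough estimate only: it assumes decimal terabytes, a constant total bit rate, and no allowance for wrapper, file system or indexing overhead. The bit rates in the loop are illustrative values, not recommendations.

```python
def hours_per_cartridge(capacity_tb, total_mbps):
    """Hours of material that fit on one cartridge at a constant bit rate.

    capacity_tb: cartridge capacity in decimal terabytes (1TB = 1e12 bytes)
    total_mbps:  combined video, audio and metadata bit rate in Mb/s
    """
    capacity_bits = capacity_tb * 1e12 * 8
    seconds = capacity_bits / (total_mbps * 1e6)
    return seconds / 3600


# Illustrative bit rates (assumptions), against the 1.5TB LTO-5 capacity.
for mbps in (50, 100, 220):
    print(f"{mbps} Mb/s: {hours_per_cartridge(1.5, mbps):.1f} h per LTO-5 cartridge")
```

At 50 Mb/s, for example, a 1.5TB cartridge holds roughly 67 hours under these assumptions, and the figure falls in proportion as the bit rate rises.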

One day, there will be a successor to the LTO format, and, in time, the archived files will be migrated to an as-yet-unknown storage medium.

Cost vs. performance

Data can be stored on spinning disks or data tape, and each has its advocates. But will one eventually overtake the other? While the answer is unclear, the choice will be largely determined by cost and by features such as integrity, resilience and restore times.

The choice between local server systems and “the cloud” is partly a business issue, balancing CAPEX and OPEX. However, security and disaster recovery are also important issues to consider. Cloud-based storage services are backed by strong marketing. But, given broadcasters' experience in maintaining program archives, there is not necessarily a great need to turn to a service provider. Also, many IT solutions are designed for a data life of less than a decade, making them unattractive for a lot of broadcast content.

A secure archive will house a minimum of three copies. These could be a local copy plus copies held with two different cloud suppliers, which would give very good resilience against disaster or the business failure of a cloud service provider.
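With three copies, a damaged copy can be identified by comparing digests across locations and trusting the majority. The sketch below assumes each location can report a digest for its copy; the location names and the `reconcile` function are hypothetical, and a real system would also need to handle locations that are unreachable.

```python
from collections import Counter


def reconcile(copies):
    """Given {location: digest}, return (agreed_digest, locations_to_repair).

    The agreed digest is the one held by a strict majority of copies.
    If there is no majority, agreed_digest is None and every location
    is returned as suspect.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    digest, votes = Counter(copies.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None, sorted(copies)
    return digest, sorted(loc for loc, d in copies.items() if d != digest)
```

For example, if the local copy and one cloud copy agree while the second cloud copy differs, `reconcile` returns the shared digest and names the second cloud location as the one to restore.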

DAM

Resiliency is important, but how do you find the content? Any videotape and film library will have a catalog. Originally, it could have been a card index, but now it is more likely to be a database. In a similar fashion, digital assets need a catalog, an index, and a means to search and retrieve content.

But how do you protect the catalog? Again, the answer lies in data migration. This is where standards are important for asset records: It is much easier for future generations to use the information if it conforms to a standard. Work to standardize file wrappers and metadata for content archives is ongoing, and, in 2011, there is no complete answer as to which standards are best.

The issues to be considered include: the choice of video and audio codecs, wrappers and containers, and metadata. Much of this has already been covered by SMPTE standards and AMWA specifications, but there are particular issues for archives. How do you know what you are viewing is what was originally stored? Has it been edited or tampered with over time? File integrity checks are one way to assure that you are viewing the original content. Some storage systems support WORM, and this is a good way to prevent tampering.

Heavily compressed video data is not ideal for an archive, as it is more subject to picture impairments than uncompressed or mezzanine codecs. Long GOP compression is also less suitable for further editing. The choice of codecs should balance storage cost against the compression ratio that suits future repurposing. Too much compression could damage the future value of an asset.

The content file and its DAM record can become separated for many reasons. During mergers and acquisitions of media companies, data can be lost. Also, databases can become corrupted over decades. For this reason, it is essential that content files be self-describing. That means sufficient metadata is wrapped with the files to ensure the audio-visual data can be decoded, and that descriptive metadata provides an explanation of the content and its ownership.
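As an illustration of what "self-describing" implies, the sketch below groups the two kinds of metadata mentioned above: technical metadata needed to decode the essence, and descriptive metadata covering content and ownership. The field names and the JSON serialization are assumptions made for illustration only; they do not follow any SMPTE or AMWA schema, and in practice the metadata would be carried inside the file wrapper itself.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TechnicalMetadata:
    """What a future system needs to decode the file (hypothetical fields)."""
    wrapper: str
    video_codec: str
    audio_codec: str
    frame_rate: str
    duration_seconds: float


@dataclass
class DescriptiveMetadata:
    """What the content is and who owns it (hypothetical fields)."""
    title: str
    description: str
    rights_holder: str


@dataclass
class AssetRecord:
    technical: TechnicalMetadata
    descriptive: DescriptiveMetadata

    def to_json(self):
        # asdict() converts the nested dataclasses to plain dictionaries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values, not real program data.
record = AssetRecord(
    TechnicalMetadata("MXF", "example-video-codec", "PCM", "25", 1800.0),
    DescriptiveMetadata("Example Program", "Placeholder description", "Example Broadcaster"),
)
print(record.to_json())
```

The point of the structure is that a file carrying both groups can still be decoded and identified even if the catalog entry that once pointed to it no longer exists.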

Summary

Setting up a program archive is a complex balance of cost versus performance. An archive must be more resilient against failure than general IT systems because lost files are lost business assets.

Judging an asset's value 20 or 50 years in the future requires a crystal ball, which makes it even more difficult to judge the ROI of an archive system. Some media companies are attempting to develop best practices for media preservation, so for now, learn from those with valuable assets, as they will have given the task considerable time and effort.

David Austerberry is the editor of Broadcast Engineering's world edition.