Broadcast archives need new technology

A warehouse of shelving no longer meets the needs of file-based operations.
Archiving was once a real estate issue, as with these Digital Betacam tapes at itfc’s archive in London. With files, it became a technology issue.

Until recently, the provision of a content archive was given little thought by engineers and management. Any videotape that was likely to be needed in the future was sent off in a van to a warehouse, to be stored until it was needed. The warehouse required climate control, security and maybe a small dubbing facility to run off copies of tapes when recalled.

The software to manage the archive was little different from that in a public library: bar code readers, a database of tape IDs and shelf positions, and a check-in/check-out facility.

The broadcast archive is changing, and the driver is the migration away from videotape. Now that videotape is reaching end of life as a storage technology, the archive becomes a repository for files, rather than for the physical assets that videotapes represent. A file does not have the intimate association with its recording medium that a master videotape has.

Storing files raises a number of new issues. It now takes serious technology planning if the archive is to last as long as tape (typically 25 to 30 years) or even longer. Data storage hardware typically has a short life, less than five years for a disk drive.

By the time videotape starts to deteriorate, whoever sent it to the archive may well have retired; it will be someone else's problem. Files must be constantly migrated to the latest storage technology, so a technology solution to preserve the archive must be put in place today, not in 25 years' time.

Other issues to consider include network bandwidth from production to the archive, storage architecture and disaster recovery planning. And there is the cost of the ongoing operation of the archive.

IT archives

I was watching a documentary about the “Domesday Book,” a record of land title in England compiled in the 11th century. Nearly a thousand years later we can still read the books, written in ink on sheepskin parchment. There are much older written documents, such as Egyptian tablets or the Dead Sea Scrolls. But how much of our recent video material will last that long?

Many of the IT developments in data archiving are designed to maintain documents for regulatory compliance, which in most cases is less than 10 years. Broadcasters regularly air series from 40 years ago, or even older, so commodity IT systems are not necessarily going to meet the needs of long life. Broadcasters' requirements are more akin to the collections sector: museums, archives and libraries.

Programs can have value for reruns, and some have a more intangible value as historical records. The value of a news archive is the depth it can add to stories, again intangible as an asset. The balance of the value as an asset versus the cost of maintaining an archive is perhaps the biggest problem in designing an archive, in proving the ROI.

What is an archive?

An archive can be many things. For a production company, it could be somewhere to store the files from the temporary storage used for acquisition now that many camcorders use SSDs and flash storage. For a news department, the archive represents news history and will need to be maintained for decades. A sports department will have similar requirements. For a transmission department, the archive is a low-cost repository for programs for the duration of the rights window. For a program commissioner, the archive stores the program as an asset well into the future, as long as it can generate revenues.

All these archives have different requirements, in both technology and cost. The value of an asset is a matter of judgment, as is the long-term cost of running an archive for decades. Deciding what to keep and what to delete is not an exact science.

Which codec?

This is not a simple choice for HD and higher resolutions, although for SD, MPEG-2 I-frame at 50Mb/s is popular.

The decision rests on several factors. First, has the file been ingested from videotape, or is it an original file? Is 8-bit resolution sufficient or does it need to be stored at 10 or more bits of resolution? Should it be encoded in a standard such as MPEG (2 or 4) or JPEG 2000, or should it use something more edit-friendly such as Avid DNxHD (SMPTE VC-3) or Apple ProRes? If you need more than 8 bits of resolution, that narrows the choice, excluding MPEG-2 for example. For many codecs, there is a choice of long GOP or I-frame only.

AVC-I is emerging as a choice for HD; it supports 10-bit coding and can be edited without the need to decode GOPs.

The choice of codec is a compromise. More efficient codecs use less storage space for a given quality but may need more powerful workstations for editing. As computer performance increases, the latter becomes less of an issue. The decision made today may not be the optimum choice next year.

A major goal is to minimize the number of transcodes from acquisition to delivery, which best maintains video quality. That may dictate a proprietary format for the archive. For most broadcast applications, audio can be left uncompressed.

Another factor is the video and audio wrapper or container. MXF is the obvious choice here, but which operational pattern? OP1a makes sense for most applications. AAF, which is a project interchange format rather than an MXF operational pattern, may be best if an entire project is to be stored with the intention of later extensive repurposing. Many production systems maintain metadata in a separate database. For a long-term archive, self-describing files are an advantage.

The optimum codec will depend on the application — news, recent drama, videotape transfers, and HD or SD — so most broadcasters' archives will contain a mix of codecs.

Technology

Current technology is still based on electromagnetism, whether disk or tape. The magnetic information has a long life, although the supporting systems suffer from obsolescence. The formats of tape drives and disk controllers have a short life of only four or five years, so long-term archiving needs ongoing migration of data to the latest storage formats.
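As a rough illustration of what ongoing migration implies, the sketch below counts how many times content must be moved to new media over a retention period. The function name and the simple model (content is migrated once each time the media reaches the end of its supported life) are assumptions for illustration only.

```python
import math


def migrations_needed(retention_years: float, media_life_years: float) -> int:
    """Count the migrations required to hold content for retention_years
    when each storage generation is usable for media_life_years.

    Assumes the first generation is the original write, so it is not
    counted as a migration.
    """
    generations = math.ceil(retention_years / media_life_years)
    return max(0, generations - 1)


if __name__ == "__main__":
    # A 30-year archive on media with a five-year life.
    print(migrations_needed(30, 5))
    # A 50-year archive on media with a four-year life.
    print(migrations_needed(50, 4))
```

Under these assumptions, a 30-year archive on five-year media needs five migrations, each of which has to be budgeted for and verified.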

Optical systems have failed to provide the storage density of tape, although they, like solid-state storage, have applications in production. However, in 10 or 20 years' time, a new technology could supplant magnetic storage.

There is one technology that is low-risk but expensive: 35mm film separation negatives. The images are stored as red, green and blue silver negatives. These do not fade like the dyes used in color negatives, and they can potentially last centuries in the correct storage conditions. Also, they do not rely on computer technology, which is notorious for its rapid obsolescence. Who can even start to envisage what the computer of 20 years' time will be like, let alone that of a century from now?

Disk or tape

This is a long-running argument, with proponents on either side. Many broadcasters have adopted a mix as the most cost-effective system for their needs. Clouding the issue is outsourced storage — the cloud. The economics are changing quickly, but cloud storage does require a high-bandwidth connection for large media file transfers.
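The bandwidth point can be made concrete with some simple arithmetic. The sketch below estimates file size from duration and bit rate, then the time to move that file over a link. The function names and the 80 percent link efficiency are assumptions, not measured figures.

```python
def file_size_gb(duration_hours: float, bitrate_mbps: float) -> float:
    """Size in decimal gigabytes of a file recorded at a constant bit rate."""
    bits = duration_hours * 3600 * bitrate_mbps * 1e6
    return bits / 8 / 1e9


def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move size_gb over a link of link_mbps.

    efficiency is the assumed usable fraction of the nominal line rate.
    """
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return size_gb * 1e9 * 8 / usable_bits_per_second / 3600


if __name__ == "__main__":
    size = file_size_gb(1.0, 50)          # one hour at 50Mb/s
    print(f"{size:.1f} GB")                # 22.5 GB
    print(f"{transfer_hours(size, 100):.3f} hours over a 100Mb/s link")
```

With these assumptions, a one-hour program at 50Mb/s takes longer than half an hour to send over a 100Mb/s connection, so a large back catalog needs a much faster link or physical shipment of media.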

Many cloud providers sell the advantage of storage on demand, which can ramp up and down to meet your needs, but broadcasters generally have storage needs that go one way: up. Surplus capacity is not an issue.

Outsourced storage is becoming more popular, just as confidential paper records have long been stored off-site inside mountains. However, many broadcasters have concerns about security, retrieval times and catastrophic failure, which has happened. Outsourcing to a data center has advantages: Its operators are experts at running storage systems. But if the goal is to archive programs for decades, you may want to consider whether the outsourcing company will be around in 25 or 50 years; IT companies come and go apace. Another route is to keep the storage in-house but outsource the management.

LTO advances

Data tape is the chosen archive format for many broadcasters. The latest generation of LTO, LTO-5, introduces a feature that lends it to broadcast archive applications: support for the Linear Tape File System (LTFS). This is an IBM development for a low-cost, self-describing tape file system.

An LTFS volume comprises two partitions, one for data and one for an index. The index is an XML data structure that describes all data files, directory information and associated metadata for files recorded on the LTFS volume. This means that a tape can be mounted and read just like a disk drive without the need to first transfer the data to online disk storage.
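To show why a self-describing index is useful, the sketch below reads a small XML document and lists the files it describes without touching the data itself. The XML here is a deliberately simplified, hypothetical stand-in; the element names and layout are assumptions and do not reproduce the actual LTFS index schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, simplified index. Not the real LTFS schema.
SAMPLE_INDEX = """
<index>
  <directory>
    <name>news</name>
    <file><name>bulletin_001.mxf</name><length>22500000000</length></file>
    <file><name>bulletin_002.mxf</name><length>11250000000</length></file>
  </directory>
</index>
"""


def list_files(index_xml: str) -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for every file described in the index."""
    root = ET.fromstring(index_xml.strip())
    results = []
    for directory in root.iter("directory"):
        dir_name = directory.findtext("name")
        for entry in directory.findall("file"):
            file_name = entry.findtext("name")
            length = int(entry.findtext("length"))
            results.append((dir_name + "/" + file_name, length))
    return results


if __name__ == "__main__":
    for path, size in list_files(SAMPLE_INDEX):
        print(path, size)
```

Because the catalog travels with the tape, a cartridge can still be browsed if the asset management database that originally tracked it is no longer available.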

With LTFS added to existing features such as WORM and encryption, LTO becomes better suited to archiving video with each new generation.

WORM was introduced in LTO-3 for regulatory compliance applications, but it also has advantages for video libraries. A write-once tape cannot be overwritten or altered. Content integrity is becoming more of a requirement for file-based operations. Is the file you are reading the same as the file written years ago? The combination of WORM with a content integrity check as described in SMPTE 429 will provide the confirmation that a file has not been tampered with or inadvertently altered.
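A content integrity check generally means recording a digest of the file when it is written and recomputing it when the file is read back. The sketch below shows the idea with Python's standard hashlib module. The use of SHA-256 and the function names are assumptions for illustration; a real workflow would use whichever digest algorithm and manifest format the applicable standard specifies.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of a file, reading it in chunks so that
    large media files do not have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path: str, expected_digest: str, algorithm: str = "sha256") -> bool:
    """True if the file on disk still matches the digest stored at write time."""
    return file_digest(path, algorithm) == expected_digest
```

The digest would be stored with the file's metadata when it enters the archive, then checked on every recall and after every migration to new media.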

LTO-5 tapes can store 1.5TB of data, which is 1.5 hours of uncompressed 1080i/50 or more than 50 hours of 50Mb/s content. The future road map takes densities up to 12.8TB for LTO-8, which is planned for release in 2016.
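The relationship between cartridge capacity, bit rate and running time is simple to check. The sketch below assumes decimal terabytes and a constant bit rate, and ignores file system and wrapper overhead; the function name is illustrative.

```python
def hours_on_tape(capacity_tb: float, bitrate_mbps: float) -> float:
    """Hours of constant-bit-rate content that fit in capacity_tb
    (decimal terabytes), ignoring file system and wrapper overhead."""
    capacity_bits = capacity_tb * 1e12 * 8
    seconds = capacity_bits / (bitrate_mbps * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # 1.5TB cartridge holding 50Mb/s content.
    print(f"{hours_on_tape(1.5, 50):.1f} hours")
```

For 50Mb/s material on a 1.5TB cartridge this gives about 67 hours before overhead, in line with the figure of more than 50 hours. The uncompressed figure depends on the bit rate assumed, which varies with bit depth, chroma sampling and whether blanking is stored.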

Threats to an archive

The big problem with data archives is that failure can cause the loss of assets. Content originated on videotape can always be re-ingested, albeit at a cost, but file-based content will require precautions to protect the assets.

Being mechanical, disks and data tape can suffer catastrophic failure. There are techniques to mitigate this, including RAID for disks and duplicate copies for tape.

Disasters can cause great damage to a data center, and the only way around this is either to mirror to a geographically remote data center or to store duplicate data tapes in a remote library. As with all archive design, choices are compromises between availability and the total cost of ownership.

Human error is an inevitable factor in archiving. This could be an archivist accidentally erasing a file or an IT system engineer performing incorrect procedures. Again, file repositories present a higher risk than videotape; the loss of a single tape does not compromise the whole library.

Data arrays are subject to the usual IT threats such as computer viruses and software corruption. There are well-known procedures for guarding against these threats. One of the advantages of outsourcing the archive is that storage specialists are well-versed in these procedures.

Summary

Now that many productions are recorded directly to files rather than videotapes, consideration must be given to the long-term storage of the assets. Videotapes were simply dispatched to a warehouse. Occasionally a tape might go missing or deteriorate beyond the point where it could be played, but bar a catastrophe at the warehouse, the risk of losing everything was remote.

A central file archive is subject to far more risk and can fall prey to technological obsolescence long before videotape. If broadcasters expect to realize the value of their program assets in 25 or 50 years' time, serious thought must be given to the design and operation of the file archive.

As ever, there are choices and compromises. Cloud or in-house? Which codec and which wrapper? Finally, which storage medium: disk, tape or some new storage technique not yet created? Without a crystal ball, only your best judgment today will ensure future generations can view the best of today's television. Files do have the great advantage that, abstracted from the storage medium, they can be migrated without the generation losses of videotape. Will your library last as long as the “Domesday Book”?