Designing an active archive system

Content, regardless of age, must be accessible when it's needed and retrieved in a timely manner (seconds or minutes rather than days or weeks) to support a competitive, growing business, and an active archive allows just that. An active archive can be defined as tiered storage that provides access to data across a virtualized file system capable of migrating data between multiple storage systems and media types, including solid-state drives, hard disk drives, magnetic tape and even optical disc.

An active archive should be viewed as a content management framework. With the right software tools, an active archive provides a framework for moving and managing content, one that is heterogeneous by nature, allows for data management flexibility and provides a global namespace where users can share and access content transparently. At the core of that active archive framework is the ability to back up and protect content, offer long-term preservation and ultimately lower the cost per GB of managed storage. (See Figure 1.)

The biggest driver for implementing an active archive is exponential storage growth. It's not realistic to keep taking a linear approach to an exponential problem. A 50 percent annual growth rate in storage requirements, which many consider a conservative estimate, means that 100TB of storage today will grow to roughly 760TB just five years from now. Engineers can no longer think about storage in the same terms as before.
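The arithmetic is simple compound growth. The short Python sketch below reproduces the projection; the function name and loop are illustrative only.

    def project_capacity(start_tb, annual_growth, years):
        # Compound growth: capacity after `years` at a fixed annual rate.
        return start_tb * (1 + annual_growth) ** years

    for year in (1, 2, 3, 4, 5):
        print(f"Year {year}: {project_capacity(100, 0.50, year):.0f}TB")
    # Year 5 prints 759TB, i.e., the roughly 760TB figure cited above.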

Before you start designing your own active archive solution, however, there are three key points to consider. Failure to fully understand the implications of any one of them could prove costly to your organization.

Evaluating the medium

The two most prevalent storage media today are tape and disk. Viewed side by side, tape is considerably more reliable than disk, producing far fewer bit errors. When you hear that tape is unreliable, people are often referring to the tape of eight to 10 years ago. Whenever you compare today's technology with a competing technology from a decade ago, of course there are going to be differences.

In a study of bit errors recorded on various storage media, the hard error rate was found to be 1 bit in 1 × 10^14 bits for a desktop SATA drive and 1 bit in 1 × 10^15 bits for an enterprise SATA drive. For an enterprise Fibre Channel or SAS drive, that improves by an order of magnitude to 1 bit in 1 × 10^16 bits.

When tape is compared with those disk statistics, tape is far more reliable, with an error rate of 1 bit in 1 × 10^17 bits for LTO-5 and 1 bit in 1 × 10^19 bits for IBM's TS1140 media. These rates mean you're likely to see a hard error on average roughly every 11TB of data read for a desktop SATA drive, every 114TB for an enterprise SATA drive and every 1.1PB for an enterprise FC or SAS drive. Tape's hard errors are far rarer, occurring on average only about every 11PB for LTO-5 and on the order of every 1.1EB for TS1140. Fortunately, disk vendors have developed ways to minimize the impact of these errors through RAID, striping and other methods, so users can continue to rely on tier-one disk for frequently accessed data and feel confident moving less frequently needed data down to lower tiers in an active archive.
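For readers who want to see where those intervals come from, the Python sketch below converts a hard error rate of 1 bit in N bits read into the average volume of data read between errors, assuming binary terabytes; the rates are the ones quoted above.

    # Convert a hard (unrecoverable) error rate of "1 bit in N bits read"
    # into the average volume of data read between errors, in binary TB.
    TIB = 2 ** 40

    def tb_between_errors(one_error_per_n_bits):
        return one_error_per_n_bits / 8 / TIB   # bits to bytes to TB

    rates = {
        "desktop SATA": 1e14,
        "enterprise SATA": 1e15,
        "enterprise FC/SAS": 1e16,
        "LTO-5": 1e17,
        "TS1140": 1e19,
    }

    for medium, bits in rates.items():
        print(f"{medium}: one hard error per roughly {tb_between_errors(bits):,.0f}TB read")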

What's more, failures on disk subsystems are typically event-driven. A failed component is often the result of a power outage, a temperature spike or an electrical spike, and if one component fails, the likelihood of additional failures increases. During a RAID rebuild, the rest of the system has to work that much harder, raising that likelihood further. Keep in mind that most systems today use disk drives of up to 3TB, with 4TB drives right around the corner. What are rebuild times going to look like then? How exposed is your content going to be?
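As a rough illustration of that exposure, the Python sketch below estimates rebuild time from drive capacity and an assumed sustained rebuild rate; the 50MB/s figure is a placeholder, not a vendor specification, and real rebuilds also compete with production I/O.

    # Back-of-the-envelope rebuild time for a single failed drive, assuming
    # the rebuild is limited by a sustained write rate to the replacement drive.
    def rebuild_hours(drive_tb, rebuild_mb_per_s=50):
        drive_mb = drive_tb * 1_000_000           # decimal TB to MB
        return drive_mb / rebuild_mb_per_s / 3600

    for size_tb in (1, 2, 3, 4):
        print(f"{size_tb}TB drive: roughly {rebuild_hours(size_tb):.0f} hours exposed")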

Evaluating the cost

The faster your storage requirements grow, the closer you need to look at containing cost. When considering the TCO of a data storage solution, there are several factors to weigh: equipment costs, ongoing support, media costs, floor space required and power consumption.

The Clipper Group published a study in December 2010 that analyzed all of these factors. The study looked at storing 1PB of data over a 12-year period, assuming a 45 percent annual growth rate, on an all-disk solution versus an all-tape solution. In the end, the disk solution was found to cost more than 15 times as much as the tape solution: $67 million for disk versus $4 million for tape. In fact, the power costs alone for the disk-based system totaled more than the 12-year total cost of the tape-based system. Now, an active archive is neither completely tape-based nor completely disk-based. By using the two technologies together in the most appropriate configuration, an organization can realize major cost savings.
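Any such comparison ultimately rests on unit costs specific to the environment. The Python sketch below shows one way a model like that can be structured around the factors listed earlier; every unit cost in it is a placeholder for illustration, not a figure from the Clipper study.

    # Skeleton of a TCO model built from the factors named above. The
    # structure, not the numbers, is the point.
    def tco(start_tb, years, growth, equipment_per_tb, media_per_tb,
            support_per_tb_yr, power_per_tb_yr, floor_space_per_yr):
        total = 0.0
        installed = 0.0
        needed = start_tb
        for _ in range(years):
            added = max(needed - installed, 0.0)
            total += added * (equipment_per_tb + media_per_tb)           # one-time costs
            installed += added
            total += installed * (support_per_tb_yr + power_per_tb_yr)   # recurring costs
            total += floor_space_per_yr
            needed *= 1 + growth
        return total

    # 1PB growing 45 percent a year for 12 years, with made-up unit costs:
    print(f"${tco(1000, 12, 0.45, 500, 40, 25, 30, 10000):,.0f}")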

In addition, when one factors in tape's future, the cost story becomes more compelling. While the cost per GB of disk storage decreases at a slow pace, the cost per GB of tape has fallen more steeply. About five years ago, the cost per GB for tape was about 25 cents for LTO-2, while today the cost is under 4 cents per GB for LTO-5. Now consider that two years ago, IBM and Fuji developed a prototype tape where they “recorded data … at a density of 29.5 billion bits per square inch — about 39 times the areal data density of today's most popular industry-standard magnetic tape product … these new technologies are estimated to enable cartridge capacities that could hold up to 35 trillion bytes (terabytes) of uncompressed data.” (Visit “IBM Research sets new record in magnetic tape data density” at http://www.zurich.ibm.com/news/10/storage.html.) At 35TB per cartridge, the future cost per GB looks promising.

Bottom line: Multiple offline copies on data tape are necessary. With the cost of tape media below $60 for 1.5TB, it's easy and inexpensive to make multiple copies, and most archive applications today have the ability to create multiple instances on multiple pieces of media.
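Using the media price above, the quick Python sketch below estimates what those extra copies cost; the archive size and copy count are arbitrary examples.

    import math

    # Media cost of extra offline copies on LTO-5, using the roughly
    # $60-per-1.5TB cartridge price cited above.
    CARTRIDGE_PRICE = 60.0   # dollars per cartridge, approximate
    CARTRIDGE_TB = 1.5       # native LTO-5 capacity

    def extra_copy_cost(archive_tb, extra_copies):
        cartridges = math.ceil(archive_tb / CARTRIDGE_TB) * extra_copies
        return cartridges * CARTRIDGE_PRICE

    # Two additional copies of a 100TB archive cost about $8,040 in media:
    print(f"${extra_copy_cost(100, 2):,.0f}")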

Evaluating long-term content preservation

Lastly, any archive solution must provide a framework for long-term content preservation. LTO media has a shelf life of 30-plus years, and vendors have proven methods for migrating data from one generation to the next. Additionally, the LTO consortium developed the Linear Tape File System (LTFS), which gives users the ability to read and write data independently of any archiving application. Ten or 20 years from now, LTFS offers the potential to retrieve content off a tape even if the archive application that wrote the needed file is no longer in use. Because of that interoperability, LTFS shows a lot of promise for staying power.
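To illustrate the point, the Python sketch below reads a mounted LTFS volume with nothing more than standard file system calls; the mount point and file name are hypothetical.

    import os
    import shutil

    # Once an LTFS-formatted tape is mounted by an LTFS driver, its contents
    # appear as an ordinary file system, so no archive application is needed
    # to read them.
    LTFS_MOUNT = "/mnt/ltfs"

    for name in sorted(os.listdir(LTFS_MOUNT)):   # list the assets on the tape
        print(name)

    # Restoring a single asset is just a file copy.
    shutil.copy(os.path.join(LTFS_MOUNT, "promo_master.mov"), "/restore/")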

Not all archives are created equal

In the end, there are many factors to consider when designing an archiving solution for a media and entertainment environment; archives are not all created equal. A robust and efficient archive solution should not only factor in performance characteristics, but also carefully take into account reliability, cost and a future platform roadmap.

It is difficult to find two identical environments, so care must be taken to ensure that the specific requirements of any given environment are met. Remember, an archive, specifically an active archive, is a content management framework that provides content protection and preservation, and a framework for lowering the cost per GB of managed storage. Within an active archive framework, the archive management application extends the file system across multiple tiers of storage, including tape; applies policies that migrate content from one tier to the next; and gives users complete and uninterrupted access to any asset, all of it transparently.
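As a simplified illustration of such a policy, the Python sketch below moves assets that have not been accessed in 90 days from a disk tier to a tape-backed tier; the paths, the threshold and the move-without-a-stub behavior are all illustrative rather than how any particular archive application works.

    import os
    import shutil
    import time

    # Sketch of an age-based migration policy: assets untouched on the disk
    # tier for more than 90 days move to a tape-backed tier.
    DISK_TIER = "/archive/tier1"
    TAPE_TIER = "/archive/tier2"   # e.g., an LTFS or archive-managed mount
    AGE_LIMIT = 90 * 24 * 3600     # 90 days, in seconds

    def migrate_cold_assets():
        now = time.time()
        for root, _dirs, files in os.walk(DISK_TIER):
            for name in files:
                path = os.path.join(root, name)
                if now - os.path.getatime(path) > AGE_LIMIT:
                    dest = os.path.join(TAPE_TIER, os.path.relpath(path, DISK_TIER))
                    os.makedirs(os.path.dirname(dest), exist_ok=True)
                    shutil.move(path, dest)   # a real system would leave a stub or link

    migrate_cold_assets()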

The media and entertainment industry is a prime example of an industry faced with exponential content growth, and one where an active archive framework can deliver a highly cost-effective solution.

Peter Halpern is Broadcast Sales Specialist, Spectra Logic.