Archiving: Tape rises from the dead – Part 1

Broadcasters and other content producers are increasingly facing huge amounts of data that must be stored, edited and played out. Now repeat that process. The result is that new ways are being developed to retain programming, both finished and raw, that permit online-like operation, but without the huge cost of vast arrays of spinning disks.

In this series of articles, we’ll look at the resurrection of that old and reliable storage medium, tape, and how it can reduce storage costs, protect your business and ensure a long life for your content.

The value of data

Let’s first examine the value of data. Research, and common sense, show that the amount of data being stored is growing rapidly. Let’s consider just how much “stuff” is being stored.

From a research white paper, by Floyd Christofferson, Product Marketing Director, SGI:

All of the printed material on earth is estimated to be about 200 petabytes (200,000 terabytes) of data. That is a lot of information and a lot of data to manage. But it is dwarfed by the amount of data that is generated worldwide with information technology today.

According to IDC estimates, in 1999 the total worldwide volume of data was estimated at 2 exabytes, or 2 million terabytes. By 2007, this number had grown over 100 times to a total of 281 exabytes. But in the last three years, data growth alone was 849 exabytes, and is expected to mushroom to a total of 2855 exabytes by 2012.

To refresh your memory, one exabyte (EB) is a 1 followed by 18 zeros, counted in bytes. Or, just remember that an EB is the same as a billion gigabytes. By any measure, the world is awash in data, and broadcasters and content producers are contributing at least their fair share.
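As a quick sanity check on the units quoted above, here is a minimal Python sketch. It assumes decimal (SI) units, and the `convert` helper is just an illustrative name, not something taken from the white paper.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18


def convert(amount, from_unit, to_unit):
    """Convert an amount between two decimal storage units."""
    return amount * from_unit / to_unit


print(convert(1, EB, GB))    # gigabytes in one exabyte
print(convert(200, PB, TB))  # printed-material estimate, in terabytes
print(convert(2, EB, TB))    # 1999 worldwide estimate, in terabytes
print(281 / 2)               # growth factor from 1999 to 2007
```

The results agree with the figures in the text: a billion gigabytes per exabyte, 200,000 terabytes of printed material, 2 million terabytes in 1999, and growth of well over 100 times by 2007.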

For media producers, all this digital content brings with it at least two problems: storing it today for production purposes and then storing it forever — just because you may someday need it.

Solutions to the first issue are trivial. There are a myriad of tape, disk, solid-state and optical storage platforms, many targeted at production applications. They are highly reliable, fast and inexpensive. What often happens is that producers and engineers try to use those same platforms to store content for long periods. Not only does the total cost of ownership (TCO) climb steeply, but finding what you think you have becomes almost impossible.

Perhaps you’re not a hoarder, like the people seen on some of those TV shows. But have you ever known a news photographer, producer or anyone involved in the generation of content who didn’t want to store every second of content she ever touched? While storage is cheap, the cost of storing everything isn’t.

Personally, I’ve adopted the practice of simply buying another spinning disk when the current one begins to fill, either as an additional drive or as a replacement with twice the storage. Here’s how that totally smart philosophy has served me.

At last count, I’ve suffered through four HDD crashes, losing almost 1TB of data, including hundreds of photos, emails … well, you get the picture. Fully 30 percent of the drives ever connected to my systems have at some point failed, resulting in data loss.

What I have finally learned is that just plugging in more or larger drives is not a reliable backup policy for me. And as for using home office archiving software, don’t even get me started. If you even mention the word Acronis, my head will explode.

Old data is seldom used again

One important aspect of storing old data is that it is seldom used again. Forrester Research said that 85 percent of production data is inactive and that almost 70 percent hasn’t been accessed in the last 90 days. Consider the cost of storing that clip of the train wreck on spinning disks for one year or two years. Now try 10 or 20 years.

Figure 1 illustrates the dichotomy between the probability of data reuse and the amount of data stored over time. In the media world, we often refer to this as content’s long tail. This means that content initially may be highly valuable, but as time goes on, its value typically drops dramatically. However, it never drops to zero, and when integrated over a long enough time frame, the content’s cumulative value could be higher than it was when first produced. If so, then the content’s owners will want to both store the content and have some method for recovering that content when new opportunities develop.
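To make the long-tail idea concrete, here is a rough Python sketch. Every number in it (the starting value of 100, the floor of 2, the halving each year) is a made-up assumption for illustration only and does not come from Figure 1 or from any research cited here.

```python
def yearly_value(year, initial=100.0, floor=2.0, decay=0.5):
    """Hypothetical value of a clip in a given year (year 0 = first year).

    The value halves toward a small floor each year but never reaches zero.
    """
    return floor + (initial - floor) * decay ** year


def cumulative_value(first_year, last_year, **kwargs):
    """Sum the yearly values over an inclusive range of years."""
    return sum(yearly_value(y, **kwargs) for y in range(first_year, last_year + 1))


first_year = yearly_value(0)
tail = cumulative_value(1, 20)
print(f"Year 0 value: {first_year:.1f}")
print(f"Years 1-20 combined: {tail:.1f}")
```

Under these invented assumptions, the combined value of the 20 tail years exceeds the first-year value, which is the case in which it pays to keep the content and be able to find it again.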

Broadcasters often express the desire to be able to access content on an immediate basis. “Yes, but if in two years the mayor gets hit by a bus, we need the ‘tape’ of his initial election for an obit news story.” That’s the standard line from a news director. Ask the director of engineering the same question, and he may lament, “We couldn’t find it if we had it.”

The bottom line for storage is that you need it and probably a lot of it. However, it must be managed in an efficient and cost-effective manner. The value of data never goes to zero, but may decrease over time. The long tail of media offers the opportunity for long-lived revenue. Balancing these sometimes conflicting demands is the task of the data manager.

Archive versus backup

Many media professionals do not understand the difference between an archive and a backup. Let’s explore these two quite different tasks.

Both solutions provide storage, but the goals, and therefore the best platforms, are different. It is important to recognize that one is not a replacement for the other.

Think of backup as an insurance policy. If your station or production suite is blown away by a hurricane or tornado, you need to have already secured your data (content) in another location. That location can be across town, in another state or around the world. There are important issues in where that data is located, but for now, suffice it to say that a good backup solution simply stores whatever content management needs to access quickly to remain in business. That content might be a day’s, a week’s or even a month’s worth of programming. A backup system is designed to help the company recover from catastrophic events, but it may supply enough content for only a short period of time.

An archive may be thought of, for those of you old enough to remember tape rooms, as the place where recorded programs, raw content and edited footage go to die. The archive system was often little more than scribbles on a label affixed to the tape box or reel. If you wanted that four-year-old piece of content showing the mayor’s inaugural speech … good luck!

The ease of storage that digital technology brought raises another important issue. Syndication rules require broadcasters to delete programs when retention rights expire. This means that the 13-month-old copy of “Oprah” that didn’t get deleted because the tape gal wanted to make a personal copy should not exist. Never mind that copying it wasn’t legal in the first place; now you are storing a copy of a show you have no right to keep. That knock on the door is the auditor.

Archive systems provide sophisticated middleware to manage such issues. It’s no longer just a case of minimizing storage requirements; it can be a case of keeping your butt out of court.
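As a rough idea of the kind of rule such middleware applies, here is a minimal Python sketch of a retention-rights check. The `ArchiveItem` record, the `expired_items` function and the sample catalog are all hypothetical; real archive products have their own data models and interfaces.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArchiveItem:
    title: str
    # None means the station owns the content outright (assumed convention).
    rights_expire: Optional[date]


def expired_items(catalog: List[ArchiveItem], today: date) -> List[ArchiveItem]:
    """Return the items whose retention rights ended before `today`."""
    return [
        item for item in catalog
        if item.rights_expire is not None and item.rights_expire < today
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    ArchiveItem("Syndicated talk show, episode A", date(2010, 1, 31)),
    ArchiveItem("Mayor's inaugural speech (station-owned)", None),
    ArchiveItem("Syndicated talk show, episode B", date(2011, 6, 30)),
]

for item in expired_items(catalog, today=date(2010, 12, 1)):
    print("Flag for deletion:", item.title)
```

In this sketch only episode A is flagged, because its rights lapsed before the check date. Station-owned material with no expiry date is never flagged.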

Finding solutions

With this as general background, the next few blog posts will look more closely at archiving, backup and new ways to implement them in a cost-effective and efficient manner, with a path for future growth. We’ll look at some new software and hardware solutions being offered that make the processes easier. Finally, we will examine some technologies and offer some outside vendor-provided tutorials that may help you choose wisely.

Additional resources:

Time value of data, Floyd Christofferson, SGI

Designing an active archive, Qstar Technologies