Backing up critical data

When it comes to protecting data, broadcasters have it rough. Not only do they face all of the normal IT issues with protecting important desktop business data, but increasingly they are faced with protecting vital data having to do with on-air operations.

As broadcasters think about protection, it may be useful to divide the areas of concern in two: protecting on-air programming, meaning video, audio and closed captioning; and protecting the data behind the operational systems that get video to air, meaning automation and other systems.

Protecting programming

For many broadcasters, especially those just moving to server-based playout technology, it's important to decide how to protect the programs and commercials stored on the server. Most broadcasters keep tape as a backup for a while, and in some cases, videotape is kept as the backup method indefinitely. However, for many broadcasters, the move to server-based playout was driven by a requirement to do things that could not be done with a tape-only facility. Typical drivers include a request from management to reduce staff or a need to provide multiple feeds from a consolidated master control facility.

Whatever the motivation, broadcasters frequently find that after being on servers for a while, videotape is unsuitable for backup. Or they discover that videotape serves only as a backup of last resort, meaning that the facility's functionality will suffer if the operators must revert to videotape. Finally, there is a cost to maintaining videotape as the backup of choice. This cost shows up in maintaining the videotape library, tape machines and sufficient tape-based playback facilities to actually execute a tape-based backup plan. If retaining videotape as a backup is not feasible for many broadcasters, then what is a good way to protect this vital data once it moves into the server environment?

The obvious answer is a one-for-one backup plan; all content on one server is duplicated on a second server. This backup plan is simple and intuitive, and the only real issues are cost and ensuring that the two servers really contain identical content.
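One way to check that two servers really hold identical content is to compare checksums of every clip. The following is a minimal sketch, assuming each server's clip storage can be read as an ordinary directory tree; the function names and the idea of a "manifest" are illustrative assumptions, not part of any particular server product.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large media files do not fill memory.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(primary: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """Report clips missing from either side and clips whose contents differ."""
    shared = set(primary) & set(backup)
    return {
        "missing_on_backup": sorted(set(primary) - set(backup)),
        "missing_on_primary": sorted(set(backup) - set(primary)),
        "different": sorted(name for name in shared if primary[name] != backup[name]),
    }
```

Running build_manifest against each server's storage and passing both results to compare_manifests gives a list of clips to re-copy. Many video servers do not expose their storage this way, so treat this as an outline of the check rather than a drop-in tool.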

If you can't deploy redundant playback servers, all is not lost. When thinking about protecting data, it pays to decide what the most likely failure modes are and protect against them. The two components most likely to fail in servers are the power supply and the disk drives. Buying a server with redundant power supplies is a good idea, especially if you do not have a backup server. When it comes to disk drives, professional-quality broadcast servers employ a scheme that provides protection from disk drive failures, most typically through RAID technology. There are different levels of RAID, but generally, RAID 5 is used. In this configuration, data is striped across several disks. As the data is written, parity data is calculated and recorded as well, spread across the disks in the array. If a drive fails, the RAID array can recreate the missing data by using the data from the remaining disks plus the parity data. Figure 1 on page 24 illustrates RAID 5 along with two other common RAID configurations.
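The parity idea can be shown in a few lines. This is a simplified sketch of a single stripe, assuming three data drives and one parity block computed with XOR; the block contents are made up for illustration, and a real RAID 5 controller rotates the parity position from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: a block on each of three data drives (illustrative contents).
data_blocks = [b"VIDEO___", b"AUDIO___", b"CAPTIONS"]

# Parity is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR-ing a value with itself cancels it out, combining the survivors with the parity leaves exactly the missing block. The same arithmetic shows why a second failure is fatal: with two blocks missing from a stripe, there is no longer enough information to solve for either one.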

An important point when considering RAID is that if a drive fails, the technology is so good that the operator will never notice unless someone is looking at error logs or is notified by monitoring software. If a second drive fails before the first one has been replaced and its data rebuilt, the failure will be catastrophic: All data on the drives will be lost. So it is extremely important to take advantage of all notifications provided by the server manufacturer regarding disk failures.

It is also key to understand what it takes to replace a drive and to rebuild the data so that the drive is back online. Many servers will rebuild the data on the new drive in the background, but some will not. Also, note that it may take a long time for the process to complete. This means that the system is vulnerable during the rebuild period. Another question to ask is whether the system performance is restricted during the rebuild process. In other words, if you lose a drive and replace it with a new one, are there things you cannot do because the server is busy rebuilding the drive?

To further protect the data, some RAID systems allow you to purchase extra drives and designate them as hot standbys. This means that a drive is instantly available if another drive fails — well, not quite. Actually, the drive is instantly available, but it does not contain any data. The system must still rebuild the data on the new drive. But in this case, you will reduce the exposure to a second drive failure by beginning the rebuild process immediately rather than waiting until someone notices the error and physically replaces the failed drive. Understanding RAID and its limitations can be an excellent way to protect yourself from drive failures in video servers.
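To put rough numbers on that exposure, it helps to add up the time the array spends without protection. The sketch below is back-of-the-envelope arithmetic only; the drive size, rebuild rate and staff response times are hypothetical assumptions, and actual rebuild rates depend on the server and on how busy it is.

```python
def rebuild_hours(drive_capacity_gb: float, rebuild_rate_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole drive at a sustained rebuild rate."""
    seconds = (drive_capacity_gb * 1000) / rebuild_rate_mb_per_s
    return seconds / 3600


def exposure_hours(detect_hours: float, replace_hours: float, rebuild: float) -> float:
    """Total time the array runs without protection against a second failure."""
    return detect_hours + replace_hours + rebuild


# Hypothetical figures: a 1,000 GB drive rebuilt at 50 MB/s.
rebuild = rebuild_hours(drive_capacity_gb=1000, rebuild_rate_mb_per_s=50)

# Without a hot standby: assume 8 hours before someone notices the failure
# and 2 hours to locate and physically swap the drive.
without_spare = exposure_hours(detect_hours=8, replace_hours=2, rebuild=rebuild)

# With a hot standby: the rebuild starts immediately.
with_spare = exposure_hours(detect_hours=0, replace_hours=0, rebuild=rebuild)

print(f"Rebuild alone: {rebuild:.1f} h")
print(f"Exposure without hot standby: {without_spare:.1f} h")
print(f"Exposure with hot standby: {with_spare:.1f} h")
```

Substituting figures from your own server vendor shows how much of the risk window comes from the rebuild itself and how much comes from waiting for someone to notice and act.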

Protecting the most important data

As I said earlier, when broadcasters think about protecting data, they almost invariably think about protecting the content. But there is other data that may be even more important than the content itself. Most facilities contain some form of automation. In some cases, an automation failure means that normal operations cannot continue, and the on-air look suffers.

Again, a one-for-one backup is a simple solution, but in this case, not only is cost an issue, but controlling one server from two automation systems presents nontrivial issues. Also, because most automation systems operate from a centralized database, it's critical to ensure that this centralized database remains available and that its data is protected.

One way to protect a centralized database is to store the data on a RAID drive, just as you would your content. While this may be sufficient, consider what it would mean if you lost all of the automation data. How long would it take you to reingest all of the material? How long would it take you to enter all of the metadata required to get the content on the air? This database is so important and the ramifications of losing it can be so serious that a belt-and-suspenders approach is appropriate.

Many different database technologies exist. One common type is the SQL-based database server, and many of these products support something called database replication. Replication allows the database server to duplicate the database onto a database running on another server in near real time. In fact, it is possible to design server/database systems such that the loss of one database or server is transparent to the application accessing the database.

Although database systems attempt to protect you from data corruption, errors can occur. And, of course, few systems will protect you from human error. For these reasons, it is important to maintain a rolling backup of the database system. How frequently you back up this database depends upon how quickly it changes, but I would recommend making backups at least once every 24 hours and keeping each backup for one week.
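The rolling schedule described above comes down to two decisions: whether a new backup is due, and which old backups have aged out. Here is a minimal sketch of that logic, assuming backups are tracked by their creation time; the function names are hypothetical, and the actual backup and delete steps depend on your database vendor's tools.

```python
from datetime import datetime, timedelta


def backup_is_due(last_backup: datetime, now: datetime,
                  interval: timedelta = timedelta(hours=24)) -> bool:
    """True when at least one full interval has passed since the last backup."""
    return now - last_backup >= interval


def expired_backups(backup_times: list[datetime], now: datetime,
                    keep_for: timedelta = timedelta(days=7)) -> list[datetime]:
    """Return the backups older than the retention period, oldest first."""
    return sorted(t for t in backup_times if now - t > keep_for)
```

A scheduled job could call backup_is_due before running the vendor's backup command, then remove whatever expired_backups returns. If the database changes quickly, shorten the interval; the defaults simply mirror the 24-hour and one-week figures suggested above.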

No matter how you decide to protect the automation database, invest the time to understand where the database resides and what protections are provided by your vendor. Finally, if your system requires you to enable the backup database yourself, practice switching over to the backup during a noncritical time to ensure that everything works properly, and be sure that the process you follow to change over is well documented.

Protecting critical apps

Lastly, consider how to protect your facility from the loss of a critical on-air application such as automation or ingest. If you dig into the automation system design, you may find that while all of the content is stored on RAID servers, the automation system software and automation database are all running on a single consumer disk drive. If that drive fails, you will be one busy maintenance guy.

Moving the database over to a stand-alone database server takes care of the database vulnerability, but what about the automation system itself? Of course, if you have the original CDs that the automation system came on, you can install a new drive and then reload the automation application. But all this takes time. A simple, low-cost solution is to buy a second drive just like the one in your system. During a maintenance period, install the drive in the automation computer, and load the operating system and the automation software. Complete the configuration of the automation, and verify that everything is working properly. Shut down the system, and put the original drive back into the computer. Carefully label the backup drive, document what you did (including noting the version of the OS and automation software on the drive), and put it on the shelf. Now if the automation computer drive fails, all you need to do is install the backup drive, and you are back on the air.

Brad Gilmer is president of Gilmer & Associates, executive director of the Video Services Forum and executive director of the Advanced Media Workflow Association.

Send questions and comments to: brad.gilmer@penton.com