Keep the lights on when things go wrong.
Almost every media company has a requirement to keep the lights on when things go wrong. Most companies have a written plan that describes specific steps to take when a disruptive event occurs. These plans used to be called disaster recovery plans, but most have been renamed business continuation plans because a) not all outages are caused by a disaster, and b) the end goal is not to recover from a disaster, but to continue business operations regardless of what may come your way. This month, we will look at the critical components of a business continuation plan.
A comprehensive plan will first lay out the scope, and then also identify what is out of scope. The plan will also describe its limits and the reasons for those. As professional media technologists, it is our natural inclination to focus on failures of the technical facilities under our control. However, I strongly encourage you to think outside the box and to involve people from other departments when you develop your plan. These experts can point out critical areas that you might overlook.
Ideally, top management should be involved in setting the overall scope of the plan and should determine, at a high level, what is not covered. After all, they are the customer. Should a disruption occur, they will have to accept the consequences of implementing the plan.
There is another reason to involve top management right at the beginning: Ultimately, any plan involves a tradeoff between risk and cost. Management will have to make some tough decisions about how much money they are willing to spend in this area. At some point, someone will have to decide that the cost to protect against some unlikely event is too much and that the company is willing to accept the risk.
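One rough way to frame that tradeoff is to compare the yearly cost of a safeguard with the expected yearly loss it would avoid. The sketch below does only that arithmetic. The function names and every figure in it are hypothetical, chosen purely for illustration and not taken from any real facility.

```python
def expected_annual_loss(loss_per_event: float, years_between_events: float) -> float:
    """Average yearly loss from an event that happens once every N years."""
    return loss_per_event / years_between_events


def worth_protecting(loss_per_event: float,
                     years_between_events: float,
                     annual_protection_cost: float) -> bool:
    """True when protection costs less per year than the loss it is expected to avoid."""
    return annual_protection_cost < expected_annual_loss(loss_per_event, years_between_events)


# Hypothetical figures: an outage costing 500,000 that is expected once every 10 years.
yearly_exposure = expected_annual_loss(500_000, 10)

# A safeguard costing 20,000 a year is cheaper than the exposure; one costing 80,000 is not.
cheap_option_justified = worth_protecting(500_000, 10, 20_000)
costly_option_justified = worth_protecting(500_000, 10, 80_000)
```

A calculation like this only informs the decision. Management may still choose to accept or reject a risk for reasons that no single number captures.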
To have a solid plan, you must have a good foundation. You may want to start with a few basic assumptions. For example, you may establish that:
We will rate different areas of our business and assign different levels of protection to these areas based on these ratings.
We will not protect for two (or three, or four?) simultaneous failures.
We will include geographic diversity in our plan because our primary location is in an area that is subject to tornadoes (earthquakes, floods, etc.).
These are just some examples of statements you might want to make relative to your own facilities. You will need to develop your own.
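The first assumption, rating areas of the business and assigning protection by rating, can be written down as a simple lookup. This is a minimal sketch only. The area names, the three-level rating scale and the protection descriptions are all hypothetical placeholders that you would replace with your own.

```python
from dataclasses import dataclass

# Hypothetical protection levels, keyed by rating (1 = most critical).
PROTECTION_BY_RATING = {
    1: "fully redundant, automatic change-over",
    2: "backup available, manual change-over",
    3: "restore after the event, no standby",
}


@dataclass
class BusinessArea:
    name: str
    rating: int  # 1 = most critical, 3 = least critical


def protection_plan(areas):
    """Return (area name, protection level) pairs, most critical areas first."""
    ordered = sorted(areas, key=lambda area: area.rating)
    return [(area.name, PROTECTION_BY_RATING[area.rating]) for area in ordered]


# Hypothetical areas and ratings, for illustration only.
example_areas = [
    BusinessArea("payroll", 3),
    BusinessArea("master control", 1),
    BusinessArea("newsroom editing", 2),
]

for name, protection in protection_plan(example_areas):
    print(f"{name}: {protection}")
```

Keeping the ratings in one table like this makes it easier for people from other departments to challenge them, which is much of the point of the exercise.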
As you develop your plan, you should consider areas that are not normally our responsibility. For example, we probably will consider a failure of electrical power in our plan, but would we consider an extended failure of our heating or cooling facilities? Some people would, but some would not. Professional media equipment may be concentrated in a technical area, and a failure of cooling may cause temperatures to rise quickly.
There may be other areas you might want to consider. Does your facility rely on pumps, fans or other mechanical devices to keep the facility on the air? If so, what steps can be taken to deal with a failure (or multiple failures) of these devices? What public utilities beyond electricity does your facility require in order to continue to operate? If you are in an area that becomes particularly cold in the winter, do you have alternative heat sources available if you are dependent on natural gas for heating? Are keyless access systems set to fail in an unlocked or a locked mode? If the batteries on the system run down, will you be able to get access to critical areas? Will you need to post guards to prevent entry into areas that are normally secured by these systems? Is your facility located in an area that is subject to flooding, especially if pumps or flood gates operated by the local municipality fail?
You get the idea. Be sure to widen your thinking and include others in your planning.
Another important aspect of business continuation planning is setting a time horizon. To some extent, this overlaps with setting the overall scope. For example, you may want to plan for the possibility that your core facility becomes inaccessible for one week due to ice and snow. You may also decide that you will plan for the total destruction of the facility due to fire or a natural disaster. But you may decide that you are not going to plan for two months of inaccessibility, because how you respond would depend greatly on the cause of the disruption.
Once you have developed your plan, it is a good idea to test it. For example, if you have backup HVAC systems and an automatic change-over, you should fail your main system to see if the backup comes online as anticipated. You should also fail power to the control system while keeping power alive to the air handlers, fans and other controls. Try to anticipate failures and to test various scenarios.
Of course, failing a main HVAC unit is fairly low risk. If the backup does not work, you probably have plenty of time to get the main back online before things overheat. It may take planning and real guts (and support from management) to practice other scenarios. For example, you may find that disconnecting a main backbone path on your core router causes failures of DNS or other critical services. Frankly, some of these unanticipated failures might impact air. But this is exactly the point of testing your plan. It is better to test the plan at a time when you are prepared to deal with the consequences rather than during an actual event when support staff may not be readily available. Again, top management should be involved and should support the idea of testing your business continuation plan.
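During a test like the backbone example, it helps to have a quick, repeatable way to see which name lookups stopped working. The sketch below uses only Python's standard socket module. The function names are hypothetical, and the optional resolver argument exists only so the check can be rehearsed without touching the network.

```python
import socket


def check_dns(hostname, resolver=socket.gethostbyname):
    """Return True if the hostname resolves, False if resolution fails."""
    try:
        resolver(hostname)
        return True
    except OSError:
        # socket.gaierror (the usual resolution failure) is a subclass of OSError.
        return False


def failed_checks(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that did not resolve, in the order given."""
    return [name for name in hostnames if not check_dns(name, resolver)]
```

Running failed_checks against your own list of critical hostnames before the test, and again right after the path is disconnected, gives a simple before-and-after record of what the failure actually broke. It covers name resolution only; other critical services would need their own checks.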
Finally, bear in mind that almost from the moment you begin business continuation planning, your facility and your business are changing. This means you should think about how and when your plan is revised. There is nothing worse than having a false sense of security in thinking you have a plan, only to find out it is five years out of date, and half of it either does not apply, or worse, causes further disruption. Business continuation planning is not a one-time event; it is a commitment to an ongoing process.
Cartoon Network gas
Some time ago, when I was working at Turner Broadcasting System, I had an office directly behind Cartoon Network master control. Being in a secure technical area, my office had no windows. Normally I kept my office door open, but occasionally when I had to make a phone call, I closed the door in order to keep from being distracted by the antics of Bugs Bunny and Space Ghost.
One day, I had the door closed as I worked away on a project. I heard a loud knock, and a fireman in full rescue gear wearing a Scott Pack air rig and full face mask opened my door. He told me there was a natural gas leak (he had to repeat it several times through the mask), and insisted that I leave the area immediately. It turned out that I was the last one to leave the building. As I walked past Cartoon Network master control, it was rather eerie to see programming playing away without a person in sight. Fortunately, adequate business continuation planning and implementation allowed the network to continue uninterrupted.
Brad Gilmer is executive director of the Advanced Media Workflow Association, executive director of the Video Services Forum, and President of Gilmer & Associates.
Send questions and comments to: firstname.lastname@example.org