High-availability networks

For some time now, on-air facilities have relied on Ethernet networks. In many cases, these networks have grown organically over time. As more equipment moves to Ethernet for critical functionality, it may be time to reassess the availability of your Ethernet infrastructure.

High availability vs. high reliability

We may throw around the terms high availability and high reliability interchangeably, but they have different meanings when applied to systems. A system is highly available when the services it provides are available nearly all of the time. High availability does not mean that components will never fail. It means that the overall system is there when you need it, even if components fail. High reliability, on the other hand, means that the individual components in the system rarely fail. While these two approaches may seem identical, the difference between them has a big impact on the design philosophy and cost of your network.

The critical point is that highly reliable network equipment is expensive because it is designed not to break. Highly reliable equipment typically includes things like dual power supplies, watchdog processors and redundant disk systems. Building a network from these sorts of components can be an expensive proposition.

A highly available system may be built out of less expensive network products. These components may lack the redundant power supplies or other features of high-reliability equipment, and therefore, they may fail more often than the more expensive equipment. However, if the overall network design takes into account the fact that equipment may fail, then end users will still be able to access the network even if something goes wrong. Let's take a look at how this might be applied in a broadcast facility.
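Before turning to the facility itself, the trade-off can be put in rough numbers. The sketch below compares one expensive switch with a redundant pair of cheaper ones. The availability figures are illustrative assumptions, not measured or vendor data, and the calculation assumes that the two cheaper switches fail independently of each other.

```python
# Rough availability comparison: one reliable switch vs. a redundant pair.
# All figures are assumed for illustration only.

HOURS_PER_YEAR = 24 * 365


def parallel_availability(*components):
    """Availability of redundant components, assuming independent failures.

    The service is down only when every component is down at the same time.
    """
    unavailable = 1.0
    for availability in components:
        unavailable *= (1.0 - availability)
    return 1.0 - unavailable


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


reliable_switch = 0.9999   # assumed: single high-reliability switch
cheap_switch = 0.999       # assumed: single less expensive switch
redundant_pair = parallel_availability(cheap_switch, cheap_switch)

for label, value in [
    ("one reliable switch", reliable_switch),
    ("one cheap switch", cheap_switch),
    ("two cheap switches", redundant_pair),
]:
    print(f"{label}: {value:.6f} available, "
          f"{downtime_hours_per_year(value):.3f} h/year down")
```

With these assumed figures, the pair of cheaper switches comes out ahead of the single expensive one, but only because the failures are treated as independent. A shared power circuit or a common misconfiguration would erode that advantage.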

You may recognize Figure 1 from the August 2008 Broadcast Engineering article titled “Computer architectures.” It illustrates a typical station facility. Because the focus of the 2008 article was on security, the broadcast core was not shown in detail.

Figure 2 shows more detail in the broadcast core. While there is nothing inherently wrong with this simple configuration, the loss of any single piece of network equipment will cause the automation system either to lose connectivity with other departments in the facility or to lose communication among its own components. One approach to resolving this problem is to purchase highly reliable network components. With this approach, the network architecture remains simple, but the cost of components is high. Figure 3 illustrates another approach.

In this figure, critical network components are duplicated. The architecture looks simple to implement, but there are a few key points. First, note that this is not simply two networks in parallel. There are connections between the two networks at critical points. Second, the automation computers have two Ethernet ports, one connected to each broadcast core switch. Third, these switches are configured using Hot Standby Router Protocol (HSRP) or Virtual Router Redundancy Protocol (VRRP).

High-availability techniques

Figure 3 shows a network architecture that takes advantage of several techniques to achieve high availability. In this example, I will cite some specific hardware and software from Cisco because that is what I am familiar with. Other manufacturers have similar technologies available. Also, the descriptions below are just a starting point. Creating high-availability networks will require some research. (See “Resources.”)

Redundant switching

The Cisco 3550 switch is a very capable device and, at around $700 on the used market, it is affordable as well. Besides providing typical switching functionality, it can route traffic and supports Cisco's HSRP. Using HSRP, a network engineer can configure two routers so that they appear to a client as a single virtual gateway interface with a single IP address and a single MAC address. Client computers are configured to use the virtual gateway address, not the physical address of the gateway routers themselves. One of the core switches is automatically selected as the active router. Should that switch fail, the second switch automatically responds to requests from clients for the virtual gateway address.
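The selection of the active router can be illustrated with a small simulation. This is a simplified model, not Cisco's implementation: it captures only the rule that the router with the highest HSRP priority becomes active, with the highest interface IP address breaking a tie, and it ignores hello timers and preemption. The router names and addresses are hypothetical.

```python
# Simplified model of HSRP active-router selection (illustration only).
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional

VIRTUAL_GATEWAY = "192.0.2.1"  # hypothetical address that clients point to


@dataclass
class Router:
    name: str
    ip: str
    priority: int = 100  # 100 is the HSRP default priority
    alive: bool = True


def elect_active(routers: List[Router]) -> Optional[Router]:
    """Return the router that should answer for the virtual gateway.

    Highest priority wins; the highest interface IP address breaks ties.
    Returns None if no router in the group is alive.
    """
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, IPv4Address(r.ip)))


core1 = Router("broadcast-core-1", "192.0.2.2", priority=110)
core2 = Router("broadcast-core-2", "192.0.2.3")
group = [core1, core2]

print(VIRTUAL_GATEWAY, "is served by", elect_active(group).name)
core1.alive = False  # simulate a failure of the active switch
print(VIRTUAL_GATEWAY, "is served by", elect_active(group).name)
```

The clients never change their gateway setting in this model. Only the device answering for the virtual address changes, which is the behavior the protocol is meant to provide.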

Note that the two routers in the broadcast core provide redundancy for connectivity between automation computers. The two routers in the station core provide redundancy for connectivity between the broadcast core and other parts of the station.

Multiple Ethernet ports

While the detail is not shown in Figure 3, it is assumed that each automation computer has two Ethernet interfaces. In this example, one interface is connected to broadcast core router 1, and the other interface is connected to broadcast core router 2. If one interface fails, the computer is still connected to the network.

Don't assume that 200Mb/s is automatically available just because you have connected two 100BASE-T ports on the client to two separate switches. The interaction between the client and the switches is not straightforward. Two cables connected to two different Ethernet cards, each with a green link LED, do not prove that both paths are working. As with any backup system, test it to confirm that your client and switch are configured properly to make use of both connections.
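One way to go beyond the link LEDs is to attempt a real connection from each interface's address. The sketch below is a minimal example of that idea using Python's standard socket library. The addresses, target host and port in the usage comment are hypothetical, and binding to a source address does not by itself guarantee which physical port the operating system uses, so pulling one cable at a time remains the decisive test.

```python
# Minimal per-interface reachability check (a sketch, not a full test plan).
import socket


def path_ok(local_ip, target_host, target_port, timeout=2.0):
    """Try a TCP connection to the target, sourced from local_ip.

    Returns True if the connection is established, False on any
    socket-level error (refused, unreachable, timed out, bad address).
    """
    try:
        with socket.create_connection(
            (target_host, target_port),
            timeout=timeout,
            source_address=(local_ip, 0),  # port 0 lets the OS pick one
        ):
            return True
    except OSError:
        return False


# Hypothetical usage: one check per Ethernet interface on the client.
# for nic_ip in ("192.0.2.11", "192.0.2.12"):
#     print(nic_ip, "->", path_ok(nic_ip, "192.0.2.50", 22))
```

Running the check once with both cables connected, and again with each cable removed in turn, shows whether the client really fails over rather than merely showing link.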

Monitoring

Any time you put a high-availability system in place, it is important to consider how to monitor it. Because the system is designed to continue to function even if a component fails, there may not be any outward signs that there is a problem. For this reason, it is important to monitor the health of the components in any mission-critical network. There are many network monitoring tools available. Most of these tools not only allow you to remotely monitor the health of network components, but they also provide levels of alarms and notification methods, whether by e-mail, SMS or some other technology.
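The idea of alarm levels for a redundant pair can be sketched in a few lines. This is not the logic of any particular monitoring product. The level names and device names are assumptions chosen for illustration, and the up/down status of each device is presumed to come from whatever polling method your monitoring tool uses.

```python
# Sketch of alarm levels for a redundant group of devices (illustration only).

def classify(group_status):
    """Map the up/down state of a redundant group to an alarm level.

    group_status maps a device name to True (up) or False (down).
    "degraded" is the case users will not notice: service continues,
    but the protection is gone until the failed device is repaired.
    """
    up = sum(1 for is_up in group_status.values() if is_up)
    if up == len(group_status):
        return "ok"
    if up > 0:
        return "degraded"
    return "critical"


# Hypothetical poll result: one broadcast core switch has failed.
status = {"broadcast-core-1": False, "broadcast-core-2": True}
level = classify(status)
failed = sorted(name for name, is_up in status.items() if not is_up)
print(f"alarm level: {level}; failed devices: {failed}")
```

The "degraded" level is the one worth routing to e-mail or SMS, because nothing on air will reveal it.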

How much protection?

As you consider these systems, think about how much protection you need, and whether the cost and time required to configure them really make sense. Remember that you can always have an extra switch sitting on the shelf. Also keep in mind that while it may make sense to deploy these sorts of systems in your broadcast core, other areas of the network may not require the same level of availability.

Results

The architecture in Figure 3 is quite resilient. In this configuration, automation computers continue to function even if one of the broadcast core switches fails, one of the station core switches fails, and one of the network interfaces in the automation computer fails, all simultaneously, provided the failed interface is not the one connected to the surviving broadcast core switch. That is a lot of protection for relatively little expense.
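A resilience claim like this can be checked against a model of the topology. The sketch below assumes a topology in the spirit of Figure 3 (two interfaces per automation computer, one to each broadcast core switch, a link between the core switches, and each broadcast core switch connected to both station core switches). The exact links in Figure 3 may differ, and all node names are hypothetical.

```python
# Reachability check on an assumed model of the redundant topology.
from collections import deque

# Assumed links; the real Figure 3 wiring may differ.
LINKS = [
    ("automation", "nic1"), ("automation", "nic2"),
    ("nic1", "bcore1"), ("nic2", "bcore2"),
    ("bcore1", "bcore2"),
    ("bcore1", "score1"), ("bcore1", "score2"),
    ("bcore2", "score1"), ("bcore2", "score2"),
    ("score1", "station"), ("score2", "station"),
]


def reachable(src, dst, failed):
    """Breadth-first search from src to dst, skipping failed nodes."""
    if src in failed or dst in failed:
        return False
    neighbors = {}
    for a, b in LINKS:
        if a in failed or b in failed:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# One broadcast core switch, one station core switch and one interface
# fail at the same time.
print(reachable("automation", "station", {"nic1", "bcore1", "score1"}))
```

In this model the automation computer still reaches the rest of the station in that scenario. A failure of the interface attached to the surviving broadcast core switch would isolate that computer, which is worth keeping in mind when deciding which spares to keep on hand.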

Brad Gilmer is executive director of the Advanced Media Workflow Association, executive director of the Video Services Forum and president of Gilmer & Associates.

Send questions and comments to: brad.gilmer@penton.com

Resources

Visit www.cisco.com. Search for “Catalyst 3550 Multilayer Switch Software Configuration Guide.” Look for “configuring HSRP” in the table of contents.
Visit www.rfcs.org. Search for “RFC 2338, Virtual Router Redundancy Protocol (VRRP).”
Visit www.ciscopress.com. Search for article 375501, “CCDP Self-Study: Designing High-Availability Services,” from the book CCDP Self-Study: Designing Cisco Network Architectures by Amir Ranjbar and Keith Hutton, Cisco Press, ISBN 1-58705-185-0, 2005.