Storage area networking

Storage area networking is one of the hottest topics there is. Storage area networks (SANs) promise to increase creativity and productivity by allowing users to share content across multiple systems. At its heart, a SAN is a pool of common virtual storage that multiple processors can share as if all the storage were local. A SAN can consist of local storage on a number of machines, centralized storage or a combination of both. Unlike a traditional network, a SAN does not involve file transfer. When users gain access to shared content, they are not making a copy of the content on their local storage, nor are they attaching a file system to their local box. To the user, it appears as if the content resides on their local system, regardless of where the content is physically stored. Furthermore, the SAN operates separately from the local area network (LAN), so storage-related functions do not slow normal LAN traffic.

Unfortunately, the SAN had a somewhat troubled start, largely because it is based upon technology and standards that needed some “wrinkles” ironed out. While there are still some compatibility issues, these problems are now much better understood.

The SAN consists of three basic components: an interface (like SCSI, Fibre Channel or ESCON), interconnects (switches, gateways, routers or hubs), and a protocol (like IP or SCSI) that controls traffic over the access paths that connect the nodes. These three components plus the attached storage devices and servers form an independent storage area network.

While the SAN supports a number of interfaces, Fibre Channel (both Fibre Channel Arbitrated Loop [FC-AL] and Fibre Channel fabrics) has gained the limelight due to its flexibility, high throughput, inherent fault-tolerant access paths and potential for revitalizing network-to-storage communications. In most, but not all, SAN implementations, Fibre Channel serves as a sort of shared “SCSI extender” allowing local systems to treat remotely located storage as a local SCSI device.

An important issue to consider in designing SAN systems is fault tolerance versus high availability. Because you are using the same storage system for a number of clients, a failure in the storage system can have serious effects on your organization. Fault-tolerant systems are designed to be resistant to faults, so a single fault will not typically cause a total system failure. Dual power supplies, redundant disks or RAID, dual disk controllers and automatic changeover software are some of the design components typically included in fault-tolerant systems. These systems are designed as a single unit or a set of interconnected units. They are sold as a system and may be expensive. Many fault-tolerant systems are designed so that the only way you know there has been a failure is by checking status monitoring and alarms. The hardware and software quality-control procedures on fault-tolerant systems can be extensive.

One thing to be aware of is that fault-tolerant systems typically come with 24-hour support that is geared for the IT and business world. This support can be costly, but it can be very thorough, with capabilities such as “phone-home” monitoring. With phone-home monitoring, the system can call into the support center when errors are detected, and the support center can call into the system for diagnostics and upgrades.

Another approach that can be much more economical, but which may or may not provide the same protection from failure, is that of high availability. With high availability, the strategy is different. The point is not to prevent failures, although high-quality components can be used. Instead, a designer uses off-the-shelf components to design a system such that a single failure has little impact. An example might be to design a network with two completely separate Ethernet systems. The servers and clients might have two Ethernet cards in them instead of one. It might seem cumbersome to put together two completely separate Ethernet networks, but high availability takes advantage of the low price of consumer computer hardware. Commodity Ethernet hardware costs very little these days, except at the highest speeds.

Do not get me wrong — high-availability systems are typically also very well engineered. They can provide excellent recovery from faults and may provide a lower overall system cost than fault-tolerant systems. High-availability systems may have a higher fault rate than fault-tolerant systems, although this depends entirely on decisions made by the system designers. The bottom line? Fault-tolerant systems may indeed be more “fault tolerant” than high-availability systems, but there is a cost associated with this. It is up to the user to decide if it is worth the expense.
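The dual-Ethernet example above can be put into rough numbers. The sketch below assumes that the two networks fail independently of each other, which is a simplification (a shared power feed or a common software bug would break it), and the 99 percent figure for a single network is purely illustrative. The function names are hypothetical, not part of any product.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(path_availability: float, paths: int) -> float:
    """Chance that at least one of `paths` independent paths is working.

    Assumes every path has the same availability and that failures are
    independent, which real installations only approximate.
    """
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if paths < 1:
        raise ValueError("need at least one path")
    # All paths must be down at once for the connection to be lost.
    return 1.0 - (1.0 - path_availability) ** paths


def expected_downtime_hours(availability: float) -> float:
    """Average hours per year during which no path is available."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # assumed availability of one commodity network
    dual = parallel_availability(single, 2)
    print(f"one network:  {expected_downtime_hours(single):.1f} h/year down")
    print(f"two networks: {expected_downtime_hours(dual):.2f} h/year down")
```

Under these assumptions, a second inexpensive network reduces expected downtime by a factor of 100, which is the economic argument for high availability in a nutshell.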

The decision between fault tolerant and high availability may be as much philosophical as it is economic or technical. Some users feel much more comfortable with systems that are designed as a whole and have IT-type support. Others feel more comfortable with systems built out of readily available components that they can easily see and understand. When considering SAN systems and the issue of reliability, be sure to think about your philosophy and buy the appropriate solution.

As I said before, SANs allow multiple users to share the same data. With a SAN, you can have one remote storage location, or several locations, including local disk. SANs raise some interesting design issues. For example, if SAN data is not stored in a particular location, how can it be adequately backed up? Some SAN solutions automatically create two copies of any newly ingested material. The system makes sure that the two copies are not stored in the same location. Other SAN systems stripe the data across multiple systems. If one server's local storage becomes unavailable, the SAN recreates the data using well-understood parity algorithms. In any case, it is important that users understand how their SAN data is protected and what the cost of this protection is in terms of disk storage.
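To make the parity idea concrete, here is a minimal sketch of single XOR parity, one well-known scheme. Which algorithm a given SAN product actually uses is not stated here, so treat this as an illustration only. The second half estimates the disk cost of each protection method; all function names and figures are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block stored on an extra server alongside the data stripes."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recreate the one lost block from the survivors plus the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


def raw_capacity_mirrored(usable_tb):
    """Two full copies: raw disk needed is twice the usable capacity."""
    return usable_tb * 2


def raw_capacity_parity(usable_tb, data_servers):
    """Single parity across `data_servers` stripes adds one extra share."""
    return usable_tb * (data_servers + 1) / data_servers


if __name__ == "__main__":
    stripes = [b"AAAA", b"BBBB", b"CCCC"]  # data held on three servers
    parity = make_parity(stripes)
    # The server holding the middle stripe becomes unavailable.
    recovered = rebuild_missing([stripes[0], stripes[2]], parity)
    print(recovered == stripes[1])
    print(raw_capacity_mirrored(10), raw_capacity_parity(10, 4))
```

The trade-off shows up in the last two functions: mirroring doubles the disk bill, while single parity across four data servers adds 25 percent but survives only one failure at a time.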

Another issue in SAN design is bandwidth. How do designers of SAN systems ensure that the SAN does not fall over if all users on the SAN request data simultaneously? Simple. First, they design the bandwidth of the SAN so that it has extra capacity. The extra capacity ensures that the SAN keeps functioning even in times of extremely high demand. Some might argue that this is wasteful and drives up cost. The fact is that high-speed network hardware is falling in price and is now such a small part of the total system price that this is no longer a significant consideration. Second, they insist that connections to the SAN be carefully controlled. If the SAN is grown in an unplanned way, its overall bandwidth can be exceeded under peak conditions.
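Both rules can be expressed as a simple planning check: add up what every attached client could request at once, and refuse any new connection that would eat into the reserved headroom. The sketch below is one way to do that arithmetic; the client names, bandwidth figures and 25 percent headroom are assumptions chosen for illustration.

```python
def peak_demand_mbps(clients):
    """Worst case: every client pulls its full bandwidth at the same time."""
    return sum(clients.values())


def has_headroom(capacity_mbps, clients, headroom_fraction=0.25):
    """True if peak demand leaves at least `headroom_fraction` of capacity free."""
    usable = capacity_mbps * (1.0 - headroom_fraction)
    return peak_demand_mbps(clients) <= usable


def can_add_client(capacity_mbps, clients, new_demand_mbps, headroom_fraction=0.25):
    """Controlled growth: check a proposed connection before attaching it."""
    proposed = dict(clients, proposed_client=new_demand_mbps)
    return has_headroom(capacity_mbps, proposed, headroom_fraction)


if __name__ == "__main__":
    # Hypothetical clients and their peak demand in megabits per second.
    clients = {"edit_1": 200, "edit_2": 200, "playout": 300}
    print(has_headroom(1000, clients))         # 700 needed, 750 usable
    print(can_add_client(1000, clients, 100))  # 800 needed, 750 usable
```

A real design would also look at per-link limits and storage throughput, not just the aggregate figure, but the principle of checking before connecting is the same.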

If a SAN is so great, why doesn't everyone use one? Storage area networking has a few challenges. First, it does not work well in a multivendor environment. Why? Because the standards for SANs do not address the issue of file interchange. Just because you can move a file from one vendor's system to another's does not mean you can play it. As a result, SANs exist pretty much as single-vendor solutions.

Second, SANs may turn out to be more expensive when you are looking for a server system that has a low number of I/O channels but lots of storage. Finally, a SAN may not be the way to go if you are looking for a small system. Generally, simple stand-alone systems are less expensive.

So where do SANs make the most sense? SANs are best used in larger systems where users want many I/O channels and they all want to access the same content. As storage prices fall, building one server with a huge amount of storage is not a problem. However, I/O still requires bandwidth inside the server. There are two common strategies for dealing with large I/O requirements. One is to build a large server with what amounts to a router inside it; the other is to connect a number of smaller I/O devices to a network. That is what the SAN does. It allows one to grow a network efficiently, without having to upgrade boxes.

Brad Gilmer is executive director of the AAF Association, technical moderator for the Video Services Forum and president of Gilmer & Associates, a technology consulting company.