Storage area networks

Storage area networks (SANs) allow multiple processors to share a pool of common storage, making them a powerful solution for sharing content across a homogeneous system. A SAN presents a pool of virtual storage that each attached device treats as if it were local. A SAN can comprise local storage on a number of machines, centralized storage or a combination of both. Unlike a traditional network, a SAN does not involve file transfer. When users gain access to a shared piece of content, they are not making a copy of the content on their local storage, nor are they attaching a file system to their local box. To the user, the content appears to reside on the local system regardless of where it is physically stored. Furthermore, SANs operate separately from a local area network (LAN), so storage-related functions do not slow normal LAN traffic.

SAN assembly

The SAN consists of three basic components: an interface (like SCSI, Fibre Channel or ESCON), interconnects (switches, gateways, routers or hubs) and a protocol (like IP or SCSI) that controls traffic over the access paths connecting the nodes. These three components, plus the attached storage devices and servers, form an independent storage area network.
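To make the three-component model concrete, here is a minimal sketch in Python describing a hypothetical fabric; the component names and values are illustrative examples, not drawn from any particular product:

    from dataclasses import dataclass, field

    # Illustrative model of the three basic SAN building blocks:
    # an interface, a set of interconnects and a traffic protocol.
    @dataclass
    class SanFabric:
        interface: str                 # e.g. "Fibre Channel", "SCSI", "ESCON"
        interconnects: list[str] = field(default_factory=list)  # switches, hubs, routers
        protocol: str = "SCSI"         # controls traffic over the access paths

    fabric = SanFabric(
        interface="Fibre Channel (FC-AL)",
        interconnects=["switch-1", "switch-2", "hub-1"],
    )
    print(fabric)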

While the SAN supports a number of interfaces, Fibre Channel (both Fibre Channel Arbitrated Loop [FC-AL] and Fibre Channel fabrics) has gained the limelight due to its flexibility, high throughput, inherently fault-tolerant access paths and potential for revitalizing network-to-storage communications. In most, but not all, SAN implementations, Fibre Channel serves as a sort of shared "SCSI extender," allowing local systems to treat remotely located storage as a local SCSI device.

Figure 1 shows a simple SAN system. Note that each video server has inputs, outputs, local storage and a SAN connection. Content stored on the local drive is accessible not just to users on the local node but to all users on the SAN. An item stored on System A is just as available to a user of that system as it is to a user on System B, and a user on System B does not have to copy content from System A before using it.

Most of us are familiar with the Windows operating systems. When users want to access content stored on a different computer across a network, they must first find the remote computer on the network, type a username and password, and then locate the drive and the desired file. At that point, the desired file must be copied to the local drive. As an extension to this idea, both Windows and UNIX support mounting remote file systems. Once a remote file system is mounted locally, this storage appears to be part of the local file system. Remote file system technology has been around for a long time and is well understood, and it operates well across public networks. However, it includes overhead that may not be appropriate for rich-media applications at high bit rates operating across dedicated networks.
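The practical difference is visible in application code. In this minimal Python sketch, the operating system is assumed to have already mounted the shared volume, and the mount point and file name are hypothetical:

    import os

    SAN_CLIP = "/mnt/san/news/clip0042.mxf"   # hypothetical mount point and file

    # Content on the SAN is just another local path: the application
    # reads it in place, with no copy to local storage first.
    if os.path.exists(SAN_CLIP):
        with open(SAN_CLIP, "rb") as clip:
            header = clip.read(64)   # served directly from shared storage
        print(f"read {len(header)} bytes from {SAN_CLIP}")
    else:
        print("shared volume not mounted on this machine")

The copy-first workflow described above would instead duplicate the file onto local storage before any of it could be read.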

SAN concerns

SANs raise some interesting issues. For example, if SAN data is not stored in a particular location, how can it be adequately backed up? Some SAN solutions automatically create two copies of any newly ingested material, making sure the two copies are not stored in the same location. Other SAN systems stripe the data across multiple systems; if one server's local storage becomes unavailable, the SAN recreates the data using parity algorithms. In any case, it is important that users understand how their SAN data is protected and what this protection costs in terms of disk storage.
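The parity approach works much like RAID striping. The Python sketch below is a simplified illustration of the idea, not any vendor's actual algorithm; it rebuilds a lost stripe by XORing the surviving stripes with the parity block:

    from functools import reduce

    def xor_blocks(*blocks: bytes) -> bytes:
        """XOR equal-length byte blocks together."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    # Content striped across three servers, plus one parity block.
    stripe_a, stripe_b, stripe_c = b"AAAA", b"BBBB", b"CCCC"
    parity = xor_blocks(stripe_a, stripe_b, stripe_c)

    # Server B's local storage becomes unavailable: its stripe is
    # rebuilt from the surviving stripes and the parity block.
    rebuilt = xor_blocks(stripe_a, stripe_c, parity)
    assert rebuilt == stripe_b

The cost of this protection shows up directly in the arithmetic: the two-copy scheme doubles the disk storage required, while parity striping adds only one parity block per stripe group.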

Another issue in SAN design is bandwidth. How do SAN designers ensure that the SAN does not fall over if all users request data simultaneously? First, they design the bandwidth of the SAN so that it has extra capacity, which keeps the SAN functioning even in times of extremely high demand. Some might argue that this is wasteful and drives up cost, but high-speed network hardware is falling in price and is now such a small part of the total system price that this is no longer a consideration. Second, they insist that connections to the SAN be carefully controlled; if the SAN grows in an unplanned way, its overall bandwidth could be exceeded under peak conditions.
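A back-of-the-envelope check makes the bandwidth argument concrete. The figures in this Python sketch are hypothetical examples, not vendor specifications:

    # Worst case: every channel on the SAN streams at once.
    CHANNELS = 24          # simultaneous video channels
    BITRATE_MBPS = 50      # per-channel bit rate, e.g. 50Mb/s MPEG-2
    FABRIC_GBPS = 2.0      # raw fabric rate
    HEADROOM = 0.7         # plan to use at most 70% of raw capacity

    peak_demand_gbps = CHANNELS * BITRATE_MBPS / 1000
    usable_gbps = FABRIC_GBPS * HEADROOM

    print(f"peak {peak_demand_gbps:.2f}Gb/s vs usable {usable_gbps:.2f}Gb/s")
    if peak_demand_gbps > usable_gbps:
        print("unplanned growth: fabric could be exceeded at peak")

Running the same numbers before each new connection is added is exactly the kind of careful control the design calls for.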

If SAN is so great, why doesn't everyone use it? SAN has a few challenges. First, SANs do not work well in a multivendor environment. If you are looking to mix vendors' equipment, you are better off using network-attached storage or some other Internet protocol-based system. The standards for SANs do not address file interchange: just because you can transfer a file from one vendor's system to another does not mean you can play it. For example, one system might create an MPEG file that another system cannot decode. For this reason, SANs exist pretty much as single-vendor solutions. Also, as previously discussed, most vendors are nervous about connecting to other SANs because of concerns about bandwidth demands.

Second, SANs may turn out to be more expensive when you are looking for a server system that has a low number of I/O channels but lots of storage. Finally, SANs may not be the way to go if you are looking for a small system. Generally, simple, stand-alone systems are less expensive.

SANs are best used in larger systems where users want many I/O channels and all of them need access to the same content. As storage prices fall, building one server with a huge amount of storage is not a problem; however, I/O still requires bandwidth inside the server. There are two common strategies for dealing with large I/O requirements. One is to build a large server with what amounts to a router inside it. The other is to connect a number of smaller I/O devices to a network, which is what a SAN does: it allows you to grow a network efficiently without having to upgrade boxes.
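As a final illustration, this Python sketch contrasts the two growth strategies with hypothetical numbers: a SAN's aggregate I/O bandwidth grows simply by adding nodes, with no single box ever needing an upgrade:

    NODE_IO_GBPS = 1.0   # hypothetical I/O bandwidth of one small server

    def san_aggregate_io(nodes: int) -> float:
        """Aggregate I/O grows linearly as nodes join the SAN."""
        return nodes * NODE_IO_GBPS

    for n in (2, 4, 8):
        print(f"{n} nodes -> {san_aggregate_io(n):.1f} Gb/s aggregate I/O")

The scale-up alternative concentrates the same aggregate bandwidth inside one large server's internal router, which must be replaced outright when demand outgrows it.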