A SAN is now a key component of most reasonably sized IT infrastructures, including those handling video production. Today's SAN technology has evolved from its humble beginnings into a mission-critical enterprise IT component. The first SANs were introduced to handle small numbers of enterprise-scale systems being connected to storage arrays a few cabinets away.
Before we get too far in our discussion, it is important to understand some basic SAN concepts. SANs support block-level access to data. Support for file systems and metadata is contained at a higher level in the operating system code or, sometimes, within a database or Java appliance. There are other types of shared storage that are file-oriented (as opposed to block-level); these are known as network attached storage (NAS) or object stores (ObS).
NAS often supports the Common Internet File System (CIFS) and Network File System (NFS) protocols. CIFS is a Windows-centric file-sharing protocol, while NFS is used mainly on UNIX-based platforms.
Both NAS and ObS use TCP/IP for communication between the host computers and the shared storage controllers. This IP traffic is usually carried over the LAN rather than a dedicated SAN.
SAN traffic between an application and the storage array is significantly more sensitive to delayed or lost data than is other LAN traffic. To prevent disruptions, the SAN is typically designed to avoid overloading. One common solution is to construct the SAN so that every one of its ports can send and receive at full speed. Unfortunately, SAN connections are expensive, and SANs are cost-constrained. SAN architects must balance cost and risk while recognizing that a mistake can result in outages.
The protocols of the SAN are:
- Fibre Channel (FC)
- Fibre Connection (FICON)
- Fibre Channel over Ethernet (FCoE)
- Fibre Channel over IP (FCIP)
- Internet Small Computer System Interface (iSCSI)
These represent enterprise-scale derivatives of the peer-to-peer Small Computer System Interface (SCSI) of the 1980s. The last two, FCIP and iSCSI, are IP-encapsulated protocols and are carried over a LAN.
Storage interface and its requirements
Neither SCSI nor its derivatives (FC, FCoE) includes a transport layer protocol to perform error correction and retransmission. As a result, they tolerate data loss very poorly. Additionally, SCSI and its derivatives assume a direct peer-to-peer connection between the initiator and the target, so end devices, and the applications that use them, do not tolerate latency and delays well.
FC is implemented at the same protocol level as Ethernet, but unlike Ethernet, it is designed as a lossless protocol with quite predictable latency characteristics. FC was designed to support the sensitive application-to-storage interface by maximizing throughput and reliability.
FCIP and iSCSI, by contrast, have no mechanism to handle LAN overloading. As a result, they are prone to delays and frame loss. These solutions are not suitable for larger environments where engineered levels of performance and resilience are required. Their main benefit is that they can be implemented at low cost on an existing LAN.
FCoE was created to enable network convergence, allowing delay- and loss-sensitive data flows to share the same physical network as normal LAN traffic. To achieve this, FCoE is implemented at the same protocol level as FC and Ethernet and relies on smart switches to shape the FCoE traffic so that application and storage performance remain good.
Enterprise-scale SAN design
SANs are carefully designed to avoid frame loss, latency and delay by integrating nonblocking switch fabrics and a judicious fan-out ratio across the hierarchy of connected hosts, edge-switch ports, the interswitch links (ISL) to the core switches and the connections to the storage array. SAN designs typically assume that all hosts may be communicating at full line rate at the same time, implying significantly over-provisioned SAN capacity.
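The cost of the full-line-rate assumption can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function names, port counts and link speeds are assumptions chosen for the example, not figures from any particular SAN.

```python
import math


def oversubscription(host_ports, host_gbps, isl_count, isl_gbps):
    """Ratio of worst-case host demand to ISL capacity on one edge switch.

    A result of 1.0 or less means the uplinks can carry every host
    at full line rate; anything higher is oversubscribed.
    """
    if host_ports < 1 or isl_count < 1 or host_gbps <= 0 or isl_gbps <= 0:
        raise ValueError("port counts must be >= 1 and speeds must be > 0")
    return (host_ports * host_gbps) / (isl_count * isl_gbps)


def isls_for_nonblocking(host_ports, host_gbps, isl_gbps):
    """Smallest number of ISLs that carries all hosts at full line rate."""
    if host_ports < 1 or host_gbps <= 0 or isl_gbps <= 0:
        raise ValueError("port count must be >= 1 and speeds must be > 0")
    return math.ceil((host_ports * host_gbps) / isl_gbps)


if __name__ == "__main__":
    # Hypothetical edge switch: 40 hosts at 8 Gb/s, 4 ISLs at 16 Gb/s.
    print(oversubscription(40, 8, 4, 16))
    print(isls_for_nonblocking(40, 8, 16))
```

With these assumed numbers, the edge switch is oversubscribed 5:1, and removing that oversubscription entirely would take 20 ISLs instead of 4. That gap is the over-provisioning cost the design has to weigh against the risk of congestion.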
Physical layer over-provisioning is compounded when implementing complex, multitier, edge-core-edge switch designs for large-scale environments. While intended to provide physical reconfiguration flexibility to accommodate growth, these designs actually introduce substantial additional cost, complexity and resultant risk.
SANs are typically designed around a fan-out ratio between host connections and storage array ports of 6:1 to 12:1, depending on the estimated intensity of the storage activity. Low-utilization hosts can be theoretically supported at the upper fan-out ratios, while highly utilized hosts require much lower fan-out ratios. (See Figure 1.)
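To see what the 6:1 and 12:1 ends of that range mean in practice, the following Python sketch works through the numbers. The host count and the 8 Gb/s port speed are assumptions picked for illustration, and the function names are hypothetical.

```python
def storage_ports_needed(host_connections, fan_out):
    """Storage array ports required for a given fan-out ratio (fan_out:1)."""
    if host_connections < 0 or fan_out < 1:
        raise ValueError("host_connections must be >= 0 and fan_out >= 1")
    # Integer ceiling division: a partly used storage port is still a port.
    return -(-host_connections // fan_out)


def worst_case_share_gbps(port_gbps, fan_out):
    """Bandwidth left per host if every host behind one storage port is busy."""
    if port_gbps <= 0 or fan_out < 1:
        raise ValueError("port_gbps must be > 0 and fan_out >= 1")
    return port_gbps / fan_out


if __name__ == "__main__":
    hosts = 96          # assumed number of host connections
    port_speed = 8.0    # assumed storage port speed in Gb/s
    for ratio in (6, 12):
        ports = storage_ports_needed(hosts, ratio)
        share = worst_case_share_gbps(port_speed, ratio)
        print(f"{ratio}:1 -> {ports} storage ports, {share:.2f} Gb/s per host")
```

Under these assumptions, 96 host connections need 16 storage ports at 6:1 but only 8 at 12:1. The price of the higher ratio is that each host's worst-case share of a storage port falls from about 1.33 Gb/s to about 0.67 Gb/s, which is why it suits only low-utilization hosts.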
SAN design in the cloud
Good SAN design involves balancing high-intensity hosts and low-intensity hosts on the same edge switch to maximize switch use. Choosing the correct fan-out ratio is difficult enough at the initial implementation stage, and it becomes harder still to maintain in a mature and growing SAN. It is especially time-consuming to manage with a virtualized workload that moves automatically and transparently between physical hosts, and their associated SAN connections, in real time.
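One simple way to picture the balancing problem is as a load-spreading exercise. The Python sketch below uses a greedy heuristic (place the heaviest hosts first, each on the currently least-loaded switch). This is a generic technique swapped in for illustration, not a method prescribed here, and the host names, switch names and load figures are all hypothetical.

```python
import heapq


def balance_hosts(host_loads, switch_names):
    """Spread hosts across edge switches so expected load is roughly even.

    host_loads maps a host name to its estimated storage traffic (Gb/s).
    Returns a dict mapping each switch name to the list of hosts placed on it.
    """
    # Min-heap of (current load, switch name); the least-loaded switch pops first.
    heap = [(0.0, name) for name in switch_names]
    heapq.heapify(heap)
    placement = {name: [] for name in switch_names}

    # Place the heaviest hosts first, then fill in with the lighter ones.
    for host, load in sorted(host_loads.items(), key=lambda kv: kv[1], reverse=True):
        total, switch = heapq.heappop(heap)
        placement[switch].append(host)
        heapq.heappush(heap, (total + load, switch))
    return placement


if __name__ == "__main__":
    # Assumed estimates: two busy database hosts and three quiet web hosts.
    loads = {"db1": 6.0, "db2": 5.0, "web1": 1.0, "web2": 1.0, "web3": 1.0}
    result = balance_hosts(loads, ["edge-a", "edge-b"])
    for switch, hosts in result.items():
        print(switch, hosts, sum(loads[h] for h in hosts))
```

In this example each edge switch ends up with one database host and an expected load of 7 Gb/s. The weakness is that the estimates are static: once virtual machines start migrating between physical hosts, the loads behind each port change and the placement has to be recalculated.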
In a virtualized world, applications are untethered from the underlying physical hardware, the same hardware that has physical network connections to the SAN. For example, a system could enable the migration of an entire running virtual machine, application and all, from one physical host and physical storage system to another seamlessly and without user interruption.
Cloud computing offers many benefits to the enterprise, obscuring the physical complexity of servers, storage and networks from applications. It enables rapid deployment and enhanced availability through outages. Cloud does not, however, offer a free lunch. An inadequate underlying SAN infrastructure will quickly be exposed as more and more workload is virtualized and automated tools move critical business applications between nodes on increasingly larger clusters of servers and storage.
The cloud not only insulates and obscures the physical complexity of the underlying hardware from applications, it also has the potential to obscure the cause of an outage. This may increase the time it takes engineers to resolve a problem, and it may reduce the success rate of initial diagnostics. When using cloud-based storage, it is not unusual for a fairly simple fault that can be repaired in a few minutes to take many hours to diagnose, and then only after much technical hand-off between multiple support teams.
The choice is stark: Overengineer, pay the price and hope for the best, or adopt a more scientific approach and manage the SAN proactively.
Steve O'Donnell is managing director EMEA and a senior analyst at Enterprise Strategy Group.
Editor's note: This feature is based on a September 2009 article, “Storage connectivity and why it is important,” on The Hot Aisle, the official blog of Data Centre Solutions.