Understanding Storage Resiliency

Storage resiliency can be as mysterious as the topics of fault tolerance and virtualization—each having different meanings based upon varying contexts and diverse applications. For a storage system to attain a high degree of resilience, it must be both fault tolerant and provide a variety of data management techniques, which may also include virtualization. Resiliency technologies may be applied to disk arrays, clusters, storage area networks (SANs), tape libraries, and even servers or computers.

Resilience is defined as "being capable of withstanding shock without permanent deformation or rupture" and "tending to recover from or adjust easily to misfortune or change." When a storage system is designed to protect against and correct for anomalies that occur when moving, replicating or manipulating data in routine operations or workflow—it is said to be "resilient."

The challenge of building resilient storage systems continues to grow. Six-year-old Facebook has more than 500 million active users, 50 percent of whom log on every day, collectively spending over 700 billion minutes per month online (source: Facebook.com, Nov. 23, 2010). This platform may be one of the most—if not the most—diverse active public information systems on the planet. Facebook's systems must be resilient against attacks, responsive to all users, and fault tolerant against potential threats at all levels at all times. This exemplifies the ultimate resiliency design: flexibility and scalability that adapt to changes and demands without failure.

Data has varying degrees of value based upon user needs—its importance is not uniform across information technology domains. However, once data becomes information, it takes on meaning and value, and must now be both accessible and resilient to any influence. Most information will generally be treated equally in the way it is managed and stored. There can be conditions where data will be at risk, which establishes one goal for IT professionals: reducing the risks associated with data storage and management by carefully analyzing available storage management technologies.


Storage resiliency for data protection is usually achieved by employing RAID. For workstation environments, mirroring (RAID 1) is the easiest and least costly form of resiliency, and it is highly dependable. Its drawback is the 50 percent reduction in storage capacity necessary to fully duplicate the data.
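The 50 percent capacity trade-off of mirroring can be sketched in a few lines; the function name and drive sizes below are illustrative, not drawn from any particular product.

```python
# Illustrative sketch: usable capacity of a RAID 1 mirror.
# Every byte is written twice, so usable capacity is half the
# raw total -- the 50 percent reduction noted above.

def raid1_usable_tb(drive_tb: float, drives: int = 2) -> float:
    """Usable capacity (TB) of a mirror set of identical drives."""
    if drives % 2 != 0:
        raise ValueError("mirroring requires an even drive count")
    return drive_tb * drives / 2  # half the raw capacity survives

print(raid1_usable_tb(2.0))  # two 2 TB drives -> 2.0 TB usable
```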

Higher bandwidth performance arrays will use striping either as single parity (RAID 5) or dual parity (RAID 6).

Parity-based storage is used in video servers, editing and central storage repositories. Alternatives include hybrid RAID sets where combinations (e.g., RAID 53 or 61) employ mixes of dedicated parity drives (RAID 3) and distributed parity (RAID 5 or 6).
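The capacity side of the parity trade-off is simple arithmetic: single parity (RAID 5) gives up one drive's worth of capacity, dual parity (RAID 6) gives up two. A hedged sketch, with invented drive counts:

```python
# Hedged sketch: data-drive count for parity-based RAID sets.
# RAID 5 reserves one drive's worth of capacity for parity;
# RAID 6 reserves two. The remainder stores data.

def data_drives(total_drives: int, parity_drives: int) -> int:
    """Drives' worth of usable data capacity in the array."""
    if total_drives <= parity_drives:
        raise ValueError("array needs more drives than parity drives")
    return total_drives - parity_drives

print(data_drives(8, 1))  # RAID 5, 8 drives -> 7 drives of data
print(data_drives(8, 2))  # RAID 6, 8 drives -> 6 drives of data
```

Note the compromise the article describes: RAID 6 tolerates a second drive failure, but at the cost of another drive of capacity and additional parity writes.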

Fig. 1: Applications for multi-tiered storage with their respective levels of resiliency

Choosing a RAID configuration depends upon your application, as each RAID type presents compromises in write versus read performance or in overall capacity.


Storage management processes are supported by applications including automation, production asset management (PAM) and enterprise-wide media asset management (MAM). Assigning data to an appropriate type of storage is one method for achieving resiliency in a system. Generally speaking, only active or dynamic data is kept on primary storage. Fixed or static data, known as persistent (i.e., non-transactional or post-transactional) data, should not live on primary storage. User accessibility to active or dynamic data is increased when persistent data is moved away from primary and onto secondary storage.

There are no "standardized" industry schemes used in segmenting, classifying or numerically assigning tiers; but there are some guidelines to storage tiering that an organization can employ. The common ground is simply a two-tiered approach where active data is kept on high performance drives and persistent data is kept on less expensive "near line" types of storage—including tape archives. Some organizations may find two tiers insufficient and will further divide storage in multiple tiers providing for more efficient management based upon specific needs, security and accessibility.

Fig. 2: Examples of multi-tiered storage with their users and functions

When multi-tiering is the objective, storage solutions move into a configuration referred to as a "highly resilient storage environment." Here data is mapped to different storage management elements called "service levels." Assigning data to a service level is based upon the data's value to the enterprise relative to each working environment and each workflow. For example, a tiered architecture might segment data into four tiers; the higher the tier, the higher the level of resiliency. As service levels move to the lower tiers, resiliency options can be selected appropriately (see Fig. 1).

Regardless of the tier level, even for the lowest tiers, there are inherent cost-effective and value driven features available that can increase storage resiliency for those particular purposes. Fig. 2 depicts the various components of a multi-tier storage configuration.


Automating storage management tasks helps support accessibility, data preservation and resiliency. Hierarchical storage management (HSM) automatically moves data from expensive storage to less costly storage based upon usage or demand for those assets, and upon historical trends or policies set by storage administrators.

HSM manages and distributes data between two or more storage tiers. HSM is often embedded into MAM systems where the structural configurations allow maximum accessibility while minimizing storage costs. HSM may be integrated into storage solutions including near line and archive; and deployed in disk-to-disk (D2D), disk-to-tape (D2T) or disk-to-disk-to-tape (D2D2T) environments. Embedded applications include data migration, archive and storage management residing inside a complete asset management system; or they may be found at the application layer of a dedicated storage management solution.
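An HSM demotion policy of the kind described—moving data toward cheaper storage based on usage—might be sketched as follows. The tier names, the 90-day threshold and the asset record are all hypothetical values for this example, not any vendor's API.

```python
# Minimal HSM-style migration sketch (hypothetical policy values).
# Assets idle beyond a threshold are demoted one tier toward tape,
# mimicking D2D2T movement driven by administrator-set policy.

TIERS = ["primary-disk", "nearline-disk", "tape-archive"]

def migrate(asset: dict, idle_days_threshold: int = 90) -> dict:
    """Return the asset, demoted one tier if idle past the threshold."""
    tier = TIERS.index(asset["tier"])
    if asset["idle_days"] > idle_days_threshold and tier < len(TIERS) - 1:
        return {**asset, "tier": TIERS[tier + 1]}
    return asset

clip = {"name": "promo.mov", "tier": "primary-disk", "idle_days": 120}
print(migrate(clip)["tier"])  # nearline-disk
```

Running the policy periodically (or on historical-trend triggers) is what lets an HSM keep frequently accessed material on fast disk while static material drains to archive.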

Design and planning for resiliency goes much deeper. In a future installment we'll look at the processes known as "data recovery." These processes do not necessarily mean the reconstruction of lost or corrupted data; but instead relate to tuning systems so users can retain efficient access to their data as the storage tier or service levels get deeper or more complex.

Karl Paulsen (CPBE) is a SMPTE Fellow and a technology consultant to the digital media and entertainment industry with Diversified Systems. Contact him at kpaulsen@divsystems.com.