Selecting Mass Storage
June 1, 2011
The choices available in mass storage systems have proliferated to proportions that boggle the mind.
Gone are the early days of videoservers, when storage solutions were controlled solely by the manufacturer of the videoserver itself. In this era of 15,000 RPM serial disk drive systems, including SAS, SATA and the like, the selections available are still controlled primarily by the videoserver manufacturer. However, the variations in storage architectures now offer dimensions that scale both vertically, in terms of capacity, and horizontally, in terms of bandwidth or throughput.
Users must now choose storage according to its intended use and the applications it must support. The optimal balance for a given solution falls inside the triangle (see Fig. 1), which shows where RAID storage fits in terms of cost, availability and performance. On the left is a single-drive solution, where the choice sits dead center in the triangle, balanced among the three factors at its edges. For RAID-protected combinations (on the right), one can see the tradeoffs among the RAID levels, and how each form of RAID protection measures up against the system's expectations, budget and specific thresholds of performance and availability.
Take, for example, RAID 0, which is essentially unprotected direct-attached storage: a set of two or more hard disks with data striped across all of the drives, with no protection beyond what the disk controller itself provides. This solution offers the best balance of cost and performance, yet RAID 0 is the poorest performer for data availability. The added risk is that if any one drive in the set fails, all of the data is lost.
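The RAID 0 risk can be put into rough numbers. The following is a minimal sketch, not a reliability model: it assumes each drive fails independently with the same probability over some period, and the 3 percent figure is a hypothetical value chosen only for illustration. The function name and constant are likewise illustrative.

```python
def raid0_loss_probability(drive_count: int, drive_failure_prob: float) -> float:
    """Chance that a striped (RAID 0) set loses its data over a period.

    Assumes drives fail independently with the same probability; the set
    is lost as soon as any single drive fails.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    if not 0.0 <= drive_failure_prob <= 1.0:
        raise ValueError("drive_failure_prob must be between 0 and 1")
    # The set survives only if every drive survives.
    all_survive = (1.0 - drive_failure_prob) ** drive_count
    return 1.0 - all_survive


# Hypothetical per-drive failure probability for the period (an assumption).
ASSUMED_DRIVE_FAILURE_PROB = 0.03

if __name__ == "__main__":
    for drives in (1, 2, 4, 8):
        risk = raid0_loss_probability(drives, ASSUMED_DRIVE_FAILURE_PROB)
        print(f"{drives} drive(s): {risk:.1%} chance of losing the set")
```

Under these assumptions the risk climbs with every drive added to the stripe, which is why RAID 0 sits at the low-availability corner of the triangle.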
Fig. 1: Shows the relations of where RAID storage fits in terms of cost, availability and performance.
The RAID 1 solution, also known as mirroring, provides the highest data availability with a reasonable amount of input/output (I/O) performance, yet it carries a greater cost penalty than the RAID 4 or RAID 5 solutions shown. Media-centric storage systems need a RAID solution that gives the best balance of I/O and availability when using enterprise-class disk drives, which are already hand-selected for optimal life and duty-cycle performance.
You can see from the storage triangle that RAID 5 is currently the best of the various RAID set offerings in this scenario. It should be noted that RAID 6 also lands nearly in the same zone as RAID 5, but carries with it the additional cost penalty of a second redundancy drive aimed at protecting against the failure of up to two drives in a logical unit number (LUN).
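The cost side of these tradeoffs comes down to how many drives in a set hold data and how many are given over to redundancy. The sketch below tabulates that for the RAID levels discussed here. It is an illustration under stated assumptions: all drives are the same size, RAID 1 is modeled as a set of two-way mirrored pairs, and the `RaidSummary` type and `summarize` function are hypothetical names, not part of any vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidSummary:
    level: str
    usable_drives: int        # drives' worth of capacity available for data
    redundancy_drives: int    # drives' worth of capacity spent on protection
    failures_tolerated: int   # drive failures survived in the worst case


def summarize(level: str, drive_count: int) -> RaidSummary:
    """Capacity split and guaranteed fault tolerance for one RAID set."""
    if level == "RAID 0":
        minimum, redundancy, tolerated = 2, 0, 0
    elif level == "RAID 1":
        # Assumption: two-way mirrored pairs, so half the drives are copies.
        if drive_count % 2 != 0:
            raise ValueError("mirrored pairs need an even drive count")
        minimum, redundancy, tolerated = 2, drive_count // 2, 1
    elif level in ("RAID 4", "RAID 5"):
        # One drive's worth of parity (dedicated in RAID 4, distributed in RAID 5).
        minimum, redundancy, tolerated = 3, 1, 1
    elif level == "RAID 6":
        # Two drives' worth of parity.
        minimum, redundancy, tolerated = 4, 2, 2
    else:
        raise ValueError(f"unsupported level: {level}")
    if drive_count < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives")
    return RaidSummary(level, drive_count - redundancy, redundancy, tolerated)


if __name__ == "__main__":
    for name in ("RAID 0", "RAID 1", "RAID 5", "RAID 6"):
        s = summarize(name, 8)
        print(f"{s.level}: {s.usable_drives} usable, "
              f"{s.redundancy_drives} redundant, "
              f"survives {s.failures_tolerated} failure(s)")
```

For an eight-drive set, the only difference between RAID 5 and RAID 6 in this view is one more drive spent on redundancy in exchange for surviving a second failure, which is the additional cost penalty described above.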
All this analysis fits neatly into the realm of resiliency, which is the ability of a system to continue to perform or to return to a reasonable state following an abnormal event such as a drive failure, interconnection anomaly or another uncontrollable factor. In determining the most appropriate storage system for a given application, the storage architecture chosen may vary with the use for which that storage is employed.
As an example, the continuous recording of a real-time linear video feed does not require a high-performance solution such as that found in RAID 5. Still, this application may warrant the protective nature of mirrored drives (RAID 1) so as to mitigate the impact of one set of drives failing during that once-in-a-lifetime recording.
In applications where multiple users must access storage randomly and simultaneously, with high bandwidth and multiple I/O operations occurring continuously, a drive system with RAID 5 (or RAID 6) becomes essential. Here the costs of the additional drives, controllers and such are offset by the ability to run multiple operations without unpredictable delays in data delivery.
When one builds out a storage solution for nearline or short-term archiving, one may turn to commodity storage subsystems connected to enterprise-class IT hardware, including servers, switches and storage controllers. In this situation, the guidelines above help to underscore the drive protection strategies necessary to ensure the server/storage solution matches the needs of the application it supports. Beyond these fundamental RAID-level protection and performance guidelines, the storage components can be selected in terms of hard disk drive rotational speed, read/write latency and drive type classification, i.e., SATA, Fibre Channel or SAS.
Scalability is another factor in the storage architecture selection process. For example, when future growth in capacity is necessary but increased bandwidth is not (referred to as vertical scaling), be certain that the initial decisions made about the specific RAID level will not impact the scalability curve going forward.
Selecting a storage architecture for use in a high-performance system is still best left to the professionals who outfit storage components for media-centric operations on a regular basis. However, when buying smaller storage subsystems for isolated production islands, these guidelines may help prevent a forklift upgrade from becoming the only available alternative when seeking additional capacity or improved performance in your systems.
Karl Paulsen is a SMPTE Fellow and digital media technologist with Diversified Systems. Contact him at firstname.lastname@example.org.