The Elements of Grid Computing for Storage

"STAR" Storage Configuration (top): The more conventional arrangement requires two 16-port switches, one of which is for failover. Thirty-two wired connections run to two expensive switches, giving two paths to any node but only one additional level of protection.

"GRID" Storage Configuration (bottom): More resilient, less chance of catastrophic failure. Nodes formed as four groups of four, connected to less expensive four-port switches. Same number of wired connections, with each node connected to two switches. Failure of any switch retains additional paths to connect to any two nodes. Multiple levels of protection available.Server clustering, when two or more servers collectively increase computation performance and provide failover protection, is yet another principle in computing technologies. Clustering allows a group of servers-individually called "nodes"-to appear as a single entity, with any one server capable of picking up where any other server left off during a failure. Mission-critical applications, as in transactional operations or SQL-databases, use clustering to ensure that little or no downtime occurs.

Sometimes called "virtualization," clustering gives the appearance of one server when in actuality there are many in the total system. In the video server domain, mirroring of either online storage or servers, or both, might be construed as clustering, but is really just parallel redundancy. Broadcast automation databases, traffic and billing, and command and control systems may also employ parallel or redundant servers, and are less likely to be operated in a true clustering node configuration.

Multi-element computing, called "grid" computing, is a form of distributed computing, which coordinates and shares resources across dynamic and geographically dispersed entities. The grid may involve application, data, storage or network resources. The principal concept in this computational model is that by combining multiple resources, higher throughput is achievable. Networking several computational platforms together creates a virtual computer architecture that can tackle complex problems. The computers may be in close proximity to each other, or scattered about geographically.

Grid computing can scale from very simple to extremely complex. In its simpler form, grid computing harnesses many small chunks of CPU idle time from many computers, then collectively assigns each CPU a tiny element of a large computationally intense problem. These parallel processing applications combine many small computational elements and in turn, generate a much larger set of processing power.
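The idea of handing each CPU a tiny element of a large problem can be illustrated with a short Python sketch. It is only a stand-in: worker threads on one machine play the role of the separate computers in a grid, and the prime-counting job, the function names and the worker count are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def split(limit, pieces):
    """Cut the range [0, limit) into roughly equal sub-ranges."""
    step = -(-limit // pieces)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]

def grid_count_primes(limit, workers=8):
    """Farm out one sub-range per worker, then combine the partial counts."""
    chunks = split(limit, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))

print(grid_count_primes(10_000))  # primes below 10,000
```

Each worker solves its small piece without knowing about the others, and the coordinator simply adds up the partial results. A real grid adds scheduling, network transfer and handling of nodes that drop out, none of which is modeled here.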

A node in a mesh of grid computers may be as simple as the common home PC, which typically has a huge abundance of idle time. One can see how much idle time occurs in the typical PC by opening the Processes tab of the Windows Task Manager and looking for the System Idle Process. Take your hands off the keyboard for a few seconds and watch the idle percentage soar to almost 100 percent.

Now consider the thousands of PCs in the world: sample even a tiny percentage of each one's idle time, apply it to a parallel computational effort, and you have the potential for a very large supercomputer that costs literally nothing to operate.

WE ARE NOT ALONE

One widely known non-profit project, called "SETI@home" (SETI stands for Search for Extraterrestrial Intelligence), whose mission is "...to explore, understand and explain the origin, nature and prevalence of life in the universe," is an example of a grid computing project that scavenges idle personal computer CPU time for scientific purposes. The SETI effort began some 40 years ago; the SETI@home project itself launched in 1999 and continues en masse today.

SETI@home was not the first arrangement of its kind, nor does it use the capabilities of current grid computing systems; many others have employed the concepts for cancer drug research, mathematics and even climate modelling. Home users can search for SETI@home and install the application on their own PCs to experience how grid computing works. It makes for a great screen saver, too.

So why the emphasis on clustering and grid computing for media applications? It is actually quite simple. Historically, the video server has followed nearly in lock step with modern computational, processing or storage efforts, from real-time operating systems through NAS and SAN storage. What occurs in the development of network or data-centric computing seems to eventually show up in the media server domain. Grid computing is now headed toward storage for media applications.

SANs and NAS-head storage systems are growing universally in deployment, making storage management, data protection and failover/failback recovery far more important. As more assets are placed on spinning disks, the driving forces for greater capabilities must be addressed. Gone are the simple incremental additions to storage platforms, such as adding another five or 10 disks to an existing array, changing from a 36 GB to a 73 GB drive or adding another chassis of disk drives. The industry now demands increased bandwidth with higher accessibility, lower cost and much larger capacities. This happens mostly by upping the storage throughput or increasing the amount of storage.

Enter clustered or grid storage. Applying similar principles to those found in clustered servers or grid computing, but now to storage, allows the user to easily and linearly add storage while increasing bandwidth (i.e., throughput). File system complexities are diminished by moving away from the traditional layered file system approach and allowing the storage to appear as a single volume.

By using lower-performance drive media (e.g., SATA), as opposed to Fibre Channel drives, the cost-to-performance ratio is readjusted. As the industry realizes that having everything online is unnecessary or too costly, nearline or secondary storage uses are accelerated.

Grid storage promotes distribution of file elements across many sets of drives. Files are literally sliced up and spread across a larger set of less costly drives. The requirement for a RAID-protected system is mitigated by spreading the risk out over a larger footprint of storage. For protection purposes and increased accessibility, files should be replicated in multiple locations across the storage platform.
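The slicing and replication described above can be sketched in a few lines of Python. This is not any vendor's method: the round-robin placement rule, the slice size, the number of copies and the function names are all assumptions made for illustration, and each storage node is modeled as a plain dictionary.

```python
def slice_file(data, slice_size):
    """Cut a file (bytes) into fixed-size slices; the last may be shorter."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]

def place(slices, n_nodes, copies=2):
    """ASSUMED placement rule: copy k of slice i goes to node (i + k) % n_nodes.

    Requires copies <= n_nodes so every copy lands on a different node.
    """
    nodes = [dict() for _ in range(n_nodes)]
    for i, chunk in enumerate(slices):
        for k in range(copies):
            nodes[(i + k) % n_nodes][i] = chunk
    return nodes

def read_back(nodes, n_slices, failed=frozenset()):
    """Reassemble the file from whichever surviving node holds each slice."""
    out = []
    for i in range(n_slices):
        for idx, node in enumerate(nodes):
            if idx not in failed and i in node:
                out.append(node[i])
                break
        else:
            raise RuntimeError(f"slice {i} has no surviving copy")
    return b"".join(out)

data = bytes(range(256)) * 4           # a 1,024-byte stand-in for a media file
slices = slice_file(data, 100)
nodes = place(slices, n_nodes=5, copies=2)

# Any single node can fail and the file is still fully readable.
for down in range(5):
    assert read_back(nodes, len(slices), failed={down}) == data
```

With two copies of every slice, the file survives the loss of any one node without RAID inside the nodes. Losing both nodes that hold the same slice still loses data, which is why the number of copies and the placement rule matter in a real system.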

With a common file system manager at each node of storage, tracking the location of these file slices is no longer the charge of the video server; thus system performance is increased and storage management becomes a separate task. Grids allow for unlimited size and performance scalability.

IP-based connectivity is another driving force for emerging storage platforms. With 10 Gigabit Ethernet, the previous problems of FTP latency are nullified, and the overall storage platforms no longer need to depend on complex fabric-managed switches, such as those in a Fibre Channel implementation.

The gains include increased storage, easy expansion, cost-effective entry levels, and a manageable storage system that uses the well-understood capabilities of Ethernet.

NAB showed just such storage capabilities from well-known established vendors, with a variety of applications, some standalone and others integrated into video server platforms. With these various storage platforms, which may or may not depend on a specific server vendor's product, options are available today that were impractical only a few years ago, and they continue to promote the all-server, tapeless operating environment.