Grids: The future is here

The future arrived faster than we expected. During the 1980s, compression algorithms powerful enough to enable full HDTV were thought to be many years away. Yet General Instrument succeeded in compressing a full bandwidth HD signal into a 6MHz channel. This laid the foundation for a completely digital TV transmission system and, when coupled with MPEG compression, gave birth to the ATSC standard.

Similarly, grid computing has been an emerging technology since the mid-1990s. The media network and application layers in a BOC, especially a large one, can be thought of as a kind of grid architecture. In fact, with networks, storage and compute applications widely distributed, and with the need to access all content from every location, the BOC is an ideal place to apply grid computing concepts.

What is a grid?

A grid is a collection of networked IT resources. Maximizing the use of those resources is a design goal of grid architecture. By distributing these resources, often referred to as “nodes” or “engines,” loads can be balanced and bottlenecks avoided.

Fundamental grid components include:

  • Computation
  • Storage
  • Scheduling
  • Security

Large supercomputers (such as the Cray machines) from an earlier era used grid-like techniques at a time when resources were scarce and speeds limited. Today, with ever-increasing system speeds and storage capacity, grid computing is poised to become commonplace.

Parallel processing

The first real attempt at developing grid-like architectures was motivated by the need to improve computational performance. Medical imaging and nuclear explosion simulation involve a very large number of data points and require an enormous number of machine computational cycles to reach a solution.

By using parallel processing, computational tasks are broken down into smaller pieces that can be done at the same time. The results are then combined to solve the problem. An early implementation was the use of an 80287 math co-processor in 80286 PCs, where floating point operations that were not native to the CPU were done in parallel on the 80287.

A very simple example of parallel computing is evaluating the expression X = 2 x 3 + 4 x 5. The terms are grouped by calculation precedence; that is, multiplication is done before addition. Hence 2 x 3 and 4 x 5 will be calculated first, and then added together. In a linear processing environment, 2 x 3 would be calculated and stored, then 4 x 5 would be calculated. The stored result and current result would then be added to give the solution for X.

In a parallel implementation, 2 x 3 and 4 x 5 would be evaluated on individual processors and calculated at the same time. The results would then be summed for the solution.
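The two approaches can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the worker threads simply stand in for the individual processors described above.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import mul


def evaluate_serial():
    # Linear processing: calculate one product, store it, calculate the next.
    stored = mul(2, 3)
    current = mul(4, 5)
    return stored + current


def evaluate_parallel():
    # Each product is handed to its own worker as an independent task.
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(mul, 2, 3)
        right = pool.submit(mul, 4, 5)
        # The addition cannot start until both partial results are available.
        return left.result() + right.result()


x_serial = evaluate_serial()
x_parallel = evaluate_parallel()
print(x_serial, x_parallel)
```

Both paths give the same value for X. The parallel version only pays off when each piece of work is large enough to outweigh the cost of handing it out and collecting the result.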

Now apply this concept to rendering an animation sequence. Each frame is sent to a compute node for rendering. A 30-node render farm can process 30 frames in parallel. The resultant frames are then placed in the correct order for playout. Theoretically, this will decrease render time by a factor of 30. If a render takes 60 minutes on one machine, it may approach 2 minutes on a 30-node farm. Although the theoretical limit may not be reached in practice, applying parallel computing to a computationally intensive process will significantly shorten processing time.
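The render farm arithmetic can be sketched as follows. The function names, the one-minute-per-frame figure and the overhead parameter are assumptions made for illustration. The worker pool stands in for the farm's compute nodes, and the "renderer" returns a label rather than an image.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def ideal_render_minutes(frames, minutes_per_frame, nodes, overhead_minutes=0.0):
    # Frames are rendered in "waves" of at most `nodes` frames at a time.
    waves = math.ceil(frames / nodes)
    # Overhead models distribution and reassembly time (hypothetical value).
    return waves * minutes_per_frame + overhead_minutes


def render_frame(frame_number):
    # Stand-in for a real renderer.
    return f"frame-{frame_number:04d}"


def render_sequence(frame_count, nodes):
    with ThreadPoolExecutor(max_workers=nodes) as farm:
        # map() returns results in submission order, so the frames come
        # back already arranged in the correct order for playout.
        return list(farm.map(render_frame, range(frame_count)))


one_machine = ideal_render_minutes(60, 1.0, nodes=1)
thirty_nodes = ideal_render_minutes(60, 1.0, nodes=30)
with_overhead = ideal_render_minutes(60, 1.0, nodes=30, overhead_minutes=0.5)
sequence = render_sequence(5, nodes=3)
```

With 60 one-minute frames, one machine needs 60 minutes and 30 nodes need 2 minutes in the ideal case. Any nonzero overhead keeps the real figure above that limit.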

Storage

The key concept in applying grid technologies to storage is the application of a single file system throughout the data grid. Physical storage is virtualized. This means that the type of storage (e.g., iSCSI, NAS or tape) and its location (directly attached, on a SAN or across the country) are transparent to any device or application trying to access it. With virtualization techniques, capacity, sharing and availability are increased.

Applying grid concepts to storage opens up new possibilities. Speed and redundancy can be increased by using a file slicing technique. Files are broken down into smaller slices and distributed about the grid. Copies of these slices are also distributed strategically, such that two copies of the same slice are never on the same storage node. Numerous copies of each slice can be made.

This technique improves access time because slices of the original file can be pulled across the network in parallel using different routes and then assembled. This takes less time than a linear file transfer. In the event of a bad block of data, a copy exists elsewhere that can be transparently retrieved. This improves data reliability and avoids the time-consuming rebuilds of RAID implementations.
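A minimal sketch of file slicing with replicated placement follows. The round-robin placement rule, node names and function names are assumptions for illustration, not any vendor's actual algorithm. For clarity, the slices are read back one after another rather than in parallel.

```python
def slice_file(data, slice_size):
    # Break the file into fixed-size slices; the last one may be shorter.
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


def place_slices(slices, node_names, copies=2):
    if copies > len(node_names):
        raise ValueError("need at least as many nodes as copies")
    nodes = {name: {} for name in node_names}
    for index, chunk in enumerate(slices):
        for replica in range(copies):
            # Consecutive offsets modulo the node count are distinct, so two
            # copies of the same slice never land on the same node.
            target = node_names[(index + replica) % len(node_names)]
            nodes[target][index] = chunk
    return nodes


def reassemble(nodes, slice_count, failed=()):
    parts = []
    for index in range(slice_count):
        for name, stored in nodes.items():
            # Skip failed nodes and use the first surviving copy.
            if name not in failed and index in stored:
                parts.append(stored[index])
                break
        else:
            raise IOError(f"slice {index} is unavailable")
    return b"".join(parts)


original = b"ABCDEFGHIJ"
slices = slice_file(original, 4)
grid = place_slices(slices, ["node1", "node2", "node3"], copies=2)
recovered = reassemble(grid, len(slices), failed={"node2"})
```

Even with one node marked as failed, every slice is still found on another node and the file is recovered intact, with no rebuild step.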

Management

Parallel processing and virtualized storage require careful management of system resources. What good is breaking down a complex calculation into parallel tasks, executed separately, if the gains in execution speed are lost while waiting to combine results? Complexity will have been increased with no appreciable gain in performance.

Hence, task and resource management is at the core of grid infrastructure implementation. Tasks must be delineated and scheduled. This is done by assigning individual tasks to discrete resources for execution. Queues are built for task assignment, generally on a priority basis. A real-time assembly of an ad-hoc lower third that is needed to take to air will be executed before an archive process that removes unneeded content from a server.
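Priority-based task queuing can be sketched with Python's standard heapq module. The class name, the priority numbers and the transcode task are hypothetical. The sketch assumes that a lower number means a more urgent task.

```python
import heapq
import itertools


class TaskQueue:
    def __init__(self):
        self._heap = []
        # Running counter breaks ties so equal priorities stay first-in, first-out.
        self._order = itertools.count()

    def submit(self, priority, name):
        # Lower number = more urgent (an assumption of this sketch).
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def next_task(self):
        # Pop the most urgent task that is waiting.
        _priority, _sequence, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)


queue = TaskQueue()
queue.submit(9, "archive: remove unneeded content")
queue.submit(1, "on-air: assemble lower third")
queue.submit(5, "transcode: hypothetical background job")
execution_order = [queue.next_task() for _ in range(len(queue))]
```

Although the archive task was submitted first, the lower third needed for air is dispatched ahead of it.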

Storage must be managed intelligently as well. Where content is stored should reflect the probability that it will be used. That ad-hoc lower third will access graphic elements that are stored near-line on high-availability storage. The archive process will migrate files to a robotic tape system.

Security

With all content available to anyone at all times, security plays a major role in any grid implementation. Data may be sensitive or valuable. Encryption, authentication and authorization are used to control user access to applications, resources and data.

Grids are particularly vulnerable to viruses and Trojan Horses. This is because machines on the grid execute programs. Therefore it is of utmost importance to maintain a trusted environment throughout the grid by carefully controlling access and use of all resources.
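As a rough sketch of how authentication and authorization work together, the following uses Python's standard hmac and hashlib modules. The node names, demo keys and permission strings are all hypothetical. A real grid would rely on its middleware's security services rather than hard-coded tables.

```python
import hashlib
import hmac

# Hypothetical shared secrets and permissions, for illustration only.
SHARED_KEYS = {"graphics-node": b"demo-key-1", "archive-node": b"demo-key-2"}
PERMISSIONS = {
    "graphics-node": {"read:graphics"},
    "archive-node": {"read:graphics", "write:tape"},
}


def sign(node, message):
    # The requester proves its identity by signing with its secret key.
    return hmac.new(SHARED_KEYS[node], message, hashlib.sha256).hexdigest()


def authenticate(node, message, signature):
    key = SHARED_KEYS.get(node)
    if key is None:
        return False
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(expected, signature)


def authorize(node, action):
    # Authentication says who is asking; authorization says what they may do.
    return action in PERMISSIONS.get(node, set())


def handle_request(node, action, signature):
    return authenticate(node, action.encode(), signature) and authorize(node, action)


allowed = handle_request("archive-node", "write:tape", sign("archive-node", b"write:tape"))
denied = handle_request("graphics-node", "write:tape", sign("graphics-node", b"write:tape"))
forged = handle_request("archive-node", "write:tape", "not-a-valid-signature")
```

The graphics node is genuine but is not permitted to write to tape, so its request is refused. The forged request fails earlier, at authentication.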

Related links

The TeraGrid
The National Science Foundation is funding a five-year initiative to operate and enhance TeraGrid, a distributed national infrastructure. The system collectively provides 15 trillion floating-point operations per second (15 teraflops) of computing capability, with storage approaching 1 petabyte. The Chicago/Los Angeles backbone is 40Gb/s!

Read a TeraGrid Primer.

The Globus Toolkit, which is used by member machines, is also a good resource.

The future is now

Grid computing techniques have been used in render farms for years. Taking grid technology to the next level, Omneon unveiled its MediaGrid at NAB this year. It is a ready-to-implement grid solution tailored to the demands of media networks in a broadcast environment.

The system combines grid storage with grid computing. “Intelligent” interconnected nodes act independently. ContentDirectors and ContentServers are the main system components.

ContentDirectors manage data distribution, maintain data maps and are the overall file system controllers. ContentServers store and provide access to media.

A distributed file system employs file segmentation that creates “slices” (normally 8MB) that are stored in multiple locations across a pool of content servers. For high bandwidth requirements, 2TB capacities are optimum, whereas for high capacity environments, 12TB is used.

By using a modular design philosophy, a highly configurable and reliable system can be implemented. High bandwidth Ethernet switches are employed. There is no single point of failure. Increasing storage to 5TB can be done without a file system rebuild or partition.

A number of broadcast application developers are planning to support the Omneon MediaGrid. These include Front Porch Digital, OmniBus Systems and Snell & Wilcox, among others. Demonstrations at NAB included integration with Tektronix Cerify, Pro-Bel Morpheus and eight other system vendors.

The pace of technological change is constantly accelerating. Broadcast engineering departments are hard-pressed to keep up with new technologies. Grid-based systems are finding their way into the media infrastructure. Better to be ahead of the wave and get on the learning curve now.

IBM has a library of Redbooks on grid computing that provide a wealth of information. Two particularly good places to begin are “Fundamentals of Grid Computing” and “New to Grid computing”.