In the evolution from tape- to file-based workflows, asynchronous IP-based storage is increasingly chosen for online and nearline archive storage.
While the mainstay of this activity has in many cases been RAID-based NAS or SAN solutions, grid storage has made inroads over the last year or so. Offering large storage capacities and simplified system management, grid storage is an alternative approach to bulk data storage, but it also offers another possibility: active storage. This article will examine the concept of active storage: what it is, how it works and the advantages that it can bring to the entire workflow.
The fundamentals of grid storage
In a nutshell, grid storage consists of separate, standalone content servers that are each responsible for storing only part (usually referred to as a slice) of each file loaded onto the system. In this way, the file itself is scattered across multiple autonomous content servers. Separate metadata servers decide which slice goes to which content server. (See Figure 1.) The metadata servers also provide the file system namespace to the various clients in the system.
This arrangement is analogous to the operation of a standard hard drive. The content servers are similar to the sectors of a hard drive, and the metadata servers are like the file allocation table of the drive, where a file name is translated into the addresses of the sectors of the disk where the data can be found. The idea has simply been expanded in the case of grid storage.
This architecture allows clients, whether reading or writing, to first ask the metadata servers for the locations of the slices and then interact directly with each content server to gain access to an individual slice. This is significantly faster than the traditional NAS approach, where all access to storage must pass through the NAS head — an obvious bandwidth bottleneck.
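The read and write path described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: the class names, the round-robin placement policy and the tiny slice size are all hypothetical and chosen only to make the flow visible.

```python
from itertools import cycle

SLICE_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


class MetadataServer:
    """Hypothetical metadata server: maps a file name to slice locations."""

    def __init__(self, server_ids):
        self._next_server = cycle(server_ids)  # assumed round-robin placement
        self._namespace = {}

    def allocate(self, name, slice_count):
        locations = [next(self._next_server) for _ in range(slice_count)]
        self._namespace[name] = locations
        return locations

    def locate(self, name):
        return list(self._namespace[name])


class ContentServer:
    """Hypothetical content server: stores only the slices it is given."""

    def __init__(self):
        self._slices = {}

    def put(self, name, index, data):
        self._slices[(name, index)] = data

    def get(self, name, index):
        return self._slices[(name, index)]


def write_file(name, data, metadata, servers):
    slices = [data[i:i + SLICE_SIZE] for i in range(0, len(data), SLICE_SIZE)]
    # Ask the metadata server where each slice goes, then write directly.
    for index, server_id in enumerate(metadata.allocate(name, len(slices))):
        servers[server_id].put(name, index, slices[index])


def read_file(name, metadata, servers):
    # One metadata lookup, then direct access to each content server.
    locations = metadata.locate(name)
    return b"".join(servers[sid].get(name, i) for i, sid in enumerate(locations))
```

Note that the metadata server never touches the payload; it only answers the question "where are the slices?", which is why it does not become the bandwidth bottleneck that a NAS head does.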
Another distinguishing attribute of grid storage is the way it provides data protection. Protection is achieved by making copies of the slices onto other content servers in the grid. At any point in time, there exist at least two copies of every slice of each file. The principle is that the failure of any individual content server does not render the data unrecoverable because there's always at least one other copy of each slice available somewhere else on the grid.
The content servers operate autonomously, so re-replication of missing data can happen simultaneously through a number of content servers operating in parallel. An important item to note is that grid storage systems rebuild data, whereas RAID systems rebuild drives. The latter includes rebuilding sectors of the replacement drive that never held valid data in the original, which is clearly wasted effort. This prolongs the rebuild time and extends the window of vulnerability to another drive failure.
Re-replication of data in a grid storage system happens significantly faster than the rebuilding of a hard drive by a RAID engine, greatly reducing the window of vulnerability. If the replication factor is set to three or higher, the failure of any drive or content server will not leave the system in a vulnerable state: even if one copy of a file is completely lost, the data is safe, as there are at least two other copies of the affected slices somewhere on the grid. Because the replication factor is user-selectable, resiliency can be raised further where the material warrants it.
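To make the "rebuild data, not drives" point concrete, here is a small Python sketch. It assumes a simple round-robin placement and hypothetical function names; real grids use their own placement policies. The key observation is that only slices that actually lived on the failed server generate copy jobs, and those jobs involve different source and target servers, so they can run in parallel.

```python
def place_replicas(slice_ids, servers, replication_factor):
    """Assign each slice to `replication_factor` distinct servers (round-robin)."""
    if replication_factor > len(servers):
        raise ValueError("not enough content servers for this replication factor")
    placement = {}
    for n, slice_id in enumerate(slice_ids):
        placement[slice_id] = {
            servers[(n + k) % len(servers)] for k in range(replication_factor)
        }
    return placement


def re_replicate(placement, failed, servers, replication_factor):
    """Restore the replica count after `failed` is lost.

    Returns the copy jobs as (slice_id, source_server, target_server).
    """
    jobs = []
    survivors = [s for s in servers if s != failed]
    for slice_id, holders in placement.items():
        holders.discard(failed)
        if not holders:
            raise RuntimeError(f"slice {slice_id} has no surviving copy")
        candidates = [s for s in survivors if s not in holders]
        while len(holders) < replication_factor and candidates:
            source = sorted(holders)[0]   # any surviving copy will do
            target = candidates.pop(0)
            jobs.append((slice_id, source, target))
            holders.add(target)
    return jobs
```

With four servers, four slices and a replication factor of two, losing one server produces just two copy jobs, one for each slice the failed server held. Nothing else on the grid is touched.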
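Replication has other advantages too. For example, because a client can be served from any of several copies of a slice, read load is spread across more content servers, and the average latency encountered by each individual client tends to decrease as the replication factor increases. This is extremely important in today's production environment.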
The concept of active storage
Until recently, storage systems have been passive members of the workflow. Once media was stored on them, it remained there until external systems read the data, manipulated it and then put the result back onto the storage. This was true when media was stored on tape and has remained true in most cases when using disk-based storage.
Grid storage offers a new opportunity. As previously mentioned, grid storage is made up of separate content servers, each of which has a CPU, RAM and all of the other hardware that make up a modern platform. It is entirely possible for a powerful content server platform to take on additional processing tasks.
For example, each CPU can examine the slices located on its hard drives and perform automatic error checking, calculating a cyclic redundancy check (CRC) from the data. It then compares the CRC to a CRC that was calculated for the slice at the time it was created and was stored along with that data as part of the write process. If the two numbers don't match, the content server can declare its slice to be invalid, and the metadata servers can respond by causing the slice to be re-replicated from a known good copy of the slice to some other storage location within the grid. This effectively makes the system self-healing, with an associated reduction in the need for manual intervention by maintenance staff.
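The self-healing check lends itself to a short sketch. The Python below uses the standard library's `zlib.crc32`; the record layout, the `scrub` function and the list of spare servers are assumptions made for illustration, since the article does not specify how any particular grid stores its checksums.

```python
import zlib


def store_slice(data):
    """Record the CRC alongside the data at write time."""
    return {"data": data, "crc": zlib.crc32(data)}


def is_valid(record):
    return zlib.crc32(record["data"]) == record["crc"]


def scrub(replicas, spare_servers):
    """Check every replica of one slice and heal any that fail the CRC.

    replicas: {server_id: record} for a single slice (hypothetical layout).
    spare_servers: servers that may receive a fresh copy.
    Returns a list of (invalid_server, new_server) repairs.
    """
    repairs = []
    for server_id in list(replicas):
        if is_valid(replicas[server_id]):
            continue
        del replicas[server_id]  # the content server declares its slice invalid
        good = next((r for r in replicas.values() if is_valid(r)), None)
        if good is None:
            raise RuntimeError("no known good copy left to replicate from")
        if not spare_servers:
            raise RuntimeError("no spare content server available")
        target = spare_servers.pop(0)
        replicas[target] = dict(good)  # re-replicate from a known good copy
        repairs.append((server_id, target))
    return repairs
```

In a real grid this check would run as a background task on each content server, using CPU cycles that would otherwise sit idle.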
Taking this idea a step further, it is equally possible to use some of the processing power of the content servers to manage and process media. If the storage is aware that the data it is holding are actually media files, it is possible to use some of the CPU power of the individual content servers to perform media-specific processing tasks in addition to the activity of storing and serving up data.
It is, of course, vital that such use does not impinge on the ability of the content servers to provide data services to the various clients connected to the grid, which is their primary purpose. To this end, it is necessary to add a management layer to the system's code to ensure that no content server becomes oversubscribed. The remaining CPU power can be used as raw processing capability, either acting on the data stored on the grid or working on external data sets supplied by an external application server along with instructions on how to manipulate them. Typically, the components of such a configuration include:
- application controllers, on which the client application GUIs can run, which manage the operation of their individual applications;
- grid resource management software, which can receive requests for CPU cycles from the application controllers and in response allocate available CPUs to each requestor; and
- a grid application loader, which runs on each content server to set up the processing environment on that server and physically launch a process.
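The role of the resource management layer can be illustrated with a short Python sketch. Everything here is hypothetical: the class names, the idea of a fixed per-server CPU reservation for storage duties, and the "most free CPUs first" policy are assumptions chosen to show how requests from application controllers might be granted without oversubscribing any content server.

```python
from dataclasses import dataclass


@dataclass
class ContentServerLoad:
    name: str
    total_cpus: int
    reserved_for_storage: int  # CPUs held back for serving data to clients
    allocated: int = 0         # CPUs currently granted to applications

    @property
    def free(self):
        return self.total_cpus - self.reserved_for_storage - self.allocated


class GridResourceManager:
    """Grants spare CPUs to application controllers on request."""

    def __init__(self, servers):
        self.servers = servers

    def request(self, cpus_wanted):
        """Grant up to `cpus_wanted` CPUs; returns {server_name: cpus}."""
        grant = {}
        # Prefer the servers with the most headroom.
        for server in sorted(self.servers, key=lambda s: s.free, reverse=True):
            if cpus_wanted <= 0:
                break
            take = min(server.free, cpus_wanted)
            if take > 0:
                server.allocated += take
                grant[server.name] = take
                cpus_wanted -= take
        return grant

    def release(self, grant):
        by_name = {s.name: s for s in self.servers}
        for name, cpus in grant.items():
            by_name[name].allocated -= cpus
```

A request that cannot be met in full simply receives fewer CPUs than it asked for; the storage reservation is never touched. The application loader on each granted server would then launch the actual process.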
Suddenly, the system ceases to be a mere storage repository and becomes an active part of the user's workflow. It is easy to see how adding this capability can improve the business of processing material as it passes through the workflow. And such active workflows, by the nature of their parallelism, can operate substantially faster than their passive counterparts. Figure 2 is an example of the processes needed to manage grid storage in this way.
There are several activities that immediately come to mind when considering the possibilities enabled by active storage.
Integrated transcoding of material
It is extremely common for a facility to have media in different file formats at various stages of the workflow. The house ingest format, for example, is hardly ever used to archive material. For this reason, broadcasters are forced to transcode material as it makes its way through the production process, resulting in significant numbers of network transfers as the material is sent to the transcoding engine and the result is transferred back to the central storage.
Imagine a world where the central storage system automatically transcodes media based on business rules configured by the user and delivers it to the next stage of the process once a business metric, such as an editorial approval, has been met. The transcoding process lends itself to grid-based processing and is accelerated in direct proportion to the number of CPUs that are targeted at the transcoding.
Transcoding technology can also generate proxies automatically, based on user-supplied business rules. For example, high-res material can be proxied to the house proxy format as soon as it is placed in the requisite location on the central grid storage system.
Content providers often need to provide a finished product in a variety of formats for the myriad of delivery and display systems. Once again, having this format conversion happen in the grid storage prior to handoff to the delivery subsystem is a great benefit.
A secondary benefit in the transcoding case comes from the inherent distributed nature of grid processing. At the moment, several transcoding manufacturers offer the ability to have multiple transcode engines available for use in a transcode farm. An individual job is given to an individual transcode engine. Therefore, if you have five engines free at any point in time, only one will be given a particular job. In the case of grid processing, however, all five could be pressed into duty to transcode the source clip, resulting in significant reduction of transcode time compared with the current method. (See Figure 3.)
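The arithmetic behind this claim is simple enough to sketch in Python. The example is idealized and rests on assumptions: it treats the clip as freely divisible, ignores the practical need to cut on GOP boundaries and stitch the results, and uses hypothetical `speed` and `overhead_s` parameters to stand in for engine performance and coordination cost.

```python
def split_job(duration_s, engines):
    """Divide a clip into near-equal contiguous (start, end) segments."""
    base, extra = divmod(duration_s, engines)
    segments, start = [], 0
    for i in range(engines):
        length = base + (1 if i < extra else 0)
        segments.append((start, start + length))
        start += length
    return segments


def estimated_wall_time(duration_s, engines, speed=1.0, overhead_s=0.0):
    """Wall-clock estimate: the longest segment sets the pace.

    speed: seconds of media transcoded per second, per engine (assumed).
    overhead_s: fixed cost of splitting and reassembling (assumed).
    """
    longest = max(end - start for start, end in split_job(duration_s, engines))
    return longest / speed + overhead_s
```

For a 600-second clip at real-time speed, a farm that hands the whole job to one engine needs about 600 seconds, while five engines sharing the job need about 120 seconds plus whatever overhead the split incurs.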
Moving forward, it may also be possible to split a transcode job so that it is given to the content servers that physically contain the source slices. In this case, the individual content servers would only transcode the slices they contain, resulting in a reduction in network bandwidth, as there would be no need to transport the source slices to a specific content server for transcoding. These potential savings offer a real improvement in the transcoding workflow.
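A locality-aware split of this kind might look like the following Python sketch. It is speculative, as is the idea itself: the function name and the "least-loaded holder" tie-breaking rule are assumptions, and it ignores whether slice boundaries line up with points at which the media can actually be cut.

```python
def assign_by_locality(slice_locations):
    """Give each slice to a content server that already holds a copy of it.

    slice_locations: {slice_index: [server ids holding a replica]}.
    Returns {server_id: [slice indexes to transcode locally]}.
    """
    work = {}
    for index in sorted(slice_locations):
        holders = slice_locations[index]
        # Among the holders, pick the one with the least work so far.
        target = min(holders, key=lambda s: (len(work.get(s, [])), s))
        work.setdefault(target, []).append(index)
    return work
```

Because every slice is processed where it already resides, no source data crosses the network; only the transcoded output has to be written back.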
File verification and technical QC
Broadcasters can expend significant effort in technical QC of material after ingest and in file verification after transfer from one location to another. Such a labor-intensive activity can effectively be performed via software applications, which can check files for such things as GOP errors, macroblocking, audio levels and gamut. These applications can even detect and flag long periods of silence in the middle of an individual clip, signifying a potential error. These activities, which once required intervention by an operator, can easily be integrated into a grid storage and processing platform.
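As one example of such a check, the Python sketch below flags long runs of silence in a mono audio track. The threshold and minimum duration are illustrative assumptions rather than broadcast standards, and commercial QC tools are considerably more sophisticated.

```python
def find_silence(samples, sample_rate, threshold=0.01, min_seconds=2.0):
    """Return (start_s, end_s) spans where the signal stays below `threshold`.

    samples: sequence of floats in the range -1.0 to 1.0 (assumed format).
    """
    spans, run_start = [], None
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and (i - run_start) / sample_rate >= min_seconds:
                spans.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a quiet run that lasts until the end of the clip.
    if run_start is not None and (len(samples) - run_start) / sample_rate >= min_seconds:
        spans.append((run_start / sample_rate, len(samples) / sample_rate))
    return spans
```

A check like this is independent for each clip, and often for each slice, which is what makes it a natural fit for the spare CPU cycles of the content servers.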
This is not to suggest that QC for editorial content and censorship can be performed within the central storage. Although one can envision some of it being automated, human judgment still needs to be applied to these activities.
Having central storage actively participate in operations ultimately leads to active workflows in which material is moved from storage location to storage location throughout the entire facility based on the business rules described above. Material can be recorded into an ingest server and automatically moved, while it is still being recorded, to the central server. Next it can be instantly checked for technical quality. Then, via a proxy automatically generated once the full-res material has passed technical QC, it is passed on to the QC operator responsible for artistic content. After this stage, the material can be automatically rewrapped for editing and packaging. And after approval, it is automatically transcoded into the needed delivery formats.
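A rule-driven workflow of this sort can be modeled as a small set of "when these facts hold, run this action" rules. The Python sketch below is purely illustrative: the rule names and fact labels are invented, every automated action is assumed to succeed, and the human approvals are represented as facts that arrive from outside the storage system.

```python
# (action, facts required before it may run, fact it produces) - all hypothetical
RULES = [
    ("technical_qc",       {"on_central_storage"},                 "tech_qc_passed"),
    ("generate_proxy",     {"tech_qc_passed"},                     "proxy_ready"),
    ("rewrap_for_edit",    {"proxy_ready", "content_qc_approved"}, "rewrapped"),
    ("transcode_delivery", {"rewrapped", "final_approval"},        "delivery_ready"),
]


def advance(facts):
    """Run every rule whose prerequisites are met; return the actions run."""
    ran = []
    progressed = True
    while progressed:
        progressed = False
        for action, requires, produces in RULES:
            if requires <= facts and produces not in facts:
                facts.add(produces)  # assume the action completed successfully
                ran.append(action)
                progressed = True
    return ran
```

Starting from a clip that has just landed on central storage, the engine runs technical QC and proxy generation on its own, then waits. It moves on only when an operator's approval is recorded, which keeps human judgment in the loop where the article says it belongs.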
While some of the above is forward looking, there are transcoding and technical QC products that can already hand jobs off to an active storage device for processing. Certainly, grid storage systems exist with the CPU horsepower to operate as active storage devices.
With the advent of grid-based storage systems, the idea of storage as an active participant in the end-to-end workflow becomes possible. When storage ceases to be merely a passive agent in the workflow, several tangible improvements can be made that offer the potential to substantively improve the efficiency of operations.
Active storage and active workflows will undoubtedly become more important as further improvements in storage components and distributed software applications become available. This approach offers significant improvements in bottom line performance. When the storage performs many of the processing tasks on stored media, there are savings in time, network bandwidth and cost.
The great news is that the foundations for this change are all already available from manufacturers, and integration of these functions into central storage systems is already underway. Storage is about to get much more interesting!
Paul Turner is vice president of marketing for Omneon.