Video storage developments

The media and entertainment sector is facing a big challenge. The storage media it has relied on for decades, film and videotape, are reaching end of life. The manufacturing of film cameras has ceased, and video cameras are transitioning quickly to other recording media, notably flash memory and optical discs.

It is not just acquisition that is seeing a change. The cost of wideband data networks is dropping, such that the distribution of programming can migrate from videotape to file transfer.

The industry has long used islands of data storage; the newsroom, editing and playout are all examples. But the demise of film and videotape is breaking down the need for islands. Instead, file-based operations have led information architects to call for new approaches to the way storage is used in the modern media factory.

One solution could be to aggregate all separate islands in central storage. The economies of scale and ease of maintenance would be advantages. Would it also be a single point of failure? Not with proper system design. But, to look at the problem from a business perspective, the media business has changed. The old linear model — make a program, air it and archive it for future repeat channels — has been disrupted by the new TV anywhere model. It makes more sense now to feed content into a repository where it can be constantly repurposed for long-tail viewing or delivery to new consumer devices.

The old model of online storage for editing and playout and archive for long-term storage has been replaced with always online. The storage hierarchy becomes high-performance SSD or disks online for editing and playout, nearline disks for general program storage, and archive for disaster recovery and digital preservation.

Media storage requirements

Conventional IT systems simply do not meet media storage's needs. Files can be large (one hour of DNxHD 220 video is 100GB), and files are only going to get larger as 3-D becomes more popular and Super HiVision is introduced.
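The 100GB figure can be checked with simple arithmetic. The sketch below is illustrative only: the function name is invented here, and it assumes the nominal 220Mb/s video bit rate with decimal units, ignoring audio tracks and wrapper overhead.

```python
def gigabytes_per_hour(bitrate_mbps: float) -> float:
    """Storage needed for one hour of video at a constant bit rate.

    Assumes decimal units (1Mb = 1,000,000 bits, 1GB = 1,000,000,000 bytes)
    and counts video essence only.
    """
    bits_per_hour = bitrate_mbps * 1_000_000 * 3600
    bytes_per_hour = bits_per_hour / 8
    return bytes_per_hour / 1_000_000_000


# DNxHD 220 runs at a nominal 220Mb/s, which works out to about 99GB per hour.
dnxhd_220_gb = gigabytes_per_hour(220)
print(f"DNxHD 220: {dnxhd_220_gb:.0f}GB per hour")
```

Audio and container overhead push the real figure slightly higher, which is why 100GB is a convenient round number for planning.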

Many program assets will be used decades from now, but generic IT storage is not designed for that timescale. Five years is more typical in an office environment. The media and entertainment sector is not unique in needing far longer retention. For example, the pharmaceutical sector may need to keep records for a patient's lifetime, and the oil and gas sector will keep surveys for a field's life, which could be as long as 50 years.

Regular migration has not always been a feature of broadcast archives. With videotape lasting around 25 years, decisions to copy to a fresh format were put off for the next generation of staff to figure out.

This casual attitude cannot be adopted with data storage. Leave a file on a disk for 25 years, and the chances of recovering it are slim. The hardware and operating system will long be obsolete and unsupported.

It is expensive to digitize videotapes, and old recordings need special processing. Migrating data from one storage system to the next is much simpler, as it is little more than an automated copy process.

Is cloud storage the answer? Cloud is more about the business model — storage as a service. It is about who owns and operates the storage rather than the storage technology per se. Cloud better matches the requirements of DR and archive.

NAS and SAN

Disk storage systems fall into two groups: network attached storage (NAS) and storage area network (SAN). NAS operates at file level, presenting an NFS or CIFS filing system to the client. It is simple to administer, much like an office file server, but does not scale well. Each new NAS mounts as a new volume, leading to a disk space management problem. Where do users store projects? Which drive letter? What happens when a volume is full? Users may be allocated space on drives, but this is inefficient. What is needed is a central pool, so storage is shared.

The SAN provides the solution, where a separate metadata server directs clients to storage, but at block level to sectors on drives rather than at file level.

SANs have developed to handle clients running different operating systems — Windows, Mac OS, UNIX and Linux, for example — and present a single namespace to the client, regardless of how much storage is added to the SAN pool. However, SANs have traditionally been more complex to administer, and components like the metadata servers represent bottlenecks.

Many storage products have been developed to overcome the drawbacks of NAS and SAN but retain the advantages. A storage system should:

  • present a single namespace;
  • be fault tolerant;
  • be highly reliable and available;
  • support migration as components age;
  • be extensible in both capacity and I/O;
  • be secure against viruses; and
  • be easy to maintain.

And, it goes without saying that it should cost less than existing videotape systems.

RAID

RAID has been the usual way to deliver reliability and fault tolerance to a storage system. The concept of RAID dates back to 1987. The original paper from Patterson, Gibson and Katz describes disks with a capacity of 100MB, a fraction of the capacity of a throwaway thumb drive! Current drives of 2TB capacity are 20,000X larger, so it is no surprise that the RAID concept is showing its age.

Conventional RAID has shortcomings when used with modern drives. One is the rebuild times, which can stretch into days. Although disks have improved in capacity over the years, the data transfer rates have not kept pace. A typical 2TB, 7200rpm enterprise drive has a sustained transfer rate of about 140MB/s. Even at the maximum transfer rate, it will take four hours to read the entire drive. This exacerbates the long rebuild times in a RAID. During rebuild, further drive failures compromise the entire array. This rules out RAID-5. Instead, current systems must use RAID-6, which is tolerant to two drive failures, or full mirroring, in order to counter long rebuild times.
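The four-hour figure follows from dividing capacity by sustained transfer rate. The sketch below reproduces it. The function name and the `share` parameter are assumptions made for illustration; `share` models the idea that an array still serving clients can devote only part of a drive's bandwidth to the rebuild.

```python
def full_read_hours(capacity_tb: float, rate_mb_s: float, share: float = 1.0) -> float:
    """Hours needed to read an entire drive end to end.

    capacity_tb: drive capacity in decimal terabytes
    rate_mb_s:   sustained transfer rate in MB/s
    share:       fraction of that rate available for the read (hypothetical knob)
    """
    if not 0 < share <= 1:
        raise ValueError("share must be greater than 0 and at most 1")
    capacity_mb = capacity_tb * 1_000_000  # decimal TB to MB
    return capacity_mb / (rate_mb_s * share) / 3600


# 2TB at 140MB/s: just under four hours with the drive doing nothing else.
best_case = full_read_hours(2, 140)

# If only a quarter of the bandwidth can be spared, the time quadruples.
under_load = full_read_hours(2, 140, share=0.25)

print(f"best case: {best_case:.1f}h, under load: {under_load:.1f}h")
```

The best case is a floor, not an estimate. Real rebuild times depend on the controller, the RAID level and the competing workload, which is how they stretch into days.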

A further problem is the sheer amount of program content lost if a drive does fail. At 100GB per hour, each terabyte holds about 10 hours of DNxHD 220. As drives head for 12TB projected capacity by 2014, the issues will only compound.

These issues are leading storage vendors to look for smarter systems beyond conventional RAID in order to better protect data, and to improve performance.

Tape or disk?

Disk is the usual storage medium for online storage, but tape now meets many of the demands for nearline storage. Modern drives and robots can make files available within a matter of minutes.

The latest tape formats (LTO-5 and T10000C) support the Linear Tape File System (LTFS), a tape-based filing system. This allows tape to be mounted and used much like a disk drive.

Tape is best reserved for files that are accessed only occasionally. Setting the balance between tape and disk storage is not straightforward. Predicting the number of times a tape may be read in the future is an uncertain science. Will a program become an evergreen, or will it rapidly become dated? The business rules for managing the content will be a balance between technical considerations — media lifecycle, wear, reliability, access times — and business objectives such as running costs, availability and security.
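A business rule of this kind can be expressed as a small placement function. The sketch below is purely illustrative: the `AccessProfile` fields, the tier names and the read-count thresholds are all invented here, and a real policy would weigh many more factors, including media wear, running costs and security.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    """Hypothetical summary of how a program file is being used."""
    reads_last_year: int
    scheduled_for_air: bool


def choose_tier(profile: AccessProfile, hot_reads: int = 12, warm_reads: int = 2) -> str:
    """Pick a storage tier from recent usage.

    The thresholds are placeholder values, not recommendations.
    """
    if profile.scheduled_for_air or profile.reads_last_year >= hot_reads:
        return "online"
    if profile.reads_last_year >= warm_reads:
        return "nearline disk"
    return "tape"


evergreen = AccessProfile(reads_last_year=30, scheduled_for_air=False)
dated = AccessProfile(reads_last_year=0, scheduled_for_air=False)
print(choose_tier(evergreen), choose_tier(dated))
```

Past usage is only a proxy for future demand, which is exactly the uncertainty described above. Keeping the thresholds as parameters at least makes the rule easy to retune as viewing patterns change.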

Object storage devices

File-based storage does not inherently store information that is useful to a storage manager. Video files are unstructured data, as opposed to the structured data typically found in a relational database. A look at an MPEG video file reveals no information that would indicate whether it can be deleted or archived, or whether it should be kept in nearline or online storage for immediate access. Information such as rights windows is kept in the rights management database, and possibly the DAM system. This lack of important information at the file level leads to inefficient use of storage.

In contrast, object-based storage can encapsulate useful information with the media files as a media object. Examples include instructions to keep the media forever or to delete it after a certain date when the rights expire, or a flag that it is not scheduled to play to air and can therefore be deleted from disk but maintained on the tape library. Currently, archive manager software and DAM implement these functions. For media needed for decades, encapsulating such information in the object makes more sense than holding it in external applications, which may well not be around in the future. This moves more intelligence to the storage, which can interrogate objects to determine whether to retire or migrate aging data, independently of any external applications.
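As a rough illustration of the idea, the sketch below bundles a payload with the lifecycle information mentioned above and lets the storage layer decide what to do with it. The class, its field names and the action strings are hypothetical; they are not taken from any vendor's object store.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class MediaObject:
    """Media payload plus the lifecycle metadata that travels with it."""
    payload: bytes
    keep_forever: bool = False
    rights_expire: Optional[date] = None
    scheduled_for_air: bool = False
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def storage_action(obj: MediaObject, today: date) -> str:
    """Decide what the storage should do, using only the object's own metadata."""
    expired = obj.rights_expire is not None and today > obj.rights_expire
    if expired and not obj.keep_forever:
        return "delete"
    if obj.scheduled_for_air:
        return "keep on disk"
    return "release disk copy, retain on tape"


promo = MediaObject(b"...", rights_expire=date(2012, 6, 30))
master = MediaObject(b"...", keep_forever=True)
print(storage_action(promo, date(2013, 1, 1)))
print(storage_action(master, date(2013, 1, 1)))
```

The decision needs no call to a rights database or DAM system. Everything required is inside the object, which is the property that matters when those external applications have been retired.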

Objects are stored in a flat address space and identified by a globally unique ID (GUID), as opposed to the ordered hierarchy of a filing system like CIFS or NFS. (See Figure 1.) As more storage moves to the cloud, the concept of GUIDs and embedded metadata to find and retrieve media objects will replace hierarchical filing systems. Consider how we find information on the Web using a search engine and keywords.
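The difference from a directory tree can be shown with a toy in-memory store. This is a minimal sketch under obvious assumptions: the class and method names are invented, a Python dictionary stands in for the storage hardware, and the metadata keys are examples only.

```python
import uuid


class FlatObjectStore:
    """Toy object store: no folders, just GUIDs mapped to data and metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        """Store an object and return the GUID that identifies it."""
        guid = str(uuid.uuid4())
        self._objects[guid] = (data, dict(metadata))
        return guid

    def get(self, guid: str) -> bytes:
        """Retrieve an object by GUID; no path is involved."""
        return self._objects[guid][0]

    def find(self, **criteria) -> list:
        """Return GUIDs of objects whose metadata matches every criterion."""
        return [
            guid
            for guid, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = FlatObjectStore()
news_id = store.put(b"clip-a", genre="news", language="en")
drama_id = store.put(b"clip-b", genre="drama", language="en")

print(store.find(genre="news") == [news_id])
```

Nothing about the GUID says where the object lives, so the store is free to move it between disks, sites or tiers without breaking any references held by clients.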

Several storage vendors are now offering a range of platforms based on object storage utilizing a number of different techniques to provide the increased performance now demanded of nearline storage.

Forward with object

Already, object-based storage systems are available for media applications that use such principles. The use of an object, and its GUID, enables storage to move away from the file- and block-based NAS and SAN constructs to new storage architectures that provide high levels of reliability, scalability and I/O transfer rates.

Object-based storage is best suited to reference or archive data that does not change, like a final cut or a program. The object approach has long been part of standards for media containers. AAF and MXF take an object-based approach to the wrapping of media files, and recent work to standardize containers for film and television masters continues in the same vein. The Entertainment Technology Center (ETC) has developed the Interoperable Master Format (IMF) for the automated creation of multiple versions of a movie for all the many delivery formats required today. Similarly, AMWA's AS-02 Multi-Version MXF containers wrap all of the elements required to create multiple versions of programs with different edits and different language support.

Object-based storage is not necessarily the answer to all of the storage needs for a broadcaster, but may well meet the needs of archive data that rarely changes. For the collaborative, rapid-turnaround editing of the newsroom, the SAN still fits the bill, and for the creation of graphics assets, with the manipulation of thousands of elemental files, the NAS is a good fit.