The challenges of storing video

The largest Fortune 1000 companies have grown their storage infrastructures to hundreds of terabytes, with the largest companies having multiple petabytes. In less than a year, Web 2.0 Internet start-ups and other companies storing rich media content have consumed more storage than the larger companies have accumulated over their lifetimes.

Social networking, application hosting, auctions, photo sharing and video distribution all consume more storage than older transactional type applications. The difference is that many Web 2.0 applications tend to be participatory and collaborative applications, where the users are the ones generating much of the content. The change from provider-generated content to user-generated content makes predicting storage growth far more challenging than in the past.

In addition to consuming vast amounts of storage, these new applications often have different I/O profiles from transactional applications, which tend to be primarily small-block, random I/O-oriented. Web 2.0 applications tend to store larger, more variable digital content. Access patterns can also range from read-intensive (video distribution) to almost exclusively write-intensive (remotely hosted backup) and anything in between. The storage architectures that support high-performance transactional applications are not always a good fit for new video applications.

As for scalability, it's not that traditional monolithic arrays can't scale up to petabyte-plus configurations; it's the efficiency and cost (capital and operational) required to reach these levels. Not everything scales gracefully when 10-to-20-year-old storage architectures strain to accommodate the scalability requirements of these new applications. And, management paradigms that worked fine with tens of terabytes don't always scale well when managing multiple petabytes or more.

Video storage requirements

As with storage in general, there is no one-size-fits-all device that meets the requirements of every video application. However, video applications do share five common storage requirements: cost, performance, operational simplicity, modular growth and reliability. Let's look more closely at each.

Cost

Due to the large amounts of storage required by video, the cost per terabyte needs to be far lower than price points of traditional SAN and NAS storage devices used with transactional data applications. While everyone wants cheaper storage, many video applications would not be financially feasible without inexpensive storage. For most video applications, the cost of storage must be in the range of $1000 to $2000 per terabyte, with prices declining at 30 percent to 40 percent per year.
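To see what that price curve implies, here is a minimal Python sketch that projects the cost per terabyte forward. It assumes the decline is a constant percentage each year, which is a simplification; the function name and the four-year horizon are illustrative choices, not figures from any vendor.

```python
def projected_cost_per_tb(start_cost, annual_decline, years):
    """Return cost per terabyte for year 0 through `years`.

    Assumes a constant fractional decline each year (a simplification).
    """
    return [start_cost * (1 - annual_decline) ** y for y in range(years + 1)]


# Bracket the range quoted above: the low end of the price range with the
# fastest decline, and the high end with the slowest decline.
low = projected_cost_per_tb(1000.0, 0.40, 4)
high = projected_cost_per_tb(2000.0, 0.30, 4)

for year, (low_cost, high_cost) in enumerate(zip(low, high)):
    print(f"year {year}: ${low_cost:,.0f} to ${high_cost:,.0f} per TB")
```

Under these assumptions, capacity bought later is considerably cheaper than capacity bought today, which is one reason the timing of purchases matters as much as the starting price.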

Within the video storage market, the term inexpensive is still relative. Professional applications such as news and post-production editing, VOD, and TV station on-air playout may be able to afford more expensive storage than social networking and video-surveillance applications.

Just as the purchase cost can jeopardize the market viability of a video application, so too can excessive operational management costs. For instance, the cost to manage a terabyte of storage is often overlooked in calculating cost of ownership.

The four-year total cost of ownership of a video server can often be four to five times the initial purchase cost. It doesn't matter if the storage is being managed by video engineers or a station's IT department; it still needs to be as simple as possible to manage — or costs go up. While new storage technologies can simplify and automate the configuration and ongoing operation, the cost of ongoing management should be included in cost projections.
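As a rough illustration of what that multiple means for a budget, the sketch below backs out the annual operating cost implied by a four-to-five-times TCO figure. It assumes operating costs are spread evenly across the four years, and the $100,000 purchase price is a hypothetical number chosen only to make the arithmetic easy to follow.

```python
def implied_annual_operating_cost(purchase_cost, tco_multiple, years=4):
    """Annual operating cost implied by a TCO multiple.

    Assumes TCO = purchase cost + (annual operating cost * years),
    with operating costs spread evenly over the period.
    """
    return purchase_cost * (tco_multiple - 1) / years


purchase = 100_000.0  # hypothetical purchase price, for illustration only
for multiple in (4, 5):
    annual = implied_annual_operating_cost(purchase, multiple)
    print(f"TCO multiple {multiple}x: about ${annual:,.0f} per year to operate")
```

In other words, if the multiple holds, each year of operation costs roughly as much as the original purchase.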

Performance

Video production and playout applications place different demands on storage than do other tasks. Traditional data-center applications consist of many random, small-block reads and writes. Video applications typically consist of large-block, sequential streaming I/O, and their read/write mix can be extreme. For example, many video capture applications are 99 percent writes, while video sharing applications are 99 percent reads. Just because a server can support a large bank's transaction processing doesn't mean that same server can handle video well.

Storage has traditionally been optimized to meet the needs of transaction processing. Caching was used to help optimize the server's performance. However, video requires large block, sequential reads and writes, which can quickly overwhelm a cache, rendering traditional storage architecture ineffective for use with video.

Video applications benefit from architectures that distribute data across multiple disks and servers to increase the degree of parallelism, i.e., bandwidth. This approach not only provides a more cost-effective implementation, but also allows video-optimized storage to meet performance and throughput requirements that might not be achievable with more traditional storage architectures.
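One common way to distribute data for parallelism is round-robin striping. The Python sketch below is a simplified illustration under stated assumptions, not a description of any particular product: the stripe size, disk count and function names are all hypothetical. It shows why a large sequential read engages every disk while a small transactional read touches only one.

```python
MIB = 1024 * 1024


def stripe_location(offset, stripe_size, num_disks):
    """Map a byte offset in a file to (disk index, byte offset on that disk)
    under simple round-robin striping."""
    stripe_index = offset // stripe_size
    disk = stripe_index % num_disks
    disk_offset = (stripe_index // num_disks) * stripe_size + offset % stripe_size
    return disk, disk_offset


def disks_touched(start, length, stripe_size, num_disks):
    """Return the sorted list of disks that serve a read of `length` bytes."""
    first = start // stripe_size
    last = (start + length - 1) // stripe_size
    return sorted({s % num_disks for s in range(first, last + 1)})


# A large sequential video read spreads across all four disks,
# so their bandwidth adds up.
print(disks_touched(0, 8 * MIB, MIB, 4))

# A small transactional read lands on a single disk.
print(disks_touched(0, 4096, MIB, 4))
```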

Operational simplicity

Historically, video storage was supplied as part of a media solution. The needed storage was embedded with the applications, such as editing or master control playout. Because these applications typically live outside the domain of IT management, stations often required additional support from their vendors. Due to the sheer volume of data, companies cannot afford the operational overhead of complex management paradigms.

Also, because video engineers, who were often less familiar with IT, were in charge of maintenance, the on-site management of the storage had to be far simpler than other facility storage. This has forced some stations to begin consolidating management of storage technology within their IT departments.

Modular growth

Video consumes large amounts of storage. And, the pace of growth is not always predictable. This is especially true of applications with user-generated content. Even traditional video applications like editing and graphics can experience unexpected growth due to expansion in the number of data sources, the move to HD or a desire to store content in multiple formats to support multiple playout channels.
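The effect of HD and multiple formats on capacity is easy to estimate from bit rates. The sketch below is only an illustration: the bit rates and the 1000-hour library are hypothetical values chosen for the arithmetic, not recommendations, and the calculation ignores file system and protection overhead.

```python
def storage_terabytes(hours, bitrates_mbps):
    """Capacity in decimal terabytes needed to hold `hours` of content
    in every format listed, given each format's bit rate in Mb/s.

    Ignores file system and data protection overhead.
    """
    seconds = hours * 3600
    total_bytes = sum(seconds * rate * 1_000_000 / 8 for rate in bitrates_mbps)
    return total_bytes / 1e12


# Hypothetical library: 1000 hours kept in one 25 Mb/s format.
print(storage_terabytes(1000, [25]))

# The same library kept in three hypothetical formats:
# a 100 Mb/s HD master, a 25 Mb/s copy and a 2 Mb/s proxy.
print(storage_terabytes(1000, [100, 25, 2]))
```

With these assumed rates, the same 1000 hours grows about fivefold without a single new program being added.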

One approach is to purchase a large monolithic storage array up-front, which will accommodate both planned and unplanned growth. The problem with this strategy is that monolithic arrays typically cost significantly more than the identical capacity in a modular array. In addition, this approach puts the acquisition cost up-front when the ROI or viability of the business plan may be unproven.

Modular storage provides a pay-as-you-grow storage model. In addition, modular arrays, with clustered or grid architectures, offer several advantages as a video storage platform when compared with traditional dual-controller modular arrays.

With traditional dual-controller arrays, additional capacity can be added up to a point. However, when bandwidth or controller processing power reaches its limit, the only solution is a forklift upgrade, or another modular array must be added. This requires each additional array to be managed as a separate entity.

Modular storage arrays with clustered architectures allow capacity, bandwidth and processing power to be scaled in modular units. Storage virtualization software eliminates the operational complexity of having to manage multiple independent arrays by creating a single virtual array that leverages the combined resources of the cluster to provide more linear performance and scalability. The array's virtualization software takes care of automatically distributing the data and processing across new modules as they are added. Clustered storage architectures provide pay-as-you-grow scalability while providing the simplicity of managing a single entity.
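How a cluster redistributes data when a module is added varies by vendor. As one illustration, the sketch below uses rendezvous (highest-random-weight) hashing, a well-known placement technique that is not necessarily what any given product uses. The node and clip names are hypothetical. The point is that adding a fourth node moves only about a quarter of the data, and all of it moves onto the new node.

```python
import hashlib


def owner(chunk_id, nodes):
    """Pick the node that stores a chunk using rendezvous hashing:
    every node gets a score for the chunk, and the highest score wins."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{chunk_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


chunks = [f"clip-{i}" for i in range(10000)]  # hypothetical chunk names
three_nodes = ["node-1", "node-2", "node-3"]
four_nodes = three_nodes + ["node-4"]

before = {c: owner(c, three_nodes) for c in chunks}
after = {c: owner(c, four_nodes) for c in chunks}

moved = [c for c in chunks if before[c] != after[c]]
print(f"{len(moved) / len(chunks):.1%} of chunks moved")
print("all moved to new node:", all(after[c] == "node-4" for c in moved))
```

Because existing nodes never trade data among themselves, the rebalancing traffic stays proportional to the capacity added rather than to the size of the whole cluster.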

Reliability

Broadcasters require 100 percent uptime, which demands protection from failure while allowing upgrades without disruption. In addition to hardware redundancy, RAID-type data protection is also a requirement. However, many video storage implementations are moving away from traditional RAID protection to other protection schemes that provide faster rebuild times in the event of a disk failure. Keep in mind that as capacities grow, the chance of a disk failure increases with the number of disks. This means the system must be able to rebuild quickly, without degrading performance for long periods of time.
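The relationship between disk count and failure risk can be sketched with basic probability. The example below assumes disks fail independently and uses a hypothetical 3 percent annual failure rate purely for illustration; real drives vary, and failures in the field are often correlated.

```python
def prob_at_least_one_failure(num_disks, annual_failure_rate):
    """Probability that at least one disk fails within a year,
    assuming independent failures at the same annual rate."""
    return 1.0 - (1.0 - annual_failure_rate) ** num_disks


def expected_failures(num_disks, annual_failure_rate):
    """Expected number of disk failures per year."""
    return num_disks * annual_failure_rate


AFR = 0.03  # hypothetical annual failure rate, for illustration only
for disks in (10, 100, 1000):
    p = prob_at_least_one_failure(disks, AFR)
    n = expected_failures(disks, AFR)
    print(f"{disks} disks: {p:.1%} chance of a failure, about {n:.1f} expected per year")
```

At petabyte scale, a failed disk stops being an exceptional event and becomes routine, which is why rebuild time matters so much.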

Video intelligence

Today's servers are highly intelligent devices with built-in diagnostics, volume management, snapshot, point-in-time copy and remote replication. In addition, some servers provide video-specific functions that optimize video storage and retrieval.

Other servers can automatically move frequently accessed data blocks and files to the outside of disks to improve disk seek times and performance. A server can even provide transcoding. This allows content to be converted to different output formats, helping create new revenue possibilities via new channels.

Direct attached storage

Some applications still store their data on inexpensive direct attached storage (DAS). This solution represents one of the most basic storage deployment options, where disk drives are either embedded in or directly attached to a single server. To access the content, workstations must be networked into the server. DAS storage still represents a large percentage of the video storage market, especially in price-sensitive applications like video editing.

This approach works well for applications requiring only small amounts of data. However, for most video tasks, DAS-based storage solutions can quickly become multiple islands of storage that are hard to scale and complex to manage. In addition, the initial cost savings of DAS can quickly be consumed by the operational costs or business impact that result from these limitations.

Traditional SAN and NAS

Traditional Fibre Channel SAN or NAS storage devices also can be used to store video data. If a company has a relatively small amount of content and a large investment in existing SAN or NAS technology, then these storage devices may provide a cost-effective approach for storage without having to retrain operational staff.

However, if the objective is to store large amounts of media-rich content with high-throughput requirements, then traditional SAN and NAS storage devices will not be cost-competitive compared to newer clustered storage architectures. In addition, traditional separated storage devices will likely run into performance, scalability and manageability limitations that will further exacerbate costs.

Clustered IP storage

IP-based clustered storage represents a cost-effective alternative that meets the unique requirements of video. Based on commodity servers and disk drives, clustered storage systems have many of the cost advantages of DAS. However, clustered storage can aggregate multiple storage nodes into a single logical system. Using this modular approach, performance, capacity and bandwidth can be scaled incrementally as needed while preserving the simplicity and manageability of administering a single virtual storage device.

IP-connected storage provides the bandwidth and performance required by media-rich applications without the additional cost overhead and operational complexity of deploying Fibre Channel SANs. Storage applications may use iSCSI (block-level), NAS (file-level) and object-level interfaces for the storage and retrieval of video content. Each of these interfaces has its advantages and disadvantages, and depending on the application, one may be more appropriate than another.

Overall, clustered storage architectures are well-suited for the demanding requirements of video applications. However, even within this approach, there is no one-size-fits-all solution. Most clustered IP storage solutions can meet the needs of generic video applications. So, while many broadcast-specific company names may come to mind, there is actually a variety of less-well-known vendors providing excellent video storage solutions.

Storage as a service

One option for companies looking to store video or other multimedia data is the storage-as-a-service (SaaS) delivery model. Vendors that deliver SaaS provide a simple Web-service-based API to store and retrieve multimedia objects in their hosted storage. Costs to store data fall between $1500 and $2500 per terabyte per year, with additional costs each time data is transferred in or out. While these rates may initially sound expensive, after factoring in the fully loaded costs to house, cool and manage in-house storage, SaaS may be a financially viable alternative, depending on the data access requirements.
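A back-of-the-envelope comparison shows why the hosted rates may be competitive. The sketch below combines figures quoted earlier in this article (a purchase price of $1000 to $2000 per terabyte and a four-year TCO of four to five times purchase cost) with the hosted rates above. Applying the TCO multiple to a per-terabyte price is an assumption, and the transfer fee and transfer count are hypothetical placeholders, since actual fees depend on the provider and on usage.

```python
def in_house_cost_per_tb(purchase_per_tb, tco_multiple):
    """Four-year in-house cost per terabyte, assuming the TCO multiple
    quoted for a video server also applies per terabyte."""
    return purchase_per_tb * tco_multiple


def hosted_cost_per_tb(rate_per_tb_year, years, transfer_fee_per_tb=0.0, transfers=0):
    """Hosted cost per terabyte over `years`, plus optional transfer fees."""
    return rate_per_tb_year * years + transfer_fee_per_tb * transfers


YEARS = 4

in_house_low = in_house_cost_per_tb(1000.0, 4)
in_house_high = in_house_cost_per_tb(2000.0, 5)
hosted_low = hosted_cost_per_tb(1500.0, YEARS)
hosted_high = hosted_cost_per_tb(2500.0, YEARS)

print(f"in-house: ${in_house_low:,.0f} to ${in_house_high:,.0f} per TB over {YEARS} years")
print(f"hosted:   ${hosted_low:,.0f} to ${hosted_high:,.0f} per TB over {YEARS} years")

# Hypothetical transfer fee: $100 per TB moved, with each terabyte moved 10 times.
busy = hosted_cost_per_tb(1500.0, YEARS, transfer_fee_per_tb=100.0, transfers=10)
print(f"hosted, frequently accessed: ${busy:,.0f} per TB over {YEARS} years")
```

Under these assumptions the two ranges overlap, so the deciding factor is how often the content moves in and out of the hosted store.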

In addition, this option may be a useful and cost-effective disaster recovery/backup solution. Here the station faces only the storage and access costs, while not having to maintain a separate site, equipment and maintenance. This solution also solves the problem of keeping backup content well geographically separated from your on-air operations.

IP-based clustered storage is worthy of consideration as it provides an attractive solution to the handling of video and other media-rich content. Because of the sheer volume of video data being generated, traditional DAS, SAN and NAS solutions may no longer be the most economical choice. IP-based clustered storage is well poised to garner the lion's share of the video storage market.

Arun Taneja is the founder and consulting analyst with Taneja Group research and consulting firm.