As more broadcasters transition to file-based media production systems, Internet Protocol (IP) has become the transport technology of choice. Applying an IT-based infrastructure to media introduces benefits to media workflow systems, but also creates new requirements that must be addressed. One area that demands particular attention is storage, as media applications pose unique storage requirements that differ from classical IT solutions.
An IP-based media storage architecture must provide robust parallel throughput and high capacity, scalability, redundancy and availability. The nature of a file-based media workflow means it must operate in a multiuser environment and provide simultaneous access to files (leading, by definition, to a random access pattern on the underlying file system and storage architecture). Because of the continuous nature of media applications, the throughput of each I/O operation on media storage must also be guaranteed under all circumstances. While a slow e-mail is still an e-mail, a slow video is no longer video.
To meet these requirements, each technology layer of a media storage environment, including the storage network, must perform optimally and provide guaranteed throughput. Today, many file-based media storage environments use InfiniBand (IB) storage network interfaces because of IB's high link bandwidth (16Gb/s throughput for double data rate IB) and low cost per port. Now a newer technology, Data Center Bridging (DCB), has emerged that can offer performance advantages over IB. Our lab tested DCB against IB in a typical media network and found that DCB provides a compelling media storage solution.
Creating a lossless network
A stringent requirement for any storage network technology is to provide a lossless environment. IB and DCB accomplish this in different ways. IB uses a buffer-to-buffer credit mechanism to avoid frame loss. Credits of available buffers are continuously exchanged between ports on the same link. When no buffer credits are available, no packets are transmitted until the network processes its congestion and buffers become available again. Hence, the receiver never needs to drop frames.
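The credit exchange described above can be illustrated with a toy model. This is a minimal sketch, not the InfiniBand implementation: the class name `CreditLink`, the buffer count and the drain rate are all assumptions chosen for illustration. It shows only the key property, namely that a sender which transmits solely when it holds a credit can never overflow the receiver.

```python
from collections import deque


class CreditLink:
    """Toy model of one link with buffer-to-buffer credits (illustrative only)."""

    def __init__(self, receiver_buffers):
        self.capacity = receiver_buffers
        self.credits = receiver_buffers   # sender's view of free receive buffers
        self.rx_queue = deque()           # frames held in the receiver's buffers
        self.dropped = 0

    def try_send(self, frame):
        """Sender side: transmit only when a credit is available."""
        if self.credits == 0:
            return False                  # hold the frame; nothing goes on the wire
        self.credits -= 1
        if len(self.rx_queue) >= self.capacity:
            self.dropped += 1             # cannot occur while credits are honoured
        else:
            self.rx_queue.append(frame)
        return True

    def drain(self, n=1):
        """Receiver side: process up to n frames and return one credit for each."""
        for _ in range(min(n, len(self.rx_queue))):
            self.rx_queue.popleft()
            self.credits += 1


def run_credit_demo(frames=100, buffers=4, drain_every=3):
    """Offer one frame per tick to a receiver that drains one frame every few ticks."""
    link = CreditLink(buffers)
    pending = deque(range(frames))
    stalls = 0
    tick = 0
    while pending:
        tick += 1
        if link.try_send(pending[0]):
            pending.popleft()
        else:
            stalls += 1                   # sender waits instead of losing a frame
        if tick % drain_every == 0:
            link.drain(1)
    return link.dropped, stalls


if __name__ == "__main__":
    dropped, stalls = run_credit_demo()
    print("dropped:", dropped, "sender stalls:", stalls)
```

In this model congestion shows up as sender stalls rather than as dropped frames, which is the behavior the credit mechanism is designed to produce.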
DCB provides a lossless network through the use of a pause frame mechanism defined in the IEEE 802.3x standard. Conventional Ethernet does not keep track of the buffer availability on the receiving end of a link and assumes by default that buffers are available, creating a risk of buffer overflow on the receiver. With the pause mechanism, a receiver that notices that its buffers are being filled above a certain threshold can send a pause frame back to the source, telling it to hold all transmissions for a certain amount of time. (See Figure 1.) Once the buffers have returned below the threshold, the source can resume transmissions.
In a media cluster environment, with its simple, well-defined topology, the 802.3x pause mechanism should behave much like the lossless buffer-to-buffer credit mechanism of IB.
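The pause behavior can be sketched in the same way. The following simulation is a simplified illustration under stated assumptions: the function `simulate_pause` and all of its parameters (buffer capacity, threshold, link delay, drain rate) are hypothetical, and the pause is modeled as a plain on/off signal rather than the timed pause frame of the standard. It suggests why the threshold has to leave headroom: frames already on the wire keep arriving after the pause request is sent.

```python
from collections import deque


def simulate_pause(frames, capacity, threshold, delay, drain_every, pause_enabled=True):
    """Return (dropped, peak_occupancy) for a toy sender/receiver pair.

    The sender offers one frame per tick; the receiver drains one frame every
    `drain_every` ticks. Data and pause frames both take `delay` ticks to cross
    the link. All parameters are illustrative, not measured values.
    """
    data_wire = deque()     # arrival ticks of data frames in flight
    ctrl_wire = deque()     # (arrival tick, pause?) control frames in flight
    occupancy = peak = 0
    sent = drained = dropped = 0
    sender_paused = False
    pause_requested = False
    tick = 0
    while drained + dropped < frames:
        tick += 1
        # 1. Control frames reach the sender.
        while ctrl_wire and ctrl_wire[0][0] <= tick:
            sender_paused = ctrl_wire.popleft()[1]
        # 2. The sender transmits unless it is paused.
        if not sender_paused and sent < frames:
            data_wire.append(tick + delay)
            sent += 1
        # 3. Data frames reach the receiver; overflow means frame loss.
        while data_wire and data_wire[0] <= tick:
            data_wire.popleft()
            if occupancy < capacity:
                occupancy += 1
                peak = max(peak, occupancy)
            else:
                dropped += 1
        # 4. The receiver drains its buffer more slowly than frames arrive.
        if tick % drain_every == 0 and occupancy > 0:
            occupancy -= 1
            drained += 1
        # 5. The receiver requests a pause above the threshold and a resume
        #    once occupancy is back at or below it.
        if pause_enabled:
            want_pause = occupancy > threshold
            if want_pause != pause_requested:
                ctrl_wire.append((tick + delay, want_pause))
                pause_requested = want_pause
    return dropped, peak


if __name__ == "__main__":
    print("with pause:   ", simulate_pause(200, 16, 8, 2, 2))
    print("without pause:", simulate_pause(200, 16, 8, 2, 2, pause_enabled=False))
```

In this model the buffer never grows beyond `threshold + 2 * delay` frames when the pause is enabled, so a capacity at or above that value avoids loss. With the pause disabled, the same receiver overflows.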
Running the test
We based the test architecture on the General Parallel File System (GPFS) media storage cluster from IBM, one of the most powerful media file systems available. A GPFS cluster based on the network-attached node (NAN) model consists of storage nodes and NAN nodes. The storage nodes are directly connected to the storage, whether locally attached or via a storage-area network (SAN) architecture. NAN nodes are connected to all storage nodes via a cluster network but are not directly attached to the underlying storage. In this architecture, each storage node is the access point or primary server for part of the total storage. The NAN node stripes its data requests over all storage nodes, thereby aggregating the available bandwidth of each individual storage node and its connected storage subsystems.
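To make the striping idea concrete, here is a minimal sketch of how a byte-range request could be split over storage nodes. It assumes a simple round-robin placement by block and a hypothetical 1MiB block size; the actual GPFS block allocation is not described in this article, and `stripe_request` is an illustrative name, not a GPFS function.

```python
def stripe_request(offset, length, num_storage_nodes, block_size):
    """Split a byte range into (node, block_index, start, size) pieces.

    Assumption: block i lives on storage node i % num_storage_nodes.
    """
    pieces = []
    end = offset + length
    pos = offset
    while pos < end:
        block = pos // block_size
        block_end = (block + 1) * block_size
        size = min(end, block_end) - pos      # stay inside the current block
        pieces.append((block % num_storage_nodes, block, pos, size))
        pos += size
    return pieces


if __name__ == "__main__":
    MIB = 1 << 20
    # An 8MiB read spread over four storage nodes (hypothetical sizes).
    for node, block, start, size in stripe_request(0, 8 * MIB, 4, MIB):
        print(f"node {node}: block {block}, offset {start}, {size} bytes")
```

Because consecutive blocks land on different nodes, one large sequential request turns into parallel requests to every storage node, which is where the aggregated bandwidth comes from.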
IB-based WARP cluster
The IB cluster architecture, named Workhorse Application Raw Power (WARP) media storage cluster, displays a many-to-one traffic pattern. When a NAN node reads from the storage nodes, all the storage nodes respond at the same time back to the NAN node with large bursts of traffic. If, on the other hand, multiple NAN nodes write data to the storage, the receiving storage nodes are simultaneously addressed by the bursts of all the writing NAN nodes. Both cases result in heavy oversubscription of the cluster network.
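The scale of this oversubscription follows from simple arithmetic. The sketch below assumes the cluster size used in the tests reported in this article (four storage nodes, three NAN nodes), one DDR IB link per node at 16Gb/s, and every sender bursting at full line rate; the function name is illustrative.

```python
def oversubscription_ratio(senders, sender_link_gbps, receiver_link_gbps):
    """Ratio of the offered burst rate to the capacity of the receiving link."""
    return senders * sender_link_gbps / receiver_link_gbps


if __name__ == "__main__":
    # Read: four storage nodes answer one NAN node at the same time.
    print(oversubscription_ratio(4, 16, 16))   # 4.0
    # Write: three NAN nodes address one storage node at the same time.
    print(oversubscription_ratio(3, 16, 16))   # 3.0
```

Any ratio above 1 means the receiving port cannot absorb the burst at line rate, so the network must either buffer the excess or hold the senders back.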
The InfiniBand stack is extremely efficient for Linux-based servers, reaching the full physical limits of the underlying bus technology. The processing of the protocol stack is fully offloaded in the IB host channel adapter network cards. Even remote direct memory access is fully supported and exploited. This leads to a powerful cluster architecture, extremely well adapted for file-based media production environments. (See Figure 2.)
DCB-based WARP cluster
One potential issue in media environments that places constraints on IB-based architectures is that many media client applications require a Microsoft Windows operating system. This is the case both for Windows applications that have to run on the NAN cluster nodes and for applications that require a mount of the central file system via the CIFS protocol. Recently, IBM added a GPFS-on-Windows client to its NAN node configuration. This allows a Microsoft Windows 2003 server to participate as a NAN node in the GPFS cluster. The state-of-the-art IB stack for Windows machines, however, performs much worse than the Linux version. The cluster protocol stack has to fall back to IP-over-InfiniBand (IPoIB) without any offloading, because not all GPFS commands are yet supported in the native IB stack for Windows. This decreases the performance of the cluster network by a factor of five.
Because GPFS is agnostic to the underlying network technology, the GPFS WARP cluster can be designed with DCB technology replacing IB as the cluster network. (See Figure 3.) This should be especially beneficial in the Microsoft Windows NAN node environment. For our test case, we created a DCB architecture using the Cisco Nexus 5000 Series switch, a widely deployed 10Gb/s Ethernet data center platform that supports IEEE 802.3x pause frames. Our tests showed that a DCB-based GPFS WARP cluster is capable of making use of the full bandwidth provided by the 10Gb/s DCB platform.
We configured the IB-based system to use the “dd” application as the stream generator and used DDR IB in an active/active setup using verbs. The storage throughput limitation for the system was ~800MB/s per storage system, and the cluster consisted of three NAN nodes and four storage nodes. As a baseline reference, we performed throughput benchmark tests on the IB-based clusters with both Linux and Windows NAN nodes. The tests included:
- single stream throughput from one NAN node;
- multiple streams from one NAN node (for a more even saturation of the link bandwidth); and
- multiple streams simultaneously from all NAN nodes.
Table 1 shows the results.
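For readers who want to reproduce the shape of these tests, the following is a rough sketch of a multi-stream write benchmark. It is not the harness used in our lab, which was based on dd; the function names, file names, block size and stream sizes are assumptions. To be meaningful, `directory` would have to point at the mounted cluster file system, and the streams would have to be far larger than any cache.

```python
import os
import tempfile
import threading
import time


def write_stream(path, total_bytes, block_bytes):
    """Write total_bytes of zeros sequentially, similar in spirit to a dd stream."""
    block = b"\0" * block_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_bytes, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())              # include the flush to storage in the timing


def run_streams(directory, streams, total_bytes, block_bytes=1 << 20):
    """Run several write streams concurrently; return (aggregate MB/s, file paths)."""
    paths = [os.path.join(directory, f"stream_{i}.bin") for i in range(streams)]
    threads = [
        threading.Thread(target=write_stream, args=(p, total_bytes, block_bytes))
        for p in paths
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total = sum(os.path.getsize(p) for p in paths)
    return total / elapsed / 1e6, paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n in (1, 4):                  # single stream, then multiple streams
            rate, _ = run_streams(d, n, 4 << 20)
            print(f"{n} stream(s): {rate:.0f} MB/s aggregate")
```

Running the same script simultaneously on every NAN node corresponds to the third test case in the list above.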
For the next phase, we replaced IB with 10Gb/s DCB in the storage cluster network. The IEEE 802.3x pause mechanism was activated on both the Cisco Nexus 5000 switch and the converged network adapters (CNAs) to create a lossless Ethernet network. The DCB-based WARP cluster tests were executed with Nehalem-based servers as NAN nodes, as they were able to fully saturate the 10Gb/s link.
In the first DCB setup, all servers were connected with a single 10Gb/s link to the Cisco Nexus 5000 switch, compared with the double active/active setup of IB. In a second test, dual links were used. The traffic between different server pairs was routed over different interfaces. We performed the same benchmark tests as were used for the IB-based system. The results for the single-connected and double-connected systems are shown in Tables 2 and 3.
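When reading the tables, it helps to know the theoretical ceiling of each setup. The sketch below converts link rates to MB/s and caps them by the storage back end. It ignores protocol overhead and assumes that the ~800MB/s storage limit applies to each of the four storage nodes; both points are assumptions, and the function names are illustrative.

```python
def gbps_to_mbytes(gbps):
    """Convert a link rate in Gb/s to MB/s (decimal units, no protocol overhead)."""
    return gbps * 1000 / 8


def nan_read_ceiling(link_gbps, links, storage_nodes, storage_mbytes_per_node):
    """Upper bound in MB/s for one NAN node reading from all storage nodes."""
    link_limit = gbps_to_mbytes(link_gbps) * links
    storage_limit = storage_nodes * storage_mbytes_per_node
    return min(link_limit, storage_limit)


if __name__ == "__main__":
    setups = {
        "single 10Gb/s DCB link": (10, 1),
        "dual 10Gb/s DCB links": (10, 2),
        "DDR IB active/active": (16, 2),
    }
    for name, (gbps, links) in setups.items():
        print(f"{name}: {nan_read_ceiling(gbps, links, 4, 800):.0f} MB/s")
```

Under these assumptions the NAN node's links, not the storage, are the bottleneck in both DCB setups, whereas the active/active IB setup has more link capacity than the storage can deliver.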
These results show that the DCB system outperforms the IB cluster on these benchmarks. With a single NAN node, the DCB-based cluster outperforms the IB-based cluster by 663 percent; with three NAN nodes, it outperforms the IB-based cluster by 384 percent. The performance gain for the Windows setup is remarkable.
Ultimately, DCB is an effective solution in real-world media environments. The Windows performance of a DCB-based GPFS media storage WARP cluster opens up opportunities to connect many Windows-based editing stations to a generic media storage cluster. With this architecture, it is easy to envision a high-resolution HD post-production editing platform that scales well beyond the state-of-the-art solutions presently offered by media vendors.
Luc Andries is an ICT architect at VRT-medialab.