Breaking Down the File Systems, Pt. II

Karl Paulsen

Last month we began discussing typical file systems, starting first with local and shared file systems, then moving into the network file system. The network file system (also known as a proxy file system) allows for the sharing of content in directories, drives or entire storage systems that extend across a network.

This month we resume the discussion with two more network file system subtypes: the distributed file system and the distributed parallel file system.

A distributed file system (DFS) is a single file system spread across several physical computer nodes, appearing to the user as a centralized file system regardless of where or how the data is stored.

The computer node is like a logical placeholder for data: a memory block containing some data units and perhaps references to other nodes, which in turn contain data and more references to yet more nodes. Nodes cross-reference each other so that an element of protection remains should any particular node or set of nodes go offline. Some entire video server systems are built around this node concept as a scheme for replication and redundancy that is transparent to the user.
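As a rough illustration of the node concept, the following Python sketch models nodes that hold data units and references to one another. The `Node` class, the `store` and `read` helpers and the two-copy replication rule are assumptions made for this example; they do not describe any particular video server product.

```python
class Node:
    """Hypothetical node: holds data units plus references to other nodes."""

    def __init__(self, name):
        self.name = name
        self.units = {}     # data units stored on this node
        self.peers = []     # references to other nodes
        self.online = True


def store(key, data, primary, replica):
    """Write one data unit to two nodes and cross-reference them."""
    primary.units[key] = data
    replica.units[key] = data
    if replica not in primary.peers:
        primary.peers.append(replica)
    if primary not in replica.peers:
        replica.peers.append(primary)


def read(key, entry):
    """Read from the entry node, or from a referenced node if it is offline."""
    for node in [entry] + entry.peers:
        if node.online and key in node.units:
            return node.units[key]
    raise LookupError("no online node holds " + key)


node_a = Node("node-a")
node_b = Node("node-b")
store("clip01", b"frames", node_a, node_b)

node_a.online = False           # simulate a node going offline
print(read("clip01", node_a))   # served from the cross-referenced node
```

The caller asks the same entry node both before and after the failure, which is the sense in which the redundancy is transparent to the user.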

Fig. 1: The distributed file system where files are relegated across file system servers using network protocols as a single file system.

The parallel equivalent of the distributed file system is described later in this article.

EASY ACCESS
Users of network file systems get easy access to files because they need only go to one location on the network for access, regardless of whether those files are physically located across multiple servers or at geographically separated locations. Furthermore, users no longer need to use multiple drive mappings to access their files.

Familiar types of file systems for storage applications include those for local magnetic disk (e.g., FAT), optical disc (e.g., UDF) and linear tape (e.g., LTFS for LTO). File systems can also be proprietary, device specific or embedded, such as in a Flash device (e.g. RFS—the Robust File System from Samsung). Operating systems may also support more than a single file system, and other OSs may build a virtual file system—one file system that includes a single root directory with all files located in a single hierarchy under the root. Network file systems, those that act as a client for remote file access, are connected across a LAN or WAN. The network file system allows for a global consolidation of files which eases administration, compliance and management of both clients and data.

FILE-SHARING SYSTEMS
Like most computer technologies, there are many variations in the applications of and protocols for network file-sharing systems, the details of which are beyond this particular article.

Of the alphabet soup of file systems in use, the more familiar protocols include the Network File System (NFS), originally from Sun Microsystems, which is the standard in Unix-based networks, and Server Message Block (SMB), originally from IBM, whose most common version is a heavily modified one by Microsoft that is used in Windows-based networks. The set of message packets that defines a particular version of a protocol is called a dialect.

Fig. 2: The distributed parallel file system where file segments are relegated across storage nodes using parallel I/Os.

The Common Internet File System (CIFS) protocol is a dialect of SMB, and later versions of SMB are enhancements of it (CIFS was significantly changed in v8.0 in July 2010). The meaning of the term “CIFS” has changed since it was first introduced. CIFS was originally used to indicate a proposed standard version of SMB based on the design of the Windows NT 4.0 and Windows 2000 operating system implementations. In some references, CIFS has been used as a name for the SMB protocol in general (all dialects) and, additionally, for the suite of protocols that support and include SMB.

Samba is a free software-licensed re-implementation of the SMB/CIFS network file systems protocol for Linux and Unix.

Apple employs its own proprietary Apple Filing Protocol (AFP), originally developed in the late 1980s as the AppleTalk Filing Protocol, which is used in the Mac OS; the OS also supports NFS, SMB and FTP. For OS X v10.7 Lion (released in July 2011), Apple wrote its own implementation of Windows File Sharing called SMBX, intended to replace Samba while simultaneously adding support for Microsoft’s SMB2.

DISTRIBUTION ACROSS FILE SERVERS
Network file systems whose files are dispersed across file servers and connect to the file system client application through a particular network protocol are distributed file systems (Fig. 1). In this model, the principal application and file system client are linked through a fabric or switch to the various file-system servers that comprise the single file system. Although there are usually multiple file-system servers and associated storage volumes, there is only a single file system associated with all the file-system servers and stores.
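One way to picture the single file system is a client-side lookup that decides which file-system server holds each path. The sketch below is a toy Python model under stated assumptions: the `DistributedNamespace` class, the server names and the hash-based placement rule are all hypothetical and are not taken from any specific product.

```python
import hashlib


class DistributedNamespace:
    """Toy model: one file system view backed by several file servers."""

    def __init__(self, servers):
        # Each hypothetical server is modeled as a dict of path -> bytes.
        self.servers = {name: {} for name in servers}
        self.names = sorted(self.servers)

    def _server_for(self, path):
        # Assumed placement rule: hash the path to choose a server.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.names)
        return self.names[index]

    def write(self, path, data):
        self.servers[self._server_for(path)][path] = data

    def read(self, path):
        return self.servers[self._server_for(path)][path]

    def listdir(self):
        # The client sees one merged listing, whichever server holds a file.
        return sorted(p for store in self.servers.values() for p in store)


fs = DistributedNamespace(["server-a", "server-b", "server-c"])
fs.write("/news/clip01.mxf", b"video essence")
fs.write("/promos/spot07.mxf", b"more essence")
print(fs.listdir())
print(fs.read("/news/clip01.mxf"))
```

The user works with one set of paths and never names a server, which is the property the distributed model is meant to provide.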

On the other hand, a distributed parallel file system (Fig. 2) spreads file segments across storage nodes, a technique also called “striping,” which allows for parallel input/output to individual files.
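Striping can be sketched in a few lines of Python. The function names `stripe` and `reassemble`, the round-robin placement and the tiny segment size are assumptions chosen for illustration. A real parallel file system would issue the per-node reads and writes concurrently; this sketch shows only the layout.

```python
def stripe(data, node_count, segment_size):
    """Split data into fixed-size segments and deal them round-robin to nodes."""
    nodes = [[] for _ in range(node_count)]
    for index, start in enumerate(range(0, len(data), segment_size)):
        nodes[index % node_count].append(data[start:start + segment_size])
    return nodes


def reassemble(nodes):
    """Rebuild the file by taking segments from the nodes in round-robin order."""
    pieces = []
    longest = max(len(segments) for segments in nodes)
    for row in range(longest):
        for segments in nodes:
            if row < len(segments):
                pieces.append(segments[row])
    return b"".join(pieces)


original = b"ABCDEFGHIJ"
layout = stripe(original, node_count=3, segment_size=3)
print(layout)
print(reassemble(layout) == original)
```

Because consecutive segments sit on different nodes, a large sequential read can draw on several nodes at once rather than queuing behind a single server.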

Both forms of file systems may employ built-in fault tolerance, whereby resiliency is added through integral checksumming, mirroring or parity on one or more of the block devices. Other fault-tolerance techniques include cloning, whereby a “copy-on-write” command essentially takes a snapshot as a sub-volume that can then be shared with another sub-volume. Should an error occur, a snapshot allows a rollback to a previous alternative sub-volume, which would (normally) not be corrupted. Snapshots are often options from storage vendors or file system vendors and, where applicable, are now being implemented in Flash or other solid-state/non-volatile memory solutions.
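Of the techniques named above, parity is perhaps the easiest to show in miniature. The following Python sketch uses simple XOR parity across equal-length blocks; the helper name `xor_blocks` and the sample blocks are assumptions for illustration, and real systems layer this on block devices with far more bookkeeping.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the middle block, then rebuild it from the
# surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

XOR parity of this kind can recover any one missing block in the set; protecting against more simultaneous losses calls for mirroring or a stronger code.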

STORAGE WITHOUT A FILE SYSTEM
Object storage—now used primarily in cloud storage and archives—is gaining momentum, especially in distributed storage architectures found in geographically dispersed storage systems and data centers. Object storage allows the client access to huge amounts of data without the requirements for a file system.

The command sets for an object store are simple and can be described basically as “get,” “put” and “delete.” An object storage platform can be built out at less cost than building out tier-1 storage or employing traditional RAID-based storage systems.
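To show how small that command set is, here is a minimal in-memory sketch in Python. The `ObjectStore` class, its method signatures and the use of random identifiers are assumptions for illustration only; they do not reproduce the API of any particular object storage product.

```python
import uuid


class ObjectStore:
    """Toy object store: a flat collection of objects with no directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, data, metadata=None):
        """Store an object and return the identifier used to fetch it later."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata or {}))
        return object_id

    def get(self, object_id):
        """Return the (data, metadata) pair for an identifier."""
        return self._objects[object_id]

    def delete(self, object_id):
        """Remove an object from the store."""
        del self._objects[object_id]


store = ObjectStore()
clip_id = store.put(b"archived essence", {"title": "Evening News"})
data, metadata = store.get(clip_id)
print(metadata["title"], len(data))
store.delete(clip_id)
```

The client keeps only an identifier and whatever metadata it attached; there is no path, folder or drive mapping involved.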

Finally, file systems and storage system architectures are seeing significant changes as NAND Flash (solid-state storage) and non-volatile memory (NVM) continue to gain momentum, affecting not only storage subsystems but entire computer architectures.

ACKNOWLEDGEMENTS
Credit for information and concepts contained in this article is given to select leaders in the storage and file-systems industries, in particular the Storage Networking Industry Association (SNIA), Samba.org, Apple and Microsoft, and to the Storage Visions 2014 conference held in Las Vegas last month. Readers should expect updates to terminology and references, which may be frequent and sometimes complex.

Karl Paulsen, CPBE, is a SMPTE Fellow and chief technology officer at Diversified Systems. Read more about other storage topics in his current book “Moving Media Storage Technologies.” You can contact him at kpaulsen@divsystems.com.