Inside the File System

Media files now consume enormous amounts of storage, growing at a scale that has driven integrated drives out of dedicated media server chassis. Future systems will need to attach to network storage, storage area networks, clustered stores and grid-based stores. Consequently, storage structures have become the "need-to-know" terminology of the media server world.

Regardless of the storage topology, the one dimension common to all storage is the file system, the mainstream element that holds the key to all the marbles. File systems are elegant in their structure, with organizational parameters tailored to both the storage architecture and the users' needs for the media the store holds.

If you think of media storage as analogous to computer data storage, you'll find plenty of commonality, with the exception that media files are both huge and contiguous. One of the common points is the directory structure and how it is treed (or allocated) across the entire store. A directory hierarchy (or directory tree) is what organizes file data on a computer or storage system. The set of directories and its organization is called the file system.

File systems start small and become quite large as data is added, employing varying means to handle that constantly changing dimension. Storage architectures are engineered to reap the benefits available from their file system constructs. To understand what a file system is, and why a particular one is chosen, we'll look at two of the architectures used by the clustered and networked storage systems found in media content applications. One is called a journaling file system; the other is the more familiar distributed file system.

JOURNALING FILE SYSTEM

When a file system logs its changes to a journal, it is referred to as a "journaling file system." JFSes are prevalent in Unix-like operating systems, including Linux. The journal may be a circular log located in a dedicated area of the storage system, often implemented as hundreds of megabytes of battery-backed nonvolatile RAM (NVRAM).

A JFS logs all proposed changes to the file system before committing them to the main file system. This reduces the likelihood that the entire file system becomes corrupted should a power failure or system crash occur during one of the file management steps, such as while a file is being moved from the host into main storage. For further protection, file systems may be backed up on separate drive media, interleaved with main storage data, spread across nodes, or a combination of all three.

There is good rationale for using a journaled file system. Should a program crash or power glitch occur during routine storage activity, files can be left orphaned, or memory can be consumed unintentionally because the program believes a portion of it is still in use. For example, deleting a file requires that two individual, isolated steps be performed: the first removes the file from its directory entry; the second marks that space, and its inode, as free (available) in the free space table.
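
Those two isolated steps, and the crash windows between them, can be sketched in a few lines of Python. This is a toy illustration using hypothetical in-memory structures, not any real file system's on-disk format:

    # Toy structures: a directory maps file names to inode numbers,
    # and a set of free inodes stands in for the free space table.
    directory = {"clip001.mxf": 42}
    free_inodes = set()

    def delete_file(name):
        inode = directory.pop(name)   # step 1: remove the directory entry
        # A crash here leaves inode 42 orphaned: no name points to it,
        # yet it was never marked free, so its space leaks.
        free_inodes.add(inode)        # step 2: mark the inode as free

Reversing the order opens the opposite window: the inode would be marked free while a directory entry still points to it, the overwrite hazard described below.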

As a file is created, it is assigned both a file name and an inode number, a unique integer within the file system. An inode is a data structure, i.e., an efficient means of locating stored data. On Linux and other Unix-like operating systems, this inode stores all the information about a file except its name and its actual data. Data structures come in different forms, are tailored to the application type, and may be highly specialized for specific tasks. File names and their corresponding inode numbers are stored as entries in the directory; thus a directory associates file names with inodes.
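
On a Unix-like system you can inspect an inode's fields directly. Here is a short Python sketch; the file name example.dat is simply an illustration:

    import os, stat, time

    with open("example.dat", "wb") as f:    # create a file to inspect
        f.write(b"media payload")

    st = os.stat("example.dat")             # follow the name to its inode
    print("inode number:", st.st_ino)       # unique integer in this file system
    print("size (bytes):", st.st_size)
    print("link count:  ", st.st_nlink)     # directory entries naming this inode
    print("permissions: ", stat.filemode(st.st_mode))
    print("modified:    ", time.ctime(st.st_mtime))

Note that the name itself appears nowhere in the output: it lives only in the directory entry that maps it to the inode.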

Should a crash happen immediately after the directory entry is removed, an orphaned inode is generated, and in turn a storage leak is triggered. Sometimes called a memory leak, a storage leak occurs when a program or operation fails to release a segment of storage that is no longer needed. This results in unintentional memory consumption, reduces system efficiency, and slows the system down.

The alternative problem occurs when the marking of the space happens first, just before a crash. Here the not-yet-deleted file is unintentionally marked as "free," and may later be overwritten by another activity. Journaled file systems maintain an orderly record of all changes in advance of the actual operation. Should a crash occur, the system recovers by replaying the changes from the journal until the file system is consistent again.
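
That write-ahead discipline can be sketched as follows, again with hypothetical in-memory structures rather than any real journal format. Every change is appended to the journal before it is applied, and recovery simply replays the journal:

    import json

    directory = {"clip001.mxf": 42}
    free_inodes = set()
    journal = []                      # stands in for the NVRAM circular log

    def apply(entry):
        if entry["op"] == "remove_dirent":
            directory.pop(entry["name"], None)  # idempotent: safe to replay
        elif entry["op"] == "free_inode":
            free_inodes.add(entry["inode"])     # idempotent: sets ignore repeats

    def delete_file(name):
        inode = directory[name]
        # 1. Record the intent in the journal first...
        journal.append(json.dumps({"op": "remove_dirent", "name": name}))
        journal.append(json.dumps({"op": "free_inode", "inode": inode}))
        # 2. ...then perform the steps; a crash at any point is recoverable.
        apply({"op": "remove_dirent", "name": name})
        apply({"op": "free_inode", "inode": inode})

    def recover():
        for raw in journal:           # after a crash, replay every entry
            apply(json.loads(raw))

Because each entry is idempotent, replaying operations that had already completed does no harm; real journals also checkpoint and truncate themselves so the log stays bounded.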

One argument against journaled file systems is the requirement to write all data twice, which can seriously reduce performance. A solution employed in some systems is metadata-only journaling, whereby only changes to file metadata are written to the journal. Should a failure occur, the file system can still recover the next time it is mounted; however, there remains an opportunity for file data that was not journaled to fall out of sync with the journaled metadata.

DISTRIBUTED FILE SYSTEM

Another commonly used file system is the distributed file system. Generally employed over a network (supporting the conventions of sharing files and other resources), a DFS is a single file system spread across several physical computer nodes. To the user, this networked DFS appears as one centralized file system, regardless of where or how the data is stored.

A computer node is, in effect, a logical placeholder for data: a memory block containing some unit of data and perhaps references to other nodes, which in turn contain data and references to yet more nodes. Nodes cross-reference each other, so an element of protection remains should any particular node or set of nodes go offline. Some entire video server systems are built around this node concept as a scheme for replication and redundancy that is transparent to the user.
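
The concept can be sketched as a toy Python example, with hypothetical node names and a fixed replica count. The namespace presents one logical path to the user while the data actually lives on several nodes:

    REPLICAS = 2

    nodes = {"node-a": {}, "node-b": {}, "node-c": {}}  # node -> {path: data}
    namespace = {}                                      # logical path -> node list

    def write(path, data):
        placed = sorted(nodes)[:REPLICAS]   # real systems balance placement
        for n in placed:
            nodes[n][path] = data
        namespace[path] = placed

    def read(path):
        for n in namespace[path]:           # any surviving replica will do
            if n in nodes and path in nodes[n]:
                return nodes[n][path]
        raise IOError("all replicas of " + path + " are offline")

    write("/media/clip001.mxf", b"essence")
    del nodes["node-a"]                     # simulate a node going offline
    assert read("/media/clip001.mxf") == b"essence"

Losing node-a does not lose the clip; the read simply falls through to a surviving replica, which is exactly the transparency the user experiences.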

There are different variations of (and protocols for) network file systems. Examples include the Network File System (NFS), originally from Sun Microsystems and the standard in Unix-based networks; and Server Message Block (SMB), originally from IBM, whose most common version is a heavily modified one from Microsoft that is the standard in Windows-based networks. A later dialect of SMB is known as the Common Internet File System (CIFS), and Samba is an open-source implementation of the protocol. The Apple Filing Protocol is used in Mac OS.

A DFS can be implemented either as a standalone root distributed file system or as a domain-based distributed file system. Users get easy access to files because they need only go to one location on the network, regardless of whether those files are physically located across multiple servers. Furthermore, users no longer need multiple drive mappings to access their files.

DFSes generally employ facilities that allow for transparent replication of files and fault-tolerance protection. Because separate nodes are used, users may have direct access to only a part of the entire file system. This contrasts with the shared disk file system architecture, in which all nodes have uniform direct access to the entire store.

Server storage systems may employ one or more of these architectures in implementing their file system strategy. It is important to ask how the file system is employed—such as:

  • Where are the single points of failure?
  • What is the scalability and is it disruptive (i.e., do you lose time, must you restripe) as you expand?
  • What is the time required to mount additional storage?
  • How is the system balanced: automatically, in the background, or not at all?

Video server deployments will increase, influenced in part by the content, the type and the size of the files they manage. Growth momentum is an equally important factor in selecting a storage system, with storage scalability a variable that always moves upward.

Karl Paulsen recently retired as a CTO and has regularly contributed to TV Tech on topics related to media, networking, workflow, cloud and systemization for the media and entertainment industry. He is a SMPTE Fellow with more than 50 years of engineering and managerial experience in commercial TV and radio broadcasting. For over 25 years he has featured topics in TV Tech magazine, penning the magazine's Storage and Media Technologies and Cloudspotter's Journal columns.