Storage Trends—Exploring Objects and Scale-Out NAS

Network attached storage, or NAS, continues to expand in acceptance and in capabilities. Where, for years, the storage area network (SAN) seemed to prevail for high-end, performance-tailored storage, NAS is changing those dimensions.

Using NAS as a storage solution is practical when the content comes from sources such as surveillance systems or smart devices. As content types and storage dimensions evolve, users are finding they need to understand not only traditional NAS, DAS or SAN solutions but also some recent storage trends that are not so well known.

Those who have supported conventional NAS may now be looking toward “scale-out NAS” as its natural alternative. And for those who may have already outgrown the efficiencies of scale-out NAS, the latest revitalization comes in the form of “object storage.”

This month we’ll look at both of the latter storage technologies, touching on why each has applications based upon factors such as file sizes, quantities, searchability and metadata.

SCALE-OUT NAS
Personal computers typically store data in a file system, a technology that stores information and organizes data in a structured, human-readable way. Scale-out NAS lets this data organization operate as one large global namespace that typically keeps the data stored in a hierarchical structure.

In hierarchical storage, file sets are located within a system of subfolders that nest inside parent folders and reside on a common hard disk, solid-state drive or similar storage platform.

As storage requirements grow, the scale of the file system must increase. Scale-out NAS provides a solution that allows for storage capacity and file system scalability. The NAS device is effectively like a gigantic “C: drive” with capabilities to store millions (to billions) of files.

Fig. 1: A scale-out NAS architecture integrates with the traditional, and simpler, LAN-based client/server/storage topology.

Early scale-out NAS systems were built on the principle of nodes (Fig. 1). Multiple nodes can be appended to the NAS (i.e., “scaled out”) to increase overall storage (i.e., “scalability”), yet often at the sacrifice of small-file performance. As the node count increases, overall storage system performance begins to slow. Metadata searches can be significantly impacted as the size of the NAS and the number of nodes increase.

Flash technologies have changed small-file performance through a technique referred to as “flash-first,” a metadata management scheme that improves internodal networking speeds.

Today, the ability to store billions of files without diminishing overall performance, especially for metadata searches, is the norm for scale-out NAS.

Real-time data consistency is another key factor in scale-out NAS. When dozens to thousands of users must access the NAS, the ability to lock the metadata in real time is critical to preventing accidental overwriting or corruption when several users access the same files.

It is not uncommon for enterprise users to access the same files for different purposes. Such functionality mandates a metadata-locking system, which allows those files to be shuffled and/or renamed depending upon what each user’s purpose might be.
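The locking idea can be illustrated with a minimal Python sketch. It assumes a single in-memory table guarded by one lock; the `Namespace` class, its method names and the file paths are hypothetical and do not describe any vendor's implementation. Real scale-out NAS systems have to coordinate such locks across many nodes.

```python
import threading


class Namespace:
    """Toy metadata table: path -> metadata, guarded by one lock (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()   # one global metadata lock, for simplicity
        self._entries = {}              # path -> metadata dict

    def create(self, path, **metadata):
        with self._lock:
            if path in self._entries:
                raise FileExistsError(path)
            self._entries[path] = dict(metadata)

    def rename(self, old_path, new_path):
        """Move an entry atomically; return False if it is gone or the target exists."""
        with self._lock:
            if old_path not in self._entries or new_path in self._entries:
                return False
            self._entries[new_path] = self._entries.pop(old_path)
            return True

    def paths(self):
        with self._lock:
            return sorted(self._entries)


def demo_concurrent_rename():
    """Two clients race to rename the same file; the lock lets only one win."""
    ns = Namespace()
    ns.create("/photos/DSCN1234.jpg", owner="john")
    results = []

    def worker(target):
        results.append(ns.rename("/photos/DSCN1234.jpg", target))

    targets = ("/birthdays/Johns 18th Birthday Present.jpg", "/archive/DSCN1234.jpg")
    threads = [threading.Thread(target=worker, args=(t,)) for t in targets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, ns.paths()
```

Whichever client acquires the lock first completes its rename; the other finds the original entry gone and is refused, so the file is never left in two places or silently overwritten.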

Human file management is another key factor for scale-out NAS. A typical use case example of both hierarchical and human-readable metadata is in the practical structuring of photographic images, which are often renamed, shuffled or moved into other folders for organizational or identification purposes.

The original image is likely known by the file name given to it during the capture process, such as “DSCN1234.” This is a rather useless name to anything other than a database manager. Users often will copy that file and rename it, e.g., “Johns 18th Birthday Present,” placing it (and several others) into a folder called “Family Birthday Pictures,” which resides inside another folder called “Birthdays.”
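As a rough illustration of this workflow, the following Python sketch copies a camera file into the nested folders under a readable name. The `file_photo` helper, the `.JPG` extension and the temporary library location are assumptions made for the example, not part of any particular NAS product.

```python
import shutil
import tempfile
from pathlib import Path


def file_photo(library_root, original, new_name):
    """Copy `original` into Birthdays/Family Birthday Pictures under a readable name."""
    dest_dir = Path(library_root) / "Birthdays" / "Family Birthday Pictures"
    dest_dir.mkdir(parents=True, exist_ok=True)           # create nested folders if missing
    dest = dest_dir / (new_name + Path(original).suffix)  # keep the file extension
    shutil.copy2(original, dest)                          # the camera original stays in place
    return dest


def demo():
    """Run the example inside a throwaway directory standing in for the library."""
    with tempfile.TemporaryDirectory() as root:
        original = Path(root) / "DSCN1234.JPG"            # assumed extension
        original.write_bytes(b"fake image data")
        dest = file_photo(root, original, "Johns 18th Birthday Present")
        return dest.relative_to(root).as_posix(), original.exists(), dest.exists()
```

The meaning of the copy lives entirely in its path: the folder names and the file name are the human-readable metadata.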

Scale-out NAS allows these file-naming techniques to extend throughout an individual’s library or, in the case of the human resources department in an enterprise, across many interdepartmental file shares or home directories.

Other important factors for scale-out NAS include security management (user accessibility with read/write permissions); file-sharing among varying applications across a large NAS without the risk of one application overwriting the file in use by another application; and the ability for IT departments to have a central process by which all storage is globally controlled under one management system.

However, scale-out NAS is not without its limitations. Users are finding that even modern scale-out NAS solutions cannot keep up with the accelerating demands for storage.

We’ve often believed that unstructured data was dominated by content that was video-centric in nature. That case certainly holds merit, but as the Internet of Things (IoT) becomes more prominent, the volume of machine-generated data is gradually overtaking the volume generated by humans. Such emerging “web-scale” growth requirements are driving changes in how files and file systems must be managed. Object storage is one of the solutions that simplifies the manageability equation.

Fig. 2: The familiar “rigid” hierarchical folder structure vs. the easier “everything in one bucket” approach for object storage that depends on containers instead of folders.

OBJECT STORAGE
As a relatively recent concept, object storage is being applied to many storage provider solution sets aimed at addressing the limitations of scale-out NAS or other storage solutions.

Object storage mitigates the more complex hierarchical metadata attributes controlled through the legacy Portable Operating System Interface (POSIX) standard (Fig. 2).

Because object storage has only a few commands (e.g., Get, Put, Delete), its interface is extremely simple to use. This simple set of commands allows objects to exist with (globally) unique identifiers that are managed in a single “flat” address space.

The underlying principle in object storage is that data can be retrieved without having to know where or how that data is stored.
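A minimal Python sketch can make the flat address space concrete. The `ObjectStore` class below is a hypothetical in-memory stand-in, not any product's API; it assumes identifiers are random UUIDs and that metadata is passed as keyword arguments.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store with a flat address space (illustrative only)."""

    def __init__(self):
        self._objects = {}                     # object ID -> (data, metadata)

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())          # globally unique identifier
        self._objects[object_id] = (bytes(data), dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id]        # raises KeyError if unknown

    def delete(self, object_id):
        del self._objects[object_id]


store = ObjectStore()
oid = store.put(b"fake image data", source="camera-07", kind="surveillance")
data, metadata = store.get(oid)
```

The caller holds only the identifier. There is no folder path to remember, and nothing in the interface reveals where or how the bytes are kept.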

Object storage is governed in part by an extended metadata set, which is much deeper than that found in the conventional file system management tool set. Objects essentially become “self-describing” (the object knows what the information is about or what it is for). Objects can contain specific details about the applications they serve, without limits as to what the content is, the size of the metadata contained in the object or how that information must be interpreted.
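To show what “self-describing” can mean in practice, here is a small Python sketch in which each object carries free-form metadata and a search runs against that metadata alone. The `StoredObject` type, the `find_objects` helper and all of the sample keys and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)   # free-form, application-defined


def find_objects(objects, **criteria):
    """Return IDs of objects whose metadata matches every key/value in `criteria`."""
    return [
        obj.object_id
        for obj in objects
        if all(obj.metadata.get(key) == value for key, value in criteria.items())
    ]


bucket = [
    StoredObject("obj-001", b"...", {"kind": "video", "camera": "lobby", "codec": "h264"}),
    StoredObject("obj-002", b"...", {"kind": "sensor", "unit": "celsius", "device": "hvac-3"}),
    StoredObject("obj-003", b"...", {"kind": "video", "camera": "dock"}),
]
video_ids = find_objects(bucket, kind="video")
lobby_ids = find_objects(bucket, kind="video", camera="lobby")
```

Each object defines its own keys, so a video clip and a sensor reading can sit in the same bucket and still be found by whatever attributes their applications chose to record.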

Objects therefore contain an extremely rich set of information, which users and programmers can leverage to allow their applications (and the storage that supports those apps) to perform much better. Object storage includes attributes that allow for global distribution and virtually unlimited scalability. Its self-healing design yields high data reliability and allows bulk storage to be obtained at a much lower price point.

Users at both the enterprise level and the individual level are finding that the feature sets of scale-out NAS and object storage can overlap and, for many, yield new and important functionality heretofore unachievable with a single traditional storage solution. Object storage devices are now beginning to take advantage of NAS-like features as well, such as expanded interface-access methods across multiple applications, including both file and block storage.

No doubt, this will change how people buy on-premises storage and is, in part, how the cloud organizes its enormous amount of data on a much broader, global basis. Expect to see a lot of growth in these two storage technologies in the coming years.

Karl Paulsen is a SMPTE Fellow and chief technology officer at Diversified. For more about this and other storage and media topics, read his book “Moving Media Storage Technologies.” Contact Karl at kpaulsen@diversifiedus.com.

Karl Paulsen

Karl Paulsen is the CTO for Diversified, the global leader in media-related technologies, innovations and systems integration. Karl provides subject matter expertise and innovative visionary futures related to advanced networking and IP-technologies, workflow design and assessment, media asset management, and storage technologies. Karl is a SMPTE Life Fellow, an SBE Life Member & Certified Professional Broadcast Engineer, and the author of hundreds of articles focused on industry advances in cloud, storage, workflow, and media technologies. For over 25 years he has continually featured topics in TV Tech magazine, penning the magazine’s Storage and Media Technologies and its Cloudspotter’s Journal columns.