Scale-Out ZFS!

Robert Murphy

I have been lucky to work with some of the smartest people on some of the coolest projects in IT. None more so than the Fishworks team at Sun Microsystems, who developed “Amber Road”, eventually named, simply and brilliantly, the “ZFS Storage Appliance” by none other than Larry Ellison. But the Fishworks team were the real rock stars: they were the only engineers I ever knew whose autographs visiting customers constantly asked for!

Why is ZFS so Extraordinary?

ZFS is a powerful and flexible file system that is well-suited to handling large amounts of data and providing data protection and management capabilities that are often difficult to find in other file systems. It is often considered the absolute best file system available today for several reasons:

Data Integrity: ZFS is designed with data integrity as a top priority. It uses checksums to ensure that data is not corrupted or lost, and it also includes features like copy-on-write and RAID-like data redundancy to protect against data loss.

Adaptive Replacement Cache (ARC): ZFS uses a sophisticated caching system called ARC. It is designed to cache frequently accessed data and metadata in memory, which helps to reduce disk I/O operations and improve performance. (My favorite feature).

Snapshots and Clones: ZFS includes powerful snapshot and cloning capabilities, allowing users to create point-in-time copies of their data quickly and easily. This is useful for backup, testing, and other purposes.

Compression and Deduplication: ZFS includes built-in compression and deduplication capabilities, which can help save space and improve performance by reducing the amount of redundant data stored on the system.

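RAIDZ: ZFS offers its own implementation of RAID called RAIDZ. RAIDZ combines variable-width stripes with copy-on-write (COW), so every write is a full-stripe write. This avoids the “write hole” of traditional RAID 5, where a crash in the middle of an update can leave data and parity inconsistent, and it removes the read-modify-write penalty on small writes. It is designed to work with large disks and can provide better performance than traditional RAID implementations.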

Open Source: ZFS is now an open-source project, which means that anyone can use and contribute to the development of the software. This has led to a vibrant community of users and developers who are constantly working to improve the system.
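
To make the data integrity point above concrete, here is a minimal Python sketch of checksum-verified reads with self-healing from a redundant copy. It is a toy model, not ZFS code: the `MirroredStore` class, its two in-memory "disks", and the choice of SHA-256 are all assumptions made for illustration. (ZFS itself stores each block's checksum in the parent block pointer rather than in a side table.)

```python
import hashlib


class MirroredStore:
    """Toy two-way mirror that keeps a checksum per block (illustrative only)."""

    def __init__(self):
        self.copies = [{}, {}]   # two independent "disks": block id -> bytes
        self.checksums = {}      # block id -> SHA-256 hex digest

    def write(self, block_id, data):
        # Record the checksum separately from the data, then write both copies.
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.copies:
            disk[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        for disk in self.copies:
            data = disk[block_id]
            if hashlib.sha256(data).hexdigest() == expected:
                # Found a verified copy: repair any copy that differs from it.
                for other in self.copies:
                    if other[block_id] != data:
                        other[block_id] = data
                return data
        raise IOError(f"all copies of block {block_id!r} are corrupt")


store = MirroredStore()
store.write("b0", b"genomics data")
store.copies[0]["b0"] = b"genomics dat\x00"   # simulate silent corruption
recovered = store.read("b0")
print(recovered)
```

The read detects that the first copy no longer matches its checksum, returns the good copy from the second "disk", and rewrites the damaged one, which is the general idea behind checksums combined with redundancy.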

But ZFS has a Huge Handicap–It can’t Scale-Out

At the time of Oracle’s purchase of Sun, the next new, new thing on the ZFS Storage Appliance Roadmap was Scale-Out NAS capability. It never happened, so ZFS’s biggest limitation is that it is “scale-up” only. As good as ZFS is (and it is very good), today’s ever-growing storage requirements have relegated it to the smaller end of the storage capacity spectrum, with Scale-Out NAS becoming the Darwinian dominant storage species for large file storage environments.

Enter Mediaflux and Scale-Out ZFS

Since leaving Sun, I often encounter other fans who have successfully deployed ZFS. The most recent is a world-renowned Cancer Center.

The Center was storing their research data from over 200 labs – consisting of large amounts of genomics, CryoEM image, and scientific data – totaling 6 PB and more than 2 billion files on over 30 ZFS Network Attached Storage (NAS) servers. This data supplies the Center’s various analyses and AI pipelines. When a server reached capacity, research data (and the researcher) had to be moved to another server with available space in a “Tetris-like” manner – a painful process for both IT and researchers that became untenable.

The Center was looking for a way to eliminate managing the different logins and capacities on the ever-growing number of servers. Moving to traditional enterprise Scale-Out NAS was determined to be too expensive, and it did not offer the flexibility the Center required.

The Center settled on front-ending the ZFS storage servers with Mediaflux, providing a scalable load-balanced global namespace and single mount point for researchers and instruments with easy management for IT. The result is essentially a “poor man’s” enterprise scale-out NAS system (in price, but not in features). As shown in the figure below, Mediaflux also replicates the Cancer Center’s data to off-site storage and automatically archives data to low-cost Amazon S3 Glacier Deep Archive storage.
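
The global namespace idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Mediaflux is implemented: the `Backend` and `GlobalNamespace` classes, the server names, and the "most free space" placement rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One storage server behind the namespace (hypothetical model)."""
    name: str
    capacity_tb: float
    used_tb: float = 0.0

    @property
    def free_tb(self):
        return self.capacity_tb - self.used_tb


@dataclass
class GlobalNamespace:
    """Maps logical paths to backends so users see a single tree."""
    backends: list
    locations: dict = field(default_factory=dict)  # logical path -> backend name

    def place(self, path, size_tb):
        # Choose the backend with the most free space that can hold the data.
        candidates = [b for b in self.backends if b.free_tb >= size_tb]
        if not candidates:
            raise RuntimeError("no backend has enough free space")
        target = max(candidates, key=lambda b: b.free_tb)
        target.used_tb += size_tb
        self.locations[path] = target.name
        return target.name

    def resolve(self, path):
        # Users ask for a logical path; the namespace knows where it lives.
        return self.locations[path]


ns = GlobalNamespace([
    Backend("zfs01", 200, used_tb=190),
    Backend("zfs02", 200, used_tb=50),
])
placed_on = ns.place("/labs/cryoem/run42", 20)
print(placed_on, ns.resolve("/labs/cryoem/run42"))
```

Because researchers only ever see the logical path, a nearly full server simply stops receiving new data, and nobody has to be moved by hand in the “Tetris-like” manner described earlier.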


The Cancer Centre’s ZFS Mediaflux Scale-Out NAS System

Mediaflux makes it easy and straightforward to access research data, regardless of where it is stored. All the ZFS storage servers can be managed as a single entity, simplifying the storage environment's administration. If one copy of the data is lost, corrupted, or unavailable, remote mirrored copies can be used, reducing downtime and keeping research going. The Center also uses Mediaflux to automatically archive colder data to low-cost AWS S3 deep archive storage. All this (and much more) comes with the base Mediaflux license, with no extra-cost add-ons, and the license is not capacity based (hallelujah)!
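
As a rough illustration of what “archiving colder data” means, the Python sketch below selects files whose last access is older than a threshold. The one-year threshold, the `select_for_archive` function, and the example paths are assumptions for this sketch; the article does not describe the Center's actual policy or how Mediaflux evaluates it.

```python
from datetime import datetime, timedelta

ARCHIVE_AFTER = timedelta(days=365)  # hypothetical threshold, not from the article


def select_for_archive(files, now, archive_after=ARCHIVE_AFTER):
    """Return the paths whose last access is older than the threshold."""
    return [path for path, last_access in files.items()
            if now - last_access > archive_after]


now = datetime(2023, 6, 1)
files = {
    "/labs/genomics/old_run.bam": datetime(2021, 3, 1),   # untouched for 2+ years
    "/labs/cryoem/new_stack.mrc": datetime(2023, 5, 20),  # accessed recently
}
cold = select_for_archive(files, now)
print(cold)
```

Only the long-untouched file is selected; a real policy would likely weigh more than access time, such as project status or data type.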

As you can see, the combination of Mediaflux and ZFS is a powerful alternative to expensive enterprise scale-out NAS. And I’m not just talking about economics. Arcitecta developed its own implementations of the NFS, SMB, and S3 protocols in-house. This costly and time-consuming effort resulted in exceptional performance, scale, security, and efficiency – enabling data to be processed quickly by any application. And no enterprise scale-out NAS system can replicate and tier data to other storage systems as well as Mediaflux can, no matter how much you pay.

In conclusion, while ZFS is a powerful file system with many exceptional features, its inability to scale out has been a significant limitation. However, with the combination of Mediaflux and ZFS, there is a powerful alternative to expensive enterprise scale-out NAS. The example of the world-renowned Cancer Center demonstrates how Mediaflux provides a scalable load-balanced global namespace for researchers and instruments with easy management for IT. This innovative solution makes it easy to access research data regardless of where it is stored. With Mediaflux and ZFS, organizations can have an affordable, powerful, and scalable storage system that can easily handle large amounts of data and protect it from loss. For more information visit arcitecta.com/solutionbrief/scale-out-zfs


Bob Murphy is VP of Marketing at Arcitecta. Over his career, Bob has worked for tech titans such as IBM, Silicon Graphics, and Hewlett-Packard. His knowledge of modern computing environments is immense and rooted in deep expertise.