Ext4 vs XFS Filesystem - Survey of Popularity

Contributor

I wanted to see how many people have clusters where the HDFS DataNodes run on XFS versus ext4. I'm trying to get a sense of which filesystem is chosen most often in the wild. Feel free to comment if you have a preference for one over the other.

Thanks!
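
For anyone who wants to check their own nodes before replying, here is a minimal sketch that counts filesystem types per data mount. It assumes a Linux host where `/proc/mounts` is readable and where the DataNode disks are mounted under a common prefix; the `/grid` prefix and the function name `count_fs_types` are hypothetical, so substitute whatever your `dfs.datanode.data.dir` directories actually live on.

```python
from collections import Counter


def count_fs_types(mounts_text, prefix="/grid"):
    """Count filesystem types for mount points at or below `prefix`.

    `mounts_text` is the content of /proc/mounts, where each line has the
    form: device mountpoint fstype options dump pass.
    `prefix` is an assumed location for DataNode disks, not a Hadoop default.
    """
    counts = Counter()
    subtree = prefix.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if mount_point == prefix or mount_point.startswith(subtree):
            counts[fs_type] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(dict(count_fs_types(f.read())))
```

Running it on each DataNode and summing the counts gives a rough ext4 vs XFS split for a cluster.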

1 ACCEPTED SOLUTION

From a SmartSense perspective it's about 80% ext4, and 20% XFS. We recommend using either and have specific mount options for each type of filesystem.



@Wes Floyd All ext4, no XFS.

+1 .......

I've seen very little ext3 and mostly ext4 for on-prem deployments.

AWS EBS is XFS by default. XFS has its advantages, but in a JBOD setup it doesn't really provide a lot of benefits.


Explorer

What are the recommended mount options for XFS?

Contributor

FWIW, XFS is the default in RHEL 7, so I expect an uptick in new clusters.

@Shane Kumpf @Wes Floyd

That's good to know https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Migration_Planning_Gui...

XFS is a very high performance, scalable file system and is routinely deployed in the most demanding applications. In Red Hat Enterprise Linux 7, XFS is the default file system and is supported on all architectures. Ext4, which does not scale to the same size as XFS, is fully supported on all architectures and will continue to see active development and support.

This should be updated / corrected then?

Partitioning Recommendations for Slave Nodes
Hadoop Slave node partitions: Hadoop should have its own partitions for Hadoop files and logs. Drives should be partitioned using ext3, ext4, or XFS, in that order of preference. HDFS on ext3 has been publicly tested on the Yahoo cluster, which makes it the safest choice for the underlying file system. The ext4 file system may have potential data loss issues with default options because of the "delayed writes" feature. XFS reportedly also has some data loss issues upon power failure. Do not use LVM; it adds latency and causes a bottleneck.  

Source: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_cluster-planning-guide/content/ch_partiti...

A lot of this conflicts with reality (Paul's SmartSense statistics) and with what we are all discussing here.

New Contributor

The same goes for LVM. I think LVM (without snapshots) with ext4 or XFS is fine ...

Contributor

We use XFS everywhere, but during some job benchmarks we switched to ext4 and saw better performance (1-3%, depending on the workflow).

@Andrea D'Orio

Is it possible to publish the benchmark numbers?

Contributor
@Neeraj Sabharwal

I'll have to ask our customer, but I think there'll be no problem.

New Contributor

Has anyone tried BTRFS?

We are testing with that now and have not seen any issues so far.
