Ext4 vs XFS Filesystem - Survey of Popularity

avatar
Expert Contributor

I'd like to see how many people run clusters where the HDFS DataNodes are on XFS versus ext4. I'm trying to get a sense of which filesystem is chosen most often in the wild. Feel free to comment if you prefer one over the other.

Thanks!

1 ACCEPTED SOLUTION

avatar

Based on SmartSense data, it's about 80% ext4 and 20% XFS. We recommend using either, and we have specific mount options for each type of filesystem.
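For anyone who wants to check their own nodes, here is a minimal sketch that reports the filesystem type and the mount options in effect for each DataNode data directory by reading `/proc/mounts`. It is not a SmartSense tool. The directory paths in `DATA_DIRS` and the helper names are assumptions for illustration, so substitute whatever your `dfs.datanode.data.dir` points to. It does not decide which options are correct; it only shows what is mounted today.

```python
import os
from collections import Counter


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        # Note: /proc/mounts escapes spaces in paths as \040; not handled here.
        entries.append((mount_point, fstype, options.split(",")))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path."""
    path = os.path.normpath(path)
    best, best_len = None, -1
    for entry in entries:
        mp = entry[0].rstrip("/") or "/"
        prefix = mp.rstrip("/") + "/"
        if (path == mp or path.startswith(prefix)) and len(mp) > best_len:
            best, best_len = entry, len(mp)
    return best


def summarize(data_dirs, mounts_text):
    """Tally filesystem types and collect mount options per data directory."""
    entries = parse_mounts(mounts_text)
    counts = Counter()
    details = {}
    for d in data_dirs:
        entry = mount_for(d, entries)
        if entry is None:
            continue
        _mp, fstype, options = entry
        counts[fstype] += 1
        details[d] = (fstype, options)
    return counts, details


if __name__ == "__main__":
    # Hypothetical data directories; replace with your own.
    DATA_DIRS = ["/grid/0/hadoop/hdfs/data", "/grid/1/hadoop/hdfs/data"]
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            counts, details = summarize(DATA_DIRS, f.read())
        for d, (fstype, options) in details.items():
            print(d, fstype, ",".join(options))
        print(dict(counts))
```

Running this on each DataNode and combining the tallies gives a per-cluster ext4 versus XFS count, plus a quick way to spot disks mounted with different options from the rest.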

13 REPLIES

avatar
Master Mentor

@Wes Floyd All ext4, no XFS.

avatar

+1

avatar

I've seen very little ext3 and mostly ext4 in on-prem deployments.

AWS EBS is XFS by default. XFS has its advantages, but in a JBOD setup it doesn't really provide a lot of benefits.

avatar
Explorer

What are the recommended mount options for XFS?

avatar
Rising Star

FWIW, XFS is the default in RHEL 7, so I expect an uptick in new clusters.

avatar
Master Mentor

@Shane Kumpf @Wes Floyd

That's good to know: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Migration_Planning_Gui...

XFS is a very high performance, scalable file system and is routinely deployed in the most demanding applications. In Red Hat Enterprise Linux 7, XFS is the default file system and is supported on all architectures. Ext4, which does not scale to the same size as XFS, is fully supported on all architectures and will continue to see active development and support.

avatar

Should this be updated or corrected, then?

Partitioning Recommendations for Slave Nodes
Hadoop Slave node partitions: Hadoop should have its own partitions for Hadoop files and logs. Drives should be partitioned using ext3, ext4, or XFS, in that order of preference. HDFS on ext3 has been publicly tested on the Yahoo cluster, which makes it the safest choice for the underlying file system. The ext4 file system may have potential data loss issues with default options because of the "delayed writes" feature. XFS reportedly also has some data loss issues upon power failure. Do not use LVM; it adds latency and causes a bottleneck.  

Source: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_cluster-planning-guide/content/ch_partiti...

A lot of this conflicts with reality (Paul's SmartSense statistics) and with what we are all discussing here.

avatar
New Contributor

The same goes for LVM. I think LVM (without snapshots) with ext4 or XFS is fine ...