Created 12-02-2015 09:15 PM
I'd like to know how many people run clusters where the HDFS DataNodes are on XFS versus ext4 filesystems. I'm trying to get a sense of which filesystem is chosen most often in the wild. Feel free to comment if you prefer one over the other.
Thanks!
Created 12-03-2015 10:42 PM
From a SmartSense perspective, it's about 80% ext4 and 20% XFS. We recommend either one and have specific mount options for each filesystem type.
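If you want to check what your own DataNodes are using, here is a minimal sketch that reports the filesystem type and mount options behind each data directory. It assumes a Linux host and takes the text of /proc/mounts as input; the function name `filesystems_for` and the example path `/grid/0/hdfs/data` are made up for illustration, so substitute whatever your `dfs.datanode.data.dir` points at.

```python
def filesystems_for(mounts_text, data_dirs):
    """Map each directory to (fstype, options) of its longest matching mount point.

    mounts_text: contents of /proc/mounts (device, mount point, fstype, options, ...)
    data_dirs:   directories to look up, e.g. the DataNode data directories
    """
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        mounts.append((mount_point, fstype, options.split(",")))

    result = {}
    for directory in data_dirs:
        best = None
        for mount_point, fstype, options in mounts:
            # Compare with trailing slashes so /grid/0 does not match /grid/01
            prefix = mount_point.rstrip("/") + "/"
            if (directory.rstrip("/") + "/").startswith(prefix):
                if best is None or len(mount_point) > len(best[0]):
                    best = (mount_point, fstype, options)
        if best is not None:
            result[directory] = (best[1], best[2])
    return result


# Hypothetical sample input, not taken from a real cluster
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /grid/0 xfs rw,noatime 0 0\n"
)
report = filesystems_for(sample, ["/grid/0/hdfs/data", "/var/log"])
for directory, (fstype, options) in sorted(report.items()):
    print(directory, fstype, ",".join(options))
```

On a real node you would pass `open("/proc/mounts").read()` instead of the sample string. The sketch does not handle mount points containing escaped spaces or stacked mounts on the same path.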
Created 12-02-2015 09:16 PM
@Wes Floyd All ext4, no XFS.
Created 12-02-2015 09:37 PM
+1 .......
Created 12-02-2015 09:32 PM
I've seen very little ext3 and mostly ext4 in on-prem deployments.
AWS EBS is XFS by default. XFS has its advantages, but in a JBOD setup it doesn't really provide many benefits.
Created 03-30-2017 05:54 PM
What are the recommended mount options for xfs?
Created 12-03-2015 11:02 PM
FWIW, XFS is the default in RHEL 7, so I expect an uptick in new clusters.
Created 12-04-2015 12:53 AM
That's good to know https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Migration_Planning_Gui...
XFS is a very high performance, scalable file system and is routinely deployed in the most demanding applications. In Red Hat Enterprise Linux 7, XFS is the default file system and is supported on all architectures. Ext4, which does not scale to the same size as XFS, is fully supported on all architectures and will continue to see active development and support.
Created 12-04-2015 04:57 AM
This should be updated / corrected then?
Partitioning Recommendations for Slave Nodes
Hadoop Slave node partitions: Hadoop should have its own partitions for Hadoop files and logs. Drives should be partitioned using ext3, ext4, or XFS, in that order of preference. HDFS on ext3 has been publicly tested on the Yahoo cluster, which makes it the safest choice for the underlying file system. The ext4 file system may have potential data loss issues with default options because of the "delayed writes" feature. XFS reportedly also has some data loss issues upon power failure. Do not use LVM; it adds latency and causes a bottleneck.
A lot of this conflicts with reality (Paul's SmartSense statistics) and with what we are all discussing here.
Created 03-04-2016 07:29 PM
The same goes for LVM. I think LVM (without snapshots) with ext4 or XFS is fine ...