Created 06-01-2016 10:42 AM
Following is the planned infrastructure for the production cluster.
Initially, 4 data/compute nodes, each with 2x12 cores, 256 GB RAM and 24x2TB disks (plus 2x300 GB for Linux), and 3 name/admin nodes (with far fewer disks, configured as RAID 1). Later, 4-5 datanodes will be added. All nodes will run RHEL 7.
We will be proceeding with the latest HDP 2.4 installation via Ambari.
The HDP documentation has the following statements:
The ext4 file system may have potential data loss issues with default options because of the "delayed writes" feature. XFS reportedly also has some data loss issues upon power failure. Do not use LVM; it adds latency and causes a bottleneck
I have read several existing threads and docs, but I still don't have a clear understanding of what suits the latest editions of HDP and RHEL.
ext4-vs-xfs-filesystem-survey-of-popularity
best-practices-linux-file-systems-for-hdfs
any-recommendation-on-how-to-partition-disk-space-1 (@Benjamin Leonhardi's insightful recommendation)
Following are the possibilities:
Any suggestions/recommendations/further reading (suited to the latest HDP 2.4 and RHEL 7 environment)?
Created 06-01-2016 12:10 PM
The HDP documentation around filesystem selection is outdated. ext4 and XFS are fine choices today.
You can use LVM for the OS filesystems. This provides a nice way to shuffle space around on your 2x 300GB OS drives as needed. XFS is perfectly fine here, so you can let RHEL use the default. However, note that XFS filesystems cannot be shrunk, whereas with LVM + ext4, filesystems can be expanded online and shrunk offline. This is a big gap for XFS.
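For example (the volume group and logical volume names below are just placeholders), reclaiming space from an ext4 filesystem on LVM is done with the filesystem unmounted:
  umount /opt                          # ext4 must be offline to shrink
  e2fsck -f /dev/vg_os/lv_opt          # force a check before resizing
  resize2fs /dev/vg_os/lv_opt 40G      # shrink the filesystem first
  lvreduce -L 40G /dev/vg_os/lv_opt    # then shrink the logical volume to match
  mount /opt
XFS has no equivalent shrink operation; it can only be grown.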
For the datanode disks, do not use RAID or LVM. You want each individual disk mounted as a separate filesystem. You then provide HDFS with a comma-separated list of mount points, and HDFS will handle spreading data and load across the disks. If you have 24 data disks per node, you should have 24 filesystems configured in HDFS. XFS is a good choice here, since resizing is unlikely to come into play.
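For example (the /grid/N mount points below are just placeholders for wherever you mount your 24 disks), dfs.datanode.data.dir in hdfs-site.xml would look something like:
  /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data,...,/grid/23/hadoop/hdfs/data
The same pattern applies to the YARN local directories (yarn.nodemanager.local-dirs): one directory per data disk.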
Also keep in mind that /var/log and /usr have specific needs. /var/log can grow to hundreds of GBs, so moving this logging to one of the data disks may be necessary. The HDP binaries are installed to /usr/hdp, and depending on which components you are installing, could use as much as 6GB per HDP release. Keep this in mind as sufficient space is needed here for upgrades.
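As a rough sketch only (the sizes are assumptions to adjust with your OS admin team), the 2x 300GB OS drives could be laid out along these lines:
  /          50 GB
  /usr       30 GB
  /var/log   150-200 GB
  swap       per your standard
  (free)     remainder left unallocated in the volume group so any of these can be grown later with LVM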
Hope that helps.
Created 06-01-2016 12:03 PM
You can use whatever filesystem you like for the O/S filesystems etc.; our recommendations are primarily targeted at the HDFS data drives.
I wouldn't use ext3 for anything anymore; ext4 and XFS have moved forward as the primary default options now.
So to try and address your options one by one:
1) No, don't do this.
2) Perfectly acceptable; take care that data drives are mounted with the recommended mount options (noatime etc.) - see the example fstab entry after this list.
3) Also perfectly acceptable. I see more people using XFS everywhere now, ext4 less so, but the deltas are relatively small; I'd go with whichever option you're more comfortable with as an organisation.
4) I wouldn't recommend that. If you're happy using XFS, use it everywhere; it just makes things easier. But see point 2) about mount options for data drives.
5) You can absolutely use LVM for your O/S partitions; just ideally don't use it for datanode and log directories.
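For example, an /etc/fstab entry for a single data disk might look like this (the device name and mount point are placeholders for however your 24 disks end up named and mounted):
  /dev/sdb1  /grid/0  xfs  defaults,noatime,nodiratime  0 0
noatime (and nodiratime, which it implies) avoids a metadata write on every read, which is the main thing people mean by the recommended mount options for data drives.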
Hope that helps!
Created 06-01-2016 01:23 PM
I have read in several threads about the care to be exercised while using ext4 (noatime etc.), but is there some concise guide or document which can be used?
Created on 06-01-2016 12:41 PM - edited 08-19-2019 04:05 AM
I suspected that the filesystem documentation was merely carried forward from previous versions; I hope Hortonworks invests some resources in updating it 🙂
The LVM part, I guess, is clear - use it for OS partitions but NOT datanode disks, am I right?
Can you help me understand more about your inputs:
So what should I proceed with - ext4 everywhere OR XFS everywhere OR both (XFS for datanodes etc. and ext4 for OS partitions, or vice versa)?
Which is the better idea: have a large, dedicated disk for the OS partitions (and add more if required, resizing using LVM) so that logs, binaries etc. have plenty of space, OR, during the HDP installation itself, redirect logs (YARN etc.) to directories on the disks dedicated to the datanode? For example, this is how it is in the test cluster:
Created 06-01-2016 06:15 PM
You are correct: use LVM for OS disks, but not data disks.
In the end, the filesystem choice doesn't make a huge difference. ext4 everywhere would simplify the overall design and allow filesystems to be grown online in the future.
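As a quick sketch (the volume group and logical volume names are placeholders), growing an ext4 filesystem that sits on LVM can be done while it stays mounted:
  lvextend -L +50G /dev/vg_os/lv_varlog   # grow the logical volume
  resize2fs /dev/vg_os/lv_varlog          # grow the mounted ext4 filesystem to fill it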
Allocating a larger amount of storage to the OS filesystems does simplify the install. Otherwise, during the Ambari install wizard, you need to go through each of the services' configurations and change "/var/log" to one of the data disk mount points (e.g. /opt/dev/sdb in the example above). If you allocated more storage to the OS (and subsequently made /usr, say, 30GB and /var/log 200GB), you would not have to change as much during the Ambari install. Either approach is viable, so I would suggest discussing with your OS admin team to see if they have a preference.
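To illustrate the kind of change I mean (exact property names vary a bit by Ambari/HDP version, and the mount point is just a placeholder), the log directory settings in the service configs look roughly like:
  hadoop-env : hdfs_log_dir_prefix = /grid/0/log/hadoop
  yarn-env   : yarn_log_dir_prefix = /grid/0/log/yarn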
Also note that I'm referring to daemon logs (namenode, resource manager, etc.) that end up in /var/log, versus application logs. The YARN settings you show above are for the YARN application logs and local scratch space. You want to follow that same pattern in production.