I am building a Hadoop cluster on new hardware. Each data node has fourteen 4 TB disks.
Do I understand correctly that one should build a file system (ext3, ext4 or xfs) on each disk separately and mount each disk before configuring HDFS?
Cloudera documentation seems to suggest that ext3 is the most tested file system for HDFS. Am I really better off with ext3 or should I use ext4 or xfs? Reliability is more important to me than performance.
Also, what are the recommended options for mounting in /etc/fstab?
You can start building your cluster using any of the Cloudera CDH supported file systems: ext3, ext4, and XFS. Avoid LVM partitioning (the default partitioning method in CentOS 6 and 7); use manual disk partitioning instead.
And yes, the recommended options for mounting in /etc/fstab are just as you stated:
/dev/sdb1 /data1 ext4 defaults,noatime 0 0
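Since there are 14 data disks, a small script can generate all the fstab lines at once. This is only a sketch under assumed device names (sdb through sdo mapped to /data1 through /data14); confirm the actual names with lsblk before using it:

```shell
#!/bin/sh
# Print an fstab line for each assumed data disk sdb..sdo.
# Review the output, append it to /etc/fstab, then run 'mount -a'.
n=1
for d in b c d e f g h i j k l m n o; do
  echo "/dev/sd${d}1 /data$n ext4 defaults,noatime 0 0"
  n=$((n+1))
done
```

Each mount point (/data1 through /data14) must exist (mkdir -p) before 'mount -a' will succeed.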
For more information, please take a look at this article.
mkfs.ext4 /dev/sdb, or should I use some extra options? For example, considering that HDFS uses a block size of 128 MB by default, might it make sense to use a bigger block size for the underlying ext4? Thank you, Igor
To create a partition in Linux, you'd need to run fdisk on the disk first. In your example, sdb is the disk, so you'd need to create the partition (sdb1):
After that, you'd need to format the new partition as ext4:
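To keep it safe to experiment with, here is the formatting step run against an image file; on the real partition you'd point mkfs.ext4 at /dev/sdb1 instead. The explicit -b 4096 only shows where a block-size option would go; note that ext4's block size cannot usefully exceed 4 KiB, since most kernels cannot mount a filesystem whose block size exceeds the page size, so a 128 MB ext4 block to match HDFS is not an option anyway:

```shell
# Format a throwaway image instead of the real /dev/sdb1.
truncate -s 100M fs.img
# -F: allow operating on a regular file; -q: quiet;
# -b 4096: explicit 4 KiB block size (the practical maximum on most kernels).
mkfs.ext4 -F -q -b 4096 fs.img
# Confirm the block size that was used:
tune2fs -l fs.img | grep 'Block size'
```

Some operators also lower the reserved-blocks percentage (mkfs.ext4 -m) on data-only disks, but that is a tuning choice, not a requirement.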
Make sure you mount it correctly in /etc/fstab, just as I stated in my first response; running 'mount -a' is a good way to examine your fstab entries.
In regards to the HDFS block size: HDFS blocks are just a logical layer built over the physical blocks of the ext4 filesystem. HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks. If the block is large enough, the time it takes to transfer the data from the disk is significantly longer than the time to seek to the start of the block.
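That trade-off is easy to put numbers on. With assumed figures of a 10 ms average seek and 100 MB/s sustained transfer (both made up for illustration), the seek is well under 1% of the time needed to read one 128 MB HDFS block:

```shell
# seek share = seek / (seek + block_size/throughput), with assumed
# figures: 10 ms seek, 100 MB/s transfer, 128 MB HDFS block.
awk 'BEGIN {
  seek = 0.010                 # seconds
  xfer = 128 / 100             # 128 MB at 100 MB/s = 1.28 s
  printf "seek share: %.1f%%\n", 100 * seek / (seek + xfer)
}'
```

With these assumed numbers the seek share comes out to roughly 0.8%, which is why HDFS favors large blocks.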
If there are any additional questions, please let me know.