Hadoop initial disk formatting

Expert Contributor

Hi,

I am going to set up a Hadoop cluster on brand new physical servers, and this will be my first time.

Some of the servers are planned to be masters and the others slaves.

According to this page:

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/ch_partitioning_...

I guess I need to format the disks on the slaves as ext3, mount them, and then define them in the hdfs-site.xml files. Is that correct?

I also read about command

hadoop namenode -format

Am I also supposed to format the RAID partitions on the masters before running this command, or does the command also do what is needed at the OS level? Perhaps a drive cannot be mounted on Linux before it is formatted, and perhaps this command has nothing to do with file system formatting, but I couldn't be sure since I am not familiar with Linux disk mounting.

Thanks in advance.

1 ACCEPTED SOLUTION

Super Guru

@Sedat Kestepe

Are you performing a manual installation or are you using Ambari? I highly recommend you use Ambari if you can, as it will take care of things such as formatting HDFS.

For the HDFS slave nodes, you should format the data drives individually at the OS level. You should then mount those drives individually into their own mount path, such as /grid/disk01, /grid/disk02, etc. You should not use RAID for your data drives.
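
As a rough sketch of those OS-level steps for two data disks: the device names /dev/sdb1 and /dev/sdc1, the ext4 file system, and the noatime mount option are assumptions for illustration, so check your own devices with lsblk first. The script only prints the commands, so you can review them before running anything as root.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands that would format and
# mount each data disk under /grid/diskNN.
# The device names passed in below are hypothetical examples.
plan_data_disks() {
    i=1
    for dev in "$@"; do
        mnt=$(printf '/grid/disk%02d' "$i")
        echo "mkfs -t ext4 $dev"                # create the file system (destroys existing data)
        echo "mkdir -p $mnt"                    # create the mount path
        echo "mount -o noatime $dev $mnt"       # mount the disk on its own path
        i=$((i + 1))
    done
}

plan_data_disks /dev/sdb1 /dev/sdc1
```

To keep the mounts after a reboot, you would also add matching entries to /etc/fstab.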

For the master servers, if you want to use RAID 1 to create mirrors for the NameNode directories, ZooKeeper directories, etc., then you should also do that at the OS level before installing HDP. Once you have created the RAID configuration for the drives, mount them at the OS level. During the installation process with Ambari, you can then specify that you want to use those OS-mounted locations for the directories.
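
For the data disks, the value you enter in Ambari ends up in the dfs.datanode.data.dir property as a comma-separated list of directories, one per mounted disk. Here is a small sketch that builds such a value from the mount paths. The /hadoop/hdfs/data subdirectory is an assumption for illustration; use whatever layout you prefer.

```shell
#!/bin/sh
# Build a comma-separated directory list, one entry per mounted data disk.
# The "/hadoop/hdfs/data" suffix is an assumed layout, not a requirement.
data_dir_value() {
    value=""
    for mnt in "$@"; do
        # Prepend "existing value," only when a value already exists.
        value="${value:+$value,}$mnt/hadoop/hdfs/data"
    done
    echo "$value"
}

data_dir_value /grid/disk01 /grid/disk02
```

The same idea applies to the NameNode directories on the masters (dfs.namenode.name.dir), pointing at the RAID 1 mount instead.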

To use Ambari, follow these instructions: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/ch_Getting_Re...

You may find this HCC article helpful: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto....

Expert Contributor

Thank you for the answer, @Michael Young... and for the article link. It looks great.

Yes, I will use Ambari 2.4.2 with HDP 2.5.3.

I think I can now ask more clearly: are formatting and mounting a hard disk on the one hand, and formatting HDFS on the other, somehow related? Or are they different steps?

Does the "hadoop namenode -format" command include disk formatting and mounting? I guess not, does it?

Super Guru

@Sedat Kestepe

No, formatting and mounting the hard drives is not directly related to formatting HDFS. Conceptually, the idea of "formatting" is the same, but the two tasks are completely separate.

The hadoop format command does not format or mount the hard drives. The hard drives should already be formatted and mounted. When you run the format command for HDFS, it initializes the NameNode metadata directories with a new, empty fsimage file. The NameNode uses that file to track the HDFS namespace, meaning which files exist and which blocks they are made of.
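
Since the HDFS format expects the disks to be mounted already, a quick check beforehand can help. This is only a sketch: the is_mount_point helper is a hypothetical name of mine, and it assumes a Linux system where /proc/mounts lists the mounted file systems. The /grid paths are the example mount paths from my earlier answer.

```shell
#!/bin/sh
# Succeeds if the given path is listed as a mount point.
# The optional second argument is the mounts table to read
# (defaults to /proc/mounts, where field 2 of each line is the mount path).
is_mount_point() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

for dir in /grid/disk01 /grid/disk02; do
    if is_mount_point "$dir"; then
        echo "$dir: mounted"
    else
        echo "$dir: NOT mounted"
    fi
done
```

Only once the disks show as mounted would the HDFS-level format be run, and with Ambari that step is handled for you. On a manual install the command is "hdfs namenode -format", which is the current form of "hadoop namenode -format".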

If you feel my answer has been helpful, please accept it to help others.