Name Node and Data Node Directories

Expert Contributor

[Screenshot attached: directories.png]

Hi,

I see that, by default, the automated install using Ambari lists a bunch of directories under the NameNode and DataNode settings in the configuration. Is that best practice, or should we remove some of them and keep only one of each? Also, please let me know whether removing them would affect the HDFS services.

Thanks,

Chandra

Expert Contributor

Thanks for your answer. If we have multiple directories, will the HDFS files be stored multiple times, once in each directory? Sorry, I am a newbie, so I need to get this clarified.

Super Guru

@chandramouli muthukumaran No. For HDFS files, the number of stored copies depends only on the replication factor, not on how many directories are listed. Think about it this way: you start with a fresh Linux install that has several mount points with different capacities. Which mount points would you like to use to store your HDFS data (DataNode) and your metadata (NameNode)?
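To make the distinction concrete, here is a minimal, simplified Python model (a sketch, not Hadoop code). It assumes the documented behavior that the NameNode writes its metadata to every configured name directory, while a DataNode puts each block replica into just one of its data directories, chosen round-robin by default. Extra replicas required by the replication factor go to other DataNodes, not to other directories on the same node. All directory paths and block IDs are hypothetical.

```python
from itertools import cycle


class NameNodeModel:
    """Simplified model: every metadata edit is mirrored into ALL name dirs."""

    def __init__(self, name_dirs):
        self.dirs = {d: [] for d in name_dirs}

    def log_edit(self, edit):
        for entries in self.dirs.values():
            entries.append(edit)  # same edit written to each directory


class DataNodeModel:
    """Simplified model: each block replica lands in ONE data dir (round-robin)."""

    def __init__(self, data_dirs):
        self.dirs = {d: [] for d in data_dirs}
        self._next_dir = cycle(data_dirs)

    def store_block(self, block_id):
        target = next(self._next_dir)
        self.dirs[target].append(block_id)
        return target


def raw_bytes_used(file_size, replication=3):
    # Cluster-wide raw usage scales with the replication factor (copies live
    # on different DataNodes), not with the number of data dirs per node.
    return file_size * replication


# Hypothetical paths, for illustration only.
nn = NameNodeModel(["/hadoop/hdfs/namenode", "/mnt/nfs/hdfs/namenode"])
nn.log_edit("mkdir /user/chandra")
print({d: len(e) for d, e in nn.dirs.items()})  # 1 entry in EACH name dir

dn = DataNodeModel(["/grid/0/hdfs/data", "/grid/1/hdfs/data"])
for blk in ["blk_1", "blk_2", "blk_3"]:
    dn.store_block(blk)
print(dn.dirs)  # blk_1 and blk_3 in /grid/0, blk_2 in /grid/1; no duplicates
```

The NameNode model ends up with the same edit in both directories, while the DataNode model stores three blocks exactly once in total across its two directories.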

Expert Contributor

@chandramouli muthukumaran No, it does not store them multiple times.

Good luck with your Hadooping.

Super Guru

@chandramouli muthukumaran

1. NameNode: Two directories are enough to back up the NameNode metadata in case the NameNode crashes. Usually the first directory should be on a local disk and the second, preferably, on network storage (SAN/NAS), so that if the local machine goes down you still have a copy of the NameNode metadata on the network storage. If you do not have network storage, two local disks are fine.

Disadvantage of multiple directories: I/O performance can suffer, because the NameNode writes its metadata to every directory.

2. DataNodes: If the machine has multiple HDDs attached, you can usually use all of them for HDFS data storage.

Multiple disks on a DataNode are not a problem: unlike the NameNode, a DataNode does not store copies of the same data on every disk.
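The two recommendations above correspond to the hdfs-site.xml properties dfs.namenode.name.dir and dfs.datanode.data.dir, each of which takes a comma-separated list of directories. The Python sketch below only renders those two properties. The mount points (a local disk plus a network-storage mount for the NameNode, one directory per disk for the DataNode) are hypothetical placeholders. On an Ambari-managed cluster you would normally change these values in Ambari's HDFS configs rather than in the file itself, because Ambari manages hdfs-site.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount points; substitute the ones that exist on your hosts.
NAMENODE_DIRS = [
    "/hadoop/hdfs/namenode",   # local disk
    "/mnt/nfs/hdfs/namenode",  # network storage mount (e.g. NFS from a NAS)
]
DATANODE_DIRS = [
    "/grid/0/hadoop/hdfs/data",  # one directory per physical disk
    "/grid/1/hadoop/hdfs/data",
]


def build_hdfs_site(properties):
    """Render a Hadoop-style <configuration> document from a dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


hdfs_site = build_hdfs_site({
    # The NameNode keeps a full copy of its metadata in EVERY listed directory.
    "dfs.namenode.name.dir": ",".join(NAMENODE_DIRS),
    # The DataNode spreads blocks across these directories, one copy per node.
    "dfs.datanode.data.dir": ",".join(DATANODE_DIRS),
})
print(hdfs_site)
```

Keeping the directory lists in plain Python lists makes it easy to check that each NameNode directory sits on a different device before the comma-separated value is written.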

Please check the link below, which explains the basic concepts: http://hortonworks.com/blog/hdfs-metadata-directories-explained/

Expert Contributor

Thanks very much.