Name Node and Data Node Directories

Contributor

[Attached screenshot: directories.png]

Hi,

I see that after the automated install using Ambari, the NameNode and DataNode directory settings in the configuration each list a bunch of directories by default. Is that the best practice, or should we remove some of them and keep only one in each? Also, please let me know whether removing them would affect the HDFS services.

Thanks,

Chandra

1 ACCEPTED SOLUTION

Rising Star

@chandramouli muthukumaran If you intend to use only /opt/Symantec/hadoop/hdfs/namenode for the NameNode and /opt/Symantec/hadoop/hdfs/data for the DataNode, then you can remove the other entries. Save, and then you may need to restart HDFS and the corresponding services as indicated by Ambari.

If you have multiple directories for data, add those entries as a comma-separated list, like this:

/opt/Symantec/hadoop/hdfs/data1,/opt/Symantec/hadoop/hdfs/data2,/opt/Symantec/hadoop/hdfs/data3
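As a rough illustration of the comma-separated format above, here is a minimal Python sketch that builds such a value from a list of mount points. It assumes the Ambari fields correspond to the `dfs.datanode.data.dir` and `dfs.namenode.name.dir` properties in hdfs-site.xml; `join_dirs` is a hypothetical helper written for this example, not part of Hadoop or Ambari.

```python
def join_dirs(dirs):
    """Build a comma-separated directory list (hypothetical helper).

    Strips surrounding whitespace, skips empty entries, and drops
    duplicates while keeping the original order.
    """
    cleaned = []
    for d in dirs:
        d = d.strip()
        if d and d not in cleaned:
            cleaned.append(d)
    return ",".join(cleaned)


# Example paths taken from the answer above.
data_dirs = [
    "/opt/Symantec/hadoop/hdfs/data1",
    "/opt/Symantec/hadoop/hdfs/data2",
    "/opt/Symantec/hadoop/hdfs/data3",
]

data_dir_value = join_dirs(data_dirs)
# Assumption: this is the value you would paste into the DataNode
# directories field (dfs.datanode.data.dir).
print("dfs.datanode.data.dir =", data_dir_value)
```

The value contains no spaces around the commas, matching the entry shown above.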

Hope this helps.

Contributor

Thanks for your answer. If we have multiple directories, will the HDFS files be stored multiple times in those directories? Sorry, I am a newbie and need to get this clarified.

Super Guru

@chandramouli muthukumaran No. How many copies of an HDFS file are stored depends only on the replication factor. Think about it this way: you start with a fresh Linux install, and your system has different mount points with different capacities. Which of those mount points would you like to use to store your HDFS data (DataNode) and your metadata (NameNode)?

Rising Star

@chandramouli muthukumaran No, it does not store the data multiple times.

Good luck with your Hadooping.

@chandramouli muthukumaran

1. NameNode: 2 directories are enough to back up the NameNode metadata in case the NameNode crashes. Usually the 1st directory should be on a local disk and the 2nd preferably on network storage (SAN/NAS), so that if the local machine goes down you still have a backup of the NameNode metadata on network storage. If you do not have network storage, 2 local disks are fine.

Disadvantage of multiple NameNode directories: I/O performance suffers, because the NameNode copies the metadata to every configured directory.

2. DataNodes: If the machine has multiple HDDs attached, you can use all of them for HDFS data storage.

Multiple disks on a DataNode are not a problem. Unlike the NameNode, a DataNode does not store multiple copies of the same data on all of its disks.
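The difference between the two settings can be sketched with a small toy model in Python. This is not Hadoop code: `namenode_write` and `datanode_place` are hypothetical functions, the paths and block names are made up, and the round-robin placement is a simplification of how a DataNode picks a disk for each block.

```python
from itertools import cycle


def namenode_write(name_dirs, metadata):
    """Toy model: the same metadata is written to every NameNode directory."""
    return {d: metadata for d in name_dirs}


def datanode_place(data_dirs, blocks):
    """Toy model: each block goes to exactly one DataNode directory.

    Round-robin is an assumption made for this illustration.
    """
    placement = {d: [] for d in data_dirs}
    for block, d in zip(blocks, cycle(data_dirs)):
        placement[d].append(block)
    return placement


# Hypothetical directories: one local disk and one network mount.
nn = namenode_write(["/local/namenode", "/nas/namenode"], "fsimage+edits")

# Hypothetical data directories on three separate disks.
dn = datanode_place(
    ["/disk1/data", "/disk2/data", "/disk3/data"],
    ["blk_1", "blk_2", "blk_3", "blk_4"],
)

print(nn)  # every NameNode directory holds the same metadata
print(dn)  # each block appears in only one data directory
```

In this model, adding a NameNode directory adds another full copy of the metadata, while adding a DataNode directory only adds capacity over which the blocks are spread.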

For the basic concepts, see: http://hortonworks.com/blog/hdfs-metadata-directories-explained/

Contributor

Thanks much.
