Created 06-15-2016 05:45 PM
I see that by default, after the automated install using Ambari, there are a bunch of directories listed under the NameNode and DataNode directory settings in the configuration. Can you tell me if that is the best practice, or should we remove some of those directories and keep only one in each? Also, please let me know whether removing them will affect the HDFS services.
Thanks,
Chandra
Created 06-15-2016 06:01 PM
@chandramouli muthukumaran If you intend to use only /opt/Symantec/hadoop/hdfs/namenode for the NameNode and /opt/Symantec/hadoop/hdfs/data for the DataNode, then you can remove the other entries. Save, and then you might need to restart HDFS and the corresponding services as indicated by Ambari.
In case you have multiple directories for data, add those entries as a comma-separated list, like this:
/opt/Symantec/hadoop/hdfs/data1,/opt/Symantec/hadoop/hdfs/data2,/opt/Symantec/hadoop/hdfs/data3
Hope this helps.
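For reference, those Ambari fields map to the dfs.namenode.name.dir and dfs.datanode.data.dir properties in hdfs-site.xml (Ambari manages that file, so make the change through the Ambari UI rather than editing it by hand). A minimal sketch using the paths above:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/Symantec/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/Symantec/hadoop/hdfs/data1,/opt/Symantec/hadoop/hdfs/data2,/opt/Symantec/hadoop/hdfs/data3</value>
</property>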
Created 06-15-2016 06:01 PM
Thanks for your answer. If we have multiple directories, will the HDFS files be stored multiple times in those directories? Sorry, I am a newbie, hence I need to get this clarified.
Created 06-15-2016 06:07 PM
@chandramouli muthukumaran No; for HDFS files, how many copies exist depends only on the replication factor. Think about it this way: you start with a fresh Linux install, and you have different mount points on your system with different capacities. The question is which mount points you would like to use to store your HDFS data (DataNode) as well as your metadata (NameNode).
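To put a number on it: the copy count comes from the dfs.replication property (default 3), not from how many directories you list. With the default, each block is written once to each of three different DataNodes, and on any one DataNode the block lands in just one of the configured data directories.

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>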
Created 06-15-2016 06:11 PM
@chandramouli muthukumaran No, it does not store them multiple times.
Good luck with your hadooping.
Created 06-15-2016 06:08 PM
1. NameNode: 2 directories are enough for backing up the NameNode metadata in case the NameNode crashes. Usually the 1st directory should be on a local disk, and prefer network storage [SAN/NAS] for the 2nd (just in case the local machine goes down, you still have a backup of the NameNode metadata on network storage). If you do not have network storage, then 2 local disks are fine.
Disadvantage of multiple disks for NameNode storage: I/O performance will suffer, as the NameNode writes the metadata to every disk.
2. DataNodes: If you have multiple HDDs attached to the machine, then you can use all of them for HDFS data storage.
Multiple disks on a DataNode are not a problem. Unlike the NameNode, a DataNode will not store multiple copies of the same data on all disks (see the config sketch below the link).
Please check the link below for the basic concepts explained: http://hortonworks.com/blog/hdfs-metadata-directories-explained/
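A sketch of what that layout could look like in hdfs-site.xml, assuming a local disk plus a hypothetical NAS mount for the NameNode and one directory per physical disk for the DataNodes (the /mnt/... paths are just placeholders, pick whatever matches your hosts):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/Symantec/hadoop/hdfs/namenode,/mnt/nas/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/disk1/hadoop/hdfs/data,/mnt/disk2/hadoop/hdfs/data,/mnt/disk3/hadoop/hdfs/data</value>
</property>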
Created 06-15-2016 06:25 PM
Thanks much.