Created on 03-04-2016 10:22 AM - edited 08-19-2019 04:17 AM
We have a 5-node cluster with the following configuration for the master and slave nodes:
HDPMaster: 35 GB memory, 500 GB disk
HDPSlave1: 15 GB memory, 500 GB disk
HDPSlave2: 15 GB memory, 500 GB disk
HDPSlave3: 15 GB memory, 500 GB disk
HDPSlave4: 15 GB memory, 500 GB disk
HDPSlave5: 15 GB memory, 500 GB disk
But the cluster is not showing much usable space. I am aware that some space is reserved for non-DFS use, but the reported capacity is uneven across the slave nodes. Is there a way to reconfigure HDFS?
PFA.
Even though all the nodes have the same hard disk, only slave 4 shows 431 GB; all the remaining nodes show very little space. Is there a way to resolve this?
Created 03-05-2016 10:26 AM
I have never seen the same numbers on all the slave nodes, because data distribution across the cluster is never perfectly even.
To fix an uneven block distribution across the cluster, you can run the balancer utility:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
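For example, something like this should even things out (a rough sketch; run it as the hdfs user on any node with the HDFS client, and adjust the threshold to taste):

hdfs balancer -threshold 10

The threshold is the maximum percentage a datanode's utilization may differ from the cluster average before the balancer moves blocks onto or off that node.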
Created 03-04-2016 10:28 AM
Is your replication factor set to 3? Are you using one reducer in your ingestion? You can use hdfs balancer to spread the data around your cluster https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Administrat...
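If you want to double-check the replication factor, a couple of quick commands should do it (a sketch; /some/path below is just a placeholder for one of your own files):

hdfs getconf -confKey dfs.replication
hdfs dfs -ls /some/path

The second column of the -ls output shows the replication factor of each file.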
Created 03-04-2016 10:31 AM
I think you're interpreting it wrong; it's the opposite: only slave 4 is not taking up data, the other nodes are filled.
Created 03-04-2016 10:32 AM
Yes, the replication factor is 3. But how would spreading the data around the cluster help us change the capacity of the nodes?
Created 03-04-2016 10:36 AM
That's remaining capacity, not total.
Created 03-04-2016 10:39 AM
Yeah, probably. I am new to this and I am not able to understand this whole configuration thing. Capacity is the available space, and non-DFS is the space available for the Linux system to use, if I am not wrong. So I still don't understand the answer to my question: why is the capacity (the available space) higher for slave 4 alone, when all the nodes, including the master, have the same hard disk capacity?
Created 03-04-2016 10:56 AM
Go to the node and investigate the data directory you specified. Run the hdfs fsck / command to see if you have an issue with HDFS, and post a screenshot of the main page of Ambari with all widgets.
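For what it's worth, these are the commands I would start with (a rough sketch; run them as the hdfs user on any node with the HDFS client installed):

hdfs fsck /           (reports missing, corrupt and under-replicated blocks)
hdfs dfsadmin -report (per-datanode Configured Capacity, DFS Used, Non DFS Used and DFS Remaining)

The per-node section of the report should make it clear whether slave 4 really has more disk available to HDFS, or whether the other nodes are simply losing their capacity to non-DFS usage.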
Created 03-04-2016 12:23 PM
The cluster is new. It hardly contains any data.
Created 03-04-2016 01:10 PM
OK, you need to confirm which directories you specified for the DataNode in Ambari > HDFS > Configs.
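You can also check the same thing from the command line (a sketch; /path/to/data/dir is a placeholder for whatever the first command prints):

hdfs getconf -confKey dfs.datanode.data.dir
df -h /path/to/data/dir

If the configured directories land on different mounts (or on a small root partition) on some nodes, each DataNode will report a different capacity to the NameNode.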
Created 03-07-2016 05:55 AM
/opt/hadoop/hdfs/data,/tmp/hadoop/hdfs/data,/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data