Support Questions

vinaykumarpotnu · ‎03-04-2016

we have 5 node cluster with following configurations for master and slaves.

HDPMaster   35 GB   500 GB
HDPSlave1   15 GB   500 GB
HDPSlave2   15 GB   500 GB
HDPSlave3   15 GB   500 GB
HDPSlave4   15 GB   500 GB
HDPSlave5   15 GB   500 GB

But the cluster is not taking much space. I am aware of the fact that it will reserve some space for non-dfs use.But,it is taking uneven capacity for each slave node. Is there a way to reconfigure hdfs ?

PFA.

Even though all the nodes have same hard disk, only slave 4 is taking 431GB, remaining all nodes are utilizing very small space. Is there a way to resolve this ?

nsabharwal · ‎03-05-2016

@vinay kumar

I have never seen the same number for all the slave nodes because of the data distribution.

Link

To overcome uneven block distribution scenario across the cluster, a utility program called balancer

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer

View solution in original post

aervits · ‎03-04-2016

Is your replication factor set to 3? Are you using one reducer in your ingestion? You can use hdfs balancer to spread the data around your cluster https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Administrat...

aervits · ‎03-04-2016

I think you're interpreting it wrong it's the opposite, only slave 4 is not taking up data, the other nodes are filled.

vinaykumarpotnu · ‎03-04-2016

yes, Replication factor is 3. But how spreading the data around the cluster help us in changing capacity of the nodes ?

aervits · ‎03-04-2016

That's remaining capacity not total

vinaykumarpotnu · ‎03-04-2016

Yeah probably. I am new to this and I am not able to understand this whole configuration thing. Capacity is available space and non dfs is the space available for linux system use, if i am not wrong. so I still didn't understand the answer to my question. Why is the capacity(the available space ) is more for slave 4 alone when all the nodes including master have the same harddisk capacity.

aervits · ‎03-04-2016

Go to the node and investigate the data dir directory you specified. Run hdfs fsck / command see if you have issue with hdfs, post screenshot of main page of Ambari with all widgets.

vinaykumarpotnu · ‎03-04-2016

Cluster is new. It hardly contain any data in it.

aervits · ‎03-04-2016

OK you need to confirm which directories you specified for datanode in Ambari > hdfs > configs

vinaykumarpotnu · ‎03-07-2016

/opt/hadoop/hdfs/data,/tmp/hadoop/hdfs/data,/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data

Cloudera Community

Support Questions

Configure Storage capacity of Hadoop cluster