Configure Storage capacity of Hadoop cluster

Rising Star

We have a 5-node cluster with the following configuration for the master and slaves.

Node        RAM     Disk
HDPMaster   35 GB   500 GB
HDPSlave1   15 GB   500 GB
HDPSlave2   15 GB   500 GB
HDPSlave3   15 GB   500 GB
HDPSlave4   15 GB   500 GB
HDPSlave5   15 GB   500 GB

But the cluster is not taking much space. I am aware that it reserves some space for non-DFS use, but it is taking an uneven capacity for each slave node. Is there a way to reconfigure HDFS?

Please find the attached screenshot (2586-namenode.png).

Even though all the nodes have the same hard disk, only slave 4 is taking 431 GB; all the remaining nodes are using very little space. Is there a way to resolve this?

1 ACCEPTED SOLUTION

Master Mentor
@vinay kumar

I have never seen identical numbers across all the slave nodes, because of how the data is distributed.


To overcome uneven block distribution across the cluster, HDFS provides a utility program called balancer:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
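
As a rough illustration of what the balancer's threshold means, the sketch below flags DataNodes whose utilization differs from the cluster-wide utilization by more than a given percentage. The node names and figures are made-up values, not taken from this cluster, and `unbalanced_nodes` is a hypothetical helper, not part of Hadoop.

```python
def unbalanced_nodes(nodes, threshold_pct=10.0):
    """Return the names of nodes whose utilization is outside the threshold.

    nodes maps a node name to (dfs_used, capacity), both in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    flagged = []
    for name, (used, cap) in sorted(nodes.items()):
        node_pct = 100.0 * used / cap
        # A node counts as balanced when it is within threshold_pct of the cluster average.
        if abs(node_pct - cluster_pct) > threshold_pct:
            flagged.append(name)
    return flagged


# Hypothetical figures in GB, for illustration only.
example = {
    "slave1": (200, 400),
    "slave2": (200, 400),
    "slave3": (60, 400),
    "slave4": (240, 400),
}
print(unbalanced_nodes(example))  # ['slave3', 'slave4']
```

The balancer itself is started with `hdfs balancer -threshold 10`, where 10 is the default percentage. It only moves existing blocks between DataNodes; it does not change the capacity a node reports.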


28 REPLIES

Master Mentor

Is your replication factor set to 3? Are you using a single reducer in your ingestion? You can use the HDFS balancer to spread the data around your cluster: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Administrat...

Master Mentor

I think you're interpreting it wrong; it's the opposite. Only slave 4 is not taking up data, and the other nodes are filled.

Rising Star

Yes, the replication factor is 3. But how does spreading the data around the cluster help us change the capacity of the nodes?

Master Mentor

That's the remaining capacity, not the total.
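
To make the distinction concrete, here is a minimal sketch of how the per-node figures on the NameNode page relate to each other. It assumes the simple relation used by older Hadoop releases, where non-DFS used is whatever is left after subtracting DFS used and DFS remaining from the configured capacity; newer releases also account for reserved space. The numbers are hypothetical and are not read from the screenshot.

```python
def non_dfs_used(configured_capacity, dfs_used, dfs_remaining):
    """Space on the DataNode volumes occupied by files that are not HDFS blocks.

    Assumes: configured capacity = DFS used + non-DFS used + DFS remaining.
    """
    return configured_capacity - dfs_used - dfs_remaining


# Hypothetical per-node figures in GB.
capacity, used, remaining = 450, 20, 400
print(non_dfs_used(capacity, used, remaining))  # 30
```

Under this assumption, a large "remaining" value on one node means that node holds little data, not that its disk is larger.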

Rising Star

Yeah, probably. I am new to this and do not yet understand the whole configuration. If I am not wrong, capacity is the available space and non-DFS is the space available for Linux system use. So I still don't understand the answer to my question: why is the capacity (the available space) larger for slave 4 alone when all the nodes, including the master, have the same hard disk capacity?

Master Mentor

Go to the node and investigate the data directories you specified. Run the hdfs fsck / command to see whether there is an issue with HDFS, and post a screenshot of the main Ambari page with all widgets.

Rising Star

The cluster is new. It hardly contains any data.

Master Mentor

OK, you need to confirm which directories you specified for the DataNode in Ambari > HDFS > Configs.

Rising Star

/opt/hadoop/hdfs/data,/tmp/hadoop/hdfs/data,/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data
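
A DataNode's capacity comes from the volumes that hold these directories, so it can help to check on each node which filesystem every directory actually lives on. The sketch below is one way to do that with the Python standard library; the helper names are hypothetical, and it only reports what the local operating system sees. If a directory does not exist on a node, it falls back to the nearest existing parent.

```python
import os
import shutil
from collections import defaultdict


def group_by_filesystem(paths):
    """Group directories by the device (filesystem) they live on."""
    groups = defaultdict(list)
    for path in paths:
        # Walk up to the nearest existing ancestor so a missing directory
        # still maps to the filesystem it would be created on.
        probe = os.path.abspath(path)
        while not os.path.exists(probe):
            probe = os.path.dirname(probe)
        groups[os.stat(probe).st_dev].append((path, probe))
    return groups


def report(paths):
    """Return one summary line per filesystem with its total and free space."""
    lines = []
    for device, members in group_by_filesystem(paths).items():
        usage = shutil.disk_usage(members[0][1])
        dirs = ", ".join(path for path, _ in members)
        lines.append(
            f"device {device}: total={usage.total / 1e9:.1f} GB "
            f"free={usage.free / 1e9:.1f} GB dirs={dirs}"
        )
    return lines


# The comma-separated value posted above.
data_dirs = (
    "/opt/hadoop/hdfs/data,/tmp/hadoop/hdfs/data,"
    "/usr/hadoop/hdfs/data,/usr/local/hadoop/hdfs/data"
)
for line in report(data_dirs.split(",")):
    print(line)
```

Running this on slave 4 and on one of the other slaves and comparing the output would show whether the directories sit on differently sized partitions. That is one possible cause of the uneven capacity, not something confirmed in this thread.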