we have 5 node cluster with following configurations for master and slaves.
HDPMaster 35 GB 500 GB HDPSlave1 15 GB 500 GB HDPSlave2 15 GB 500 GB HDPSlave3 15 GB 500 GB HDPSlave4 15 GB 500 GB HDPSlave5 15 GB 500 GB
But the cluster is not taking much space. I am aware of the fact that it will reserve some space for non-dfs use.But,it is taking uneven capacity for each slave node. Is there a way to reconfigure hdfs ?
Even though all the nodes have same hard disk, only slave 4 is taking 431GB, remaining all nodes are utilizing very small space. Is there a way to resolve this ?
I have never seen the same number for all the slave nodes because of the data distribution.
To overcome uneven block distribution scenario across the cluster, a utility program called balancer
Is your replication factor set to 3? Are you using one reducer in your ingestion? You can use hdfs balancer to spread the data around your cluster https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#Administrat...
Yeah probably. I am new to this and I am not able to understand this whole configuration thing. Capacity is available space and non dfs is the space available for linux system use, if i am not wrong. so I still didn't understand the answer to my question. Why is the capacity(the available space ) is more for slave 4 alone when all the nodes including master have the same harddisk capacity.
Go to the node and investigate the data dir directory you specified. Run hdfs fsck / command see if you have issue with hdfs, post screenshot of main page of Ambari with all widgets.