Configure Storage capacity of Hadoop cluster

Rising Star

We have a cluster with one master and five slave nodes, configured as follows:

Node        RAM     Disk
HDPMaster   35 GB   500 GB
HDPSlave1   15 GB   500 GB
HDPSlave2   15 GB   500 GB
HDPSlave3   15 GB   500 GB
HDPSlave4   15 GB   500 GB
HDPSlave5   15 GB   500 GB

But the cluster is not using anywhere near that much space. I am aware that it reserves some space for non-DFS use, but it is reporting an uneven capacity for each slave node. Is there a way to reconfigure HDFS?

PFA.

2586-namenode.png

Even though all the nodes have the same hard disk, only slave 4 is taking 431 GB; all the remaining nodes are utilizing very little space. Is there a way to resolve this?
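For reference, the same per-node figures shown in the NameNode UI can also be pulled on the command line (a quick check, assuming the HDFS client is on the path):

    # Print configured capacity, DFS used and non-DFS used for every datanode
    hdfs dfsadmin -report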

1 ACCEPTED SOLUTION

Master Mentor
@vinay kumar

I have never seen exactly the same number on all the slave nodes, because of how the data is distributed.


To overcome an uneven block distribution across the cluster, HDFS provides a utility program called the balancer:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
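A minimal invocation looks like this (a sketch; the 10% threshold is only an example and should be tuned for the cluster):

    # Move blocks around until every datanode is within 10% of the
    # average cluster utilization
    hdfs balancer -threshold 10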


28 REPLIES

Master Mentor

@vinay kumar What's the output of df -h on slave 4?

You can add /hadoop, restart HDFS, and then remove the other mounts from the settings.
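Roughly, the check and the change look like this (a sketch; the property is normally edited through Ambari under HDFS > Configs rather than by hand):

    # On slave 4: see which mount actually holds the large disk
    df -h

    # See which directories the datanodes currently use for block storage
    hdfs getconf -confKey dfs.datanode.data.dir

Appending /hadoop to dfs.datanode.data.dir and restarting HDFS lets the datanodes start writing blocks there; the old entries can be dropped later.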

Rising Star

@Neeraj Sabharwal

I have attached an image of the df -h output for slave 4 in my previous comment. So if I remove the other directories, wouldn't it affect the existing cluster in any way?

I am getting the error below after removing the other directories and replacing them with /hadoop. And yes, the cluster size has increased.

2655-errorhdfs.png

Master Mentor

@vinay kumar My plan was to add /hadoop first and then remove the other directories only after some time.

Rising Star
@Neeraj Sabharwal

I think I now have a clear picture of it. Since the / mount is partitioned with 400 GB, we should use it alone to make use of that space. But the current configuration is the default one given by Ambari. Wouldn't changing it affect the cluster in any way? Should I take care of anything?

Master Mentor

@vinay kumar We allocate dedicated disks for HDFS data, which means modifying the datanode directory setting (dfs.datanode.data.dir) during the install.
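For example, with dedicated data disks the datanode directory setting lists one path per disk. A sketch with hypothetical mount points /grid/0 and /grid/1:

    # Hypothetical layout with two dedicated data disks per datanode:
    #   dfs.datanode.data.dir = /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data
    # Confirm the mounts exist and have the expected capacity
    df -h /grid/0 /grid/1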

Rising Star
@Neeraj Sabharwal

Adding /hadoop and deleting the other directories after some time is resulting in missing blocks. Is there any way to overcome this? When I run the hdfs fsck command, it shows that all blocks are missing. The reason could be the removal of the directories. Do we need to copy the data from the old directories into the new directory (/hadoop)? Will that help?
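(For context: the extent of the damage can be checked with fsck, and if the removed directories were not wiped from disk, the block data can in principle be merged under the new storage directory before the DataNodes are restarted. A rough sketch only; the old path below is a placeholder, the DataNode must be stopped first, and everything should be backed up beforehand:

    # List the files whose blocks are currently reported missing
    hdfs fsck / -list-corruptfileblocks

    # On each datanode, with the DataNode stopped, merge the old block data
    # into the new storage directory (/old/data/dir is a placeholder for a
    # directory that was removed from dfs.datanode.data.dir)
    cp -a /old/data/dir/current/. /hadoop/current/
)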

@vinay kumar

Maybe you have a problem with your disk partitioning. Can you check how much space you have allocated to the partitions used by HDP?

Here's a link with partitioning recommendations: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_cluster-planning-guide/content/ch_partiti...

Rising Star

I have allocated around 400 GB to the / partition. PFA.

2650-df.png

Rising Star

Hi @vinay kumar, I think your partitioning is the issue: you are not using "/" for the HDFS data directory. If you want to use the full disk capacity, you can create a folder under "/", for example /data/1, on every data node with "mkdir -p /data/1", add it to dfs.datanode.data.dir, and restart the HDFS service (see the sketch below).

You should get the desired output.
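Spelled out, that suggestion looks roughly like this (a sketch; /data/1 is the example directory from the reply, and the ownership shown is what a typical HDP install uses):

    # On every datanode: create a directory on the large / partition
    mkdir -p /data/1
    chown -R hdfs:hadoop /data/1

    # Then append /data/1 to dfs.datanode.data.dir (HDFS > Configs in Ambari)
    # and restart the HDFS service so the datanodes pick it up.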