Created 10-23-2019 07:51 PM
I have a 4-node cluster running (all of the nodes are DataNodes due to certain circumstances). After ingesting a good amount of data from structured and unstructured data sources, I checked the disk usage of each host, and only one DataNode seems to have a balanced load on its mounted disks.
Host 1
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 130G 44G 87G 34% /
devtmpfs 95G 0 95G 0% /dev
tmpfs 95G 0 95G 0% /dev/shm
tmpfs 95G 67M 95G 1% /run
tmpfs 95G 0 95G 0% /sys/fs/cgroup
/dev/sda1 1014M 173M 842M 18% /boot
/dev/mapper/rhel-home 2.0G 37M 2.0G 2% /home
tmpfs 19G 12K 19G 1% /run/user/42
cm_processes 95G 182M 95G 1% /run/cloudera-scm-agent/process
/dev/sdb1 135G 65G 64G 51% /cmdisk/sdb
/dev/sdd1 135G 71G 58G 56% /cmdisk/sdd
/dev/sdc1 135G 69G 60G 54% /cmdisk/sdc
/dev/sde1 135G 70G 59G 55% /cmdisk/sde
/dev/sdf1 135G 72G 56G 57% /cmdisk/sdf
/dev/sdg1 135G 69G 60G 54% /cmdisk/sdg
/dev/sdh1 135G 73G 55G 57% /cmdisk/sdh
tmpfs 19G 0 19G 0% /run/user/0
The other hosts seem to have more load on the first disk (sdb1).
Host 2
/dev/sdb1 135G 107G 22G 84% /cmdisk/sdb
/dev/sdc1 135G 76G 52G 60% /cmdisk/sdc
/dev/sdd1 135G 75G 54G 59% /cmdisk/sdd
/dev/sde1 135G 79G 50G 62% /cmdisk/sde
/dev/sdf1 135G 75G 53G 59% /cmdisk/sdf
/dev/sdg1 135G 78G 51G 61% /cmdisk/sdg
/dev/sdh1 135G 77G 52G 60% /cmdisk/sdh
tmpfs 19G 0 19G 0% /run/user/0
Host 3
/dev/sdb1 135G 118G 10G 93% /cmdisk/sdb
/dev/sdc1 135G 69G 59G 55% /cmdisk/sdc
/dev/sdd1 135G 69G 60G 54% /cmdisk/sdd
/dev/sde1 135G 71G 58G 56% /cmdisk/sde
/dev/sdf1 135G 71G 58G 56% /cmdisk/sdf
/dev/sdg1 135G 70G 58G 55% /cmdisk/sdg
/dev/sdh1 135G 68G 61G 53% /cmdisk/sdh
tmpfs 19G 0 19G 0% /run/user/0
Host 4
/dev/sdb1 135G 118G 11G 93% /cmdisk/sdb
/dev/sdc1 135G 68G 61G 53% /cmdisk/sdc
/dev/sdd1 135G 69G 59G 55% /cmdisk/sdd
/dev/sde1 135G 69G 60G 54% /cmdisk/sde
/dev/sdf1 135G 71G 58G 55% /cmdisk/sdf
/dev/sdg1 135G 71G 58G 55% /cmdisk/sdg
/dev/sdh1 135G 70G 59G 55% /cmdisk/sdh
tmpfs 19G 0 19G 0% /run/user/0
Is there a way to balance the disks used by HDFS?
Created 10-23-2019 09:26 PM
Hi,
Please refer to the following link:
https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/
Let me know if it helps or if you need more information.
Regards,
Ganesh
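For reference, the workflow in that post comes down to three diskbalancer subcommands per DataNode: plan, execute, and query. The sketch below only prints the commands rather than running them. The function name diskbalancer_steps, the hostname, and the plan path are placeholders for illustration, and it assumes dfs.disk.balancer.enabled is set to true in hdfs-site.xml on the DataNodes.

```shell
# Hypothetical dry-run helper: prints the three diskbalancer steps for one DataNode.
# Nothing is executed against the cluster.
diskbalancer_steps() {
  host="$1"   # DataNode hostname (placeholder value used below)
  plan="$2"   # plan file path; the -plan step reports where it wrote the file
  echo "hdfs diskbalancer -plan $host"     # 1. generate a plan for this DataNode
  echo "hdfs diskbalancer -execute $plan"  # 2. submit the plan to the DataNode
  echo "hdfs diskbalancer -query $host"    # 3. check the status of the running plan
}

diskbalancer_steps datanode1.example.com /path/to/datanode1.plan.json
```

The steps are repeated for each DataNode that needs balancing, since a plan only covers the disks of a single node.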
Created 10-23-2019 10:29 PM
Hi Ganesh,
Thanks for replying, I have a follow-up question.
Will running this command while the cluster is active (e.g. Spark is running and some Hive queries are being run) affect them in any way? Or will the plan run in the background so that no processes are affected?
Thanks,
Jan
Created 10-23-2019 11:20 PM
Hi Jan,
It might impact queries / jobs if the query operates on the data being moved between the disks. You may see timeouts.
Regards,
Ganesh
Created 10-24-2019 12:09 AM
Thanks Ganesh,
I will let you know whether this resolves my issue once I run it later, when no one is using the cluster.
Regards,
Jan
Created 10-24-2019 06:45 PM
Hi Ganesh,
I've run the HDFS Diskbalancer and the results did not turn out as I expected. There is still a disparity between the disks on some of the DataNodes. It did lessen some of the load compared to the previous check, but I am not sure whether this is the best the Diskbalancer could plan or whether the plan did not run properly.
After Diskbalancer
Host 2:
/dev/sdb1 135G 94G 35G 74% /cmdisk/sdb
/dev/sdc1 135G 65G 64G 51% /cmdisk/sdc
/dev/sdd1 135G 65G 64G 51% /cmdisk/sdd
/dev/sde1 135G 69G 60G 54% /cmdisk/sde
/dev/sdf1 135G 69G 59G 54% /cmdisk/sdf
/dev/sdg1 135G 67G 62G 52% /cmdisk/sdg
/dev/sdh1 135G 65G 64G 51% /cmdisk/sdh
Host 3:
/dev/sdb1 135G 111G 17G 87% /cmdisk/sdb
/dev/sdc1 135G 59G 70G 46% /cmdisk/sdc
/dev/sdd1 135G 58G 71G 45% /cmdisk/sdd
/dev/sde1 135G 59G 69G 47% /cmdisk/sde
/dev/sdf1 135G 59G 70G 46% /cmdisk/sdf
/dev/sdg1 135G 65G 64G 51% /cmdisk/sdg
/dev/sdh1 135G 59G 70G 46% /cmdisk/sdh
Host 4:
/dev/sdb1 135G 110G 18G 86% /cmdisk/sdb
/dev/sdc1 135G 58G 71G 45% /cmdisk/sdc
/dev/sdd1 135G 59G 70G 46% /cmdisk/sdd
/dev/sde1 135G 64G 64G 50% /cmdisk/sde
/dev/sdf1 135G 60G 69G 47% /cmdisk/sdf
/dev/sdg1 135G 59G 70G 46% /cmdisk/sdg
/dev/sdh1 135G 59G 70G 46% /cmdisk/sdh
Thanks again for helping, it is much appreciated.
Regards,
Jan
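One rough way to judge the result is to compare each disk's used fraction with the used fraction of the node as a whole. The sketch below does that for the Host 3 lines above. It is an illustration only: volume_deviation is a made-up name, it assumes every size is reported in G as in this output, and it is a simplified stand-in for the disk balancer's own density calculation rather than a copy of it. The disk balancer's planning threshold is documented as 10 percent by default, so a disk that is still much more than 10 points away from the node average suggests the plan did not complete or did not cover that disk.

```shell
# Hypothetical helper: reads df lines (Filesystem Size Used Avail Use% Mounted-on)
# and prints, per mount point, its used/size minus the node-wide used/size,
# in percentage points. Assumes all sizes are given in G.
volume_deviation() {
  awk '
    {
      gsub(/G/, "", $2); gsub(/G/, "", $3)   # strip the unit suffix
      size[NR] = $2; used[NR] = $3; mount[NR] = $6
      total_size += $2; total_used += $3
    }
    END {
      if (total_size == 0) exit
      ideal = total_used / total_size        # node-wide used fraction
      for (i = 1; i <= NR; i++)
        printf "%s %.1f\n", mount[i], (used[i] / size[i] - ideal) * 100
    }'
}

# Host 3 after the disk balancer run, copied from the df output above.
volume_deviation <<'EOF'
/dev/sdb1 135G 111G 17G 87% /cmdisk/sdb
/dev/sdc1 135G 59G 70G 46% /cmdisk/sdc
/dev/sdd1 135G 58G 71G 45% /cmdisk/sdd
/dev/sde1 135G 59G 69G 47% /cmdisk/sde
/dev/sdf1 135G 59G 70G 46% /cmdisk/sdf
/dev/sdg1 135G 65G 64G 51% /cmdisk/sdg
/dev/sdh1 135G 59G 70G 46% /cmdisk/sdh
EOF
```

For Host 3 this puts /cmdisk/sdb about 32 points above the node average and the other disks roughly 2 to 7 points below it, which is well outside a 10 percent threshold.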
Created 10-24-2019 07:10 PM
Hi Jan,
I assumed the disk /cmdisk/sdb is one of the DataNode directories. As pointed out by @npandey, it may also hold a JournalNode edits directory or a NameNode data directory. Could you share the DataNode Data Directory (dfs.data.dir, dfs.datanode.data.dir) list of your cluster?
You can find it under the following path:
HDFS -> Configuration -> DataNode Data Directory.
Regards,
Ganesh
Created 10-24-2019 07:41 PM
Hi Ganesh,
Here's a screen cap of the configuration.
Thanks,
Jan
Created 10-24-2019 04:36 AM
@TheBroMeister I think you should also check the contents of "/cmdisk/sdb" on all the problematic hosts by running du -sh * and see what is using the space. We have seen cases where disks filled up with non-DFS data, such as the YARN local dir, the YARN log dir, or anything else.
I would also recommend running "hdfs dfsadmin -report" and checking the DFS Used% and Non DFS Used values on all the DataNodes.
If DFS Used% is the same on all of them, the HDFS data is already balanced and we need to look into the point mentioned above. Thank you!
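To make that comparison easier, the per-DataNode values can be pulled out of the report with a small filter. This is a sketch under assumptions: summarize_report is a made-up name, and the field labels (Hostname, Non DFS Used, DFS Used%) are taken from the report layout as I know it, so they may need adjusting for your Hadoop version.

```shell
# Hypothetical filter: reads `hdfs dfsadmin -report` output on stdin and prints
# one line per DataNode: hostname, DFS Used%, and the Non DFS Used value.
# The cluster-wide summary at the top has no Hostname line, so it is skipped.
summarize_report() {
  awk -F': ' '
    /^Hostname:/     { host = $2 }
    /^Non DFS Used:/ { nondfs = $2 }
    /^DFS Used%:/    { if (host != "") print host, $2, "nonDFS=" nondfs }
  '
}

# Usage (requires access to the cluster):
#   sudo -u hdfs hdfs dfsadmin -report | summarize_report
```

Note that this compares usage between DataNodes; it does not show how the data is spread across the disks within one node.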
Created 10-24-2019 06:40 PM
Hi @npandey
I've checked "hdfs dfsadmin -report" and the Non-DFS Used value is 0 on all of my DataNodes. The DFS Used% values are not all the same: Host 1 has 47%, Host 2 has 56%, Host 3 has 54%, and Host 4 has 53%.
After running the disk balancer, I checked the disk usage on the mount points of all my hosts, and they still do not seem as balanced as I would hope.
$ du -h results:
Host 1
55G ./sdb
60G ./sdc
61G ./sdd
60G ./sde
63G ./sdf
59G ./sdg
64G ./sdh
418G .
Host 2
94G ./sdb
65G ./sdc
65G ./sdd
68G ./sde
69G ./sdf
67G ./sdg
65G ./sdh
489G .
Host 3
111G ./sdb
59G ./sdc
58G ./sdd
59G ./sde
59G ./sdf
65G ./sdg
59G ./sdh
467G .
Host 4
110G ./sdb
58G ./sdc
59G ./sdd
64G ./sde
60G ./sdf
59G ./sdg
59G ./sdh
465G .
I am unsure if this is an acceptable level for the balance and how else to proceed.
Regards,
Jan
Created 11-03-2019 11:45 PM
Hi @TheBroMeister, do you have a NameNode, a JournalNode, or any other components on these DataNode hosts?
Can you please provide the output of the following commands:
ls -lrt /cmdisk/sdb
du -sh /cmdisk/sdb/*
Created 11-13-2019 10:19 PM
Hi @npandey, I have no access to the cluster at the moment, but I will get back to you once I do.
Created 11-13-2019 11:02 PM
Which command did you use to balance HDFS previously?
Can you try running the HDFS balancer as below:
$ sudo -u hdfs hdfs balancer -threshold 1
While it runs, you can check the HDFS logs, which show the details of the data being moved, including the percentage.
Let me know if it doesn't work.