HDFS disks storage is not balanced

I have a 4-node cluster running (all of the nodes are DataNodes due to certain circumstances). After ingesting a good amount of data from structured and unstructured sources, I checked the disk usage of each host, and only one DataNode seems to have a balanced load across its mounted disks.

 

Host 1

Filesystem             Size  Used Avail Use% Mounted on

/dev/sda3              130G   44G   87G  34% /

devtmpfs                95G     0   95G   0% /dev

tmpfs                   95G     0   95G   0% /dev/shm

tmpfs                   95G   67M   95G   1% /run

tmpfs                   95G     0   95G   0% /sys/fs/cgroup

/dev/sda1             1014M  173M  842M  18% /boot

/dev/mapper/rhel-home  2.0G   37M  2.0G   2% /home

tmpfs                   19G   12K   19G   1% /run/user/42

cm_processes            95G  182M   95G   1% /run/cloudera-scm-agent/process

/dev/sdb1              135G   65G   64G  51% /cmdisk/sdb

/dev/sdd1              135G   71G   58G  56% /cmdisk/sdd

/dev/sdc1              135G   69G   60G  54% /cmdisk/sdc

/dev/sde1              135G   70G   59G  55% /cmdisk/sde

/dev/sdf1              135G   72G   56G  57% /cmdisk/sdf

/dev/sdg1              135G   69G   60G  54% /cmdisk/sdg

/dev/sdh1              135G   73G   55G  57% /cmdisk/sdh

tmpfs                   19G     0   19G   0% /run/user/0

 

The other hosts have noticeably more load on the first disk (sdb1):

Host 2

/dev/sdb1              135G  107G   22G  84% /cmdisk/sdb

/dev/sdc1              135G   76G   52G  60% /cmdisk/sdc

/dev/sdd1              135G   75G   54G  59% /cmdisk/sdd

/dev/sde1              135G   79G   50G  62% /cmdisk/sde

/dev/sdf1              135G   75G   53G  59% /cmdisk/sdf

/dev/sdg1              135G   78G   51G  61% /cmdisk/sdg

/dev/sdh1              135G   77G   52G  60% /cmdisk/sdh

tmpfs                   19G     0   19G   0% /run/user/0

Host 3

/dev/sdb1              135G  118G   10G  93% /cmdisk/sdb

/dev/sdc1              135G   69G   59G  55% /cmdisk/sdc

/dev/sdd1              135G   69G   60G  54% /cmdisk/sdd

/dev/sde1              135G   71G   58G  56% /cmdisk/sde

/dev/sdf1              135G   71G   58G  56% /cmdisk/sdf

/dev/sdg1              135G   70G   58G  55% /cmdisk/sdg

/dev/sdh1              135G   68G   61G  53% /cmdisk/sdh

tmpfs                   19G     0   19G   0% /run/user/0

Host 4

/dev/sdb1              135G  118G   11G  93% /cmdisk/sdb

/dev/sdc1              135G   68G   61G  53% /cmdisk/sdc

/dev/sdd1              135G   69G   59G  55% /cmdisk/sdd

/dev/sde1              135G   69G   60G  54% /cmdisk/sde

/dev/sdf1              135G   71G   58G  55% /cmdisk/sdf

/dev/sdg1              135G   71G   58G  55% /cmdisk/sdg

/dev/sdh1              135G   70G   59G  55% /cmdisk/sdh

tmpfs                   19G     0   19G   0% /run/user/0

Is there a way to balance the disks used by HDFS? 


Cloudera Employee

Hi,

Please refer to the following link:

 

                https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/
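For reference, the workflow described in that post comes down to three commands: generate a plan for one DataNode, execute the plan, then query its progress. The sketch below wraps them in shell functions. The function names, the `run` helper, and the `DRY_RUN` switch are illustrative and not part of Hadoop. It assumes the disk balancer is enabled (`dfs.disk.balancer.enabled` set to `true`) and that you can run commands as the `hdfs` user.

```shell
# Sketch of the intra-DataNode disk balancer workflow: plan, execute, query.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

diskbalancer_plan() {
  # $1 = DataNode hostname. Generates a <host>.plan.json file and prints
  # where it was written (by default under /system/diskbalancer in HDFS).
  run sudo -u hdfs hdfs diskbalancer -plan "$1"
}

diskbalancer_execute() {
  # $1 = path of the plan file printed by the -plan step.
  run sudo -u hdfs hdfs diskbalancer -execute "$1"
}

diskbalancer_query() {
  # $1 = DataNode hostname. Reports the status of the submitted plan.
  run sudo -u hdfs hdfs diskbalancer -query "$1"
}
```

Each plan covers a single DataNode, so the three steps would be repeated for every host whose disks are uneven.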

 

        

Let me know if it helps or if you need more information.

 

Regards,

Ganesh

Hi Ganesh,

 

Thanks for replying, I have a follow-up question.

 

Will running this command while the cluster is active (e.g. Spark is running and some Hive queries are being run) affect them in any way? Or will the plan run in the background without affecting other processes?

 

Thanks,
Jan

Cloudera Employee

Hi Jan,

It might impact queries or jobs if they operate on data that is being moved between the disks. You may see timeouts.
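One possible way to limit that impact, as a hedged sketch: the `-plan` step accepts a `-bandwidth` option (in MB/s) that caps how fast data is copied between disks, and a plan that is already executing can be stopped with `-cancel`. The helper names and the `DRY_RUN` switch below are illustrative, and the 5 MB/s figure is an arbitrary example rather than a recommendation.

```shell
# Sketch: throttle a disk balancer plan, and cancel one that is running.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

plan_throttled() {
  # $1 = DataNode hostname, $2 = bandwidth cap in MB/s (example default: 5).
  run sudo -u hdfs hdfs diskbalancer -plan "$1" -bandwidth "${2:-5}"
}

cancel_plan() {
  # $1 = path of the plan file that is currently being executed.
  run sudo -u hdfs hdfs diskbalancer -cancel "$1"
}
```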

Regards,

Ganesh

Thanks Ganesh,

 

I will run it later, when no one is using the cluster, and let you know whether it resolves my issue.

 

Regards,

Jan

Hi Ganesh,

 

I've run the HDFS disk balancer and the results did not turn out as I expected. There is still a disparity between the disks on some of the DataNodes. It did lessen some of the load compared to the previous check, but I am not sure whether this is the best the disk balancer could plan or whether the plan did not run properly.

 

After Diskbalancer

Host 2:

/dev/sdb1              135G   94G   35G  74% /cmdisk/sdb

/dev/sdc1              135G   65G   64G  51% /cmdisk/sdc

/dev/sdd1              135G   65G   64G  51% /cmdisk/sdd

/dev/sde1              135G   69G   60G  54% /cmdisk/sde

/dev/sdf1              135G   69G   59G  54% /cmdisk/sdf

/dev/sdg1              135G   67G   62G  52% /cmdisk/sdg

/dev/sdh1              135G   65G   64G  51% /cmdisk/sdh

Host 3:

/dev/sdb1              135G  111G   17G  87% /cmdisk/sdb

/dev/sdc1              135G   59G   70G  46% /cmdisk/sdc

/dev/sdd1              135G   58G   71G  45% /cmdisk/sdd

/dev/sde1              135G   59G   69G  47% /cmdisk/sde

/dev/sdf1              135G   59G   70G  46% /cmdisk/sdf

/dev/sdg1              135G   65G   64G  51% /cmdisk/sdg

/dev/sdh1              135G   59G   70G  46% /cmdisk/sdh


Host 4:

/dev/sdb1              135G  110G   18G  86% /cmdisk/sdb

/dev/sdc1              135G   58G   71G  45% /cmdisk/sdc

/dev/sdd1              135G   59G   70G  46% /cmdisk/sdd

/dev/sde1              135G   64G   64G  50% /cmdisk/sde

/dev/sdf1              135G   60G   69G  47% /cmdisk/sdf

/dev/sdg1              135G   59G   70G  46% /cmdisk/sdg

/dev/sdh1              135G   59G   70G  46% /cmdisk/sdh
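To put a number on the remaining disparity, a small awk filter over `df` output can report the lowest and highest Use% among the data disks. This is only a sketch: `disk_spread` is a hypothetical helper, and it assumes the data disks are mounted under `/cmdisk/` as in the listings above.

```shell
# disk_spread: read `df` output on stdin and print the number of matching
# mounts plus the minimum, maximum and spread of their Use% values.
# $1 = mount point prefix to match (default: /cmdisk/).
disk_spread() {
  prefix="${1:-/cmdisk/}"
  awk -v p="$prefix" '
    index($NF, p) == 1 {            # last column is the mount point
      u = $(NF - 1)                 # second to last column is Use%
      sub(/%/, "", u); u += 0
      if (n == 0 || u < min) min = u
      if (n == 0 || u > max) max = u
      n++
    }
    END {
      if (n) printf "disks=%d min=%d%% max=%d%% spread=%d\n", n, min, max, max - min
    }
  '
}

# Usage on a DataNode host:
#   df -h | disk_spread /cmdisk/
```

Fed the Host 3 figures above, this reports a minimum of 45%, a maximum of 87%, and a spread of 42 percentage points, almost all of it due to sdb.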

Thanks again for helping, it is much appreciated.

 

Regards,

Jan

 

Cloudera Employee

Hi Jan,

I assumed that /cmdisk/sdb is one of the DataNode data directories. As pointed out by @npandey, it may also hold a JournalNode edits directory or a NameNode data directory. Could you share the DataNode Data Directory list (dfs.data.dir, dfs.datanode.data.dir) of your cluster?

 

You can find this information under the following path:

            HDFS -> Configuration  -> DataNode Data Directory.        

 

Regards,

Ganesh

Hi Ganesh,

 

Here's a screen cap of the configuration.

Data node directory.png

 

Thanks,

Jan

Cloudera Employee

@TheBroMeister I think you should also check the contents of /cmdisk/sdb on all the problematic hosts by running du -sh * inside it to see what is using the space. We have seen cases where disks filled up with non-DFS data, such as the YARN local directory or the YARN log directory.

I would also recommend running "hdfs dfsadmin -report" and checking DFS Used% and Non DFS Used on all the DataNodes.
If DFS Used% is the same across the nodes, the HDFS data is already balanced and we need to look at the point mentioned above. Thank you!
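The report is long, so it can help to reduce it to one line per DataNode. The filter below is a sketch under an assumption about the report layout: that each DataNode section contains `Hostname:`, `Non DFS Used:` and `DFS Used%:` lines in that order. `dfs_usage_by_node` is a hypothetical helper name; adjust the patterns if your version prints different labels.

```shell
# dfs_usage_by_node: read `hdfs dfsadmin -report` output on stdin and print
# one tab-separated line per DataNode: hostname, DFS Used%, Non DFS Used.
# The cluster-wide summary at the top has no Hostname line and is skipped.
dfs_usage_by_node() {
  awk -F': *' '
    $1 == "Hostname"     { host = $2 }
    $1 == "Non DFS Used" { nondfs = $2 }
    $1 == "DFS Used%" && host != "" {
      printf "%s\tDFS Used%%=%s\tNon DFS Used=%s\n", host, $2, nondfs
      host = ""
    }
  '
}

# Usage:
#   sudo -u hdfs hdfs dfsadmin -report | dfs_usage_by_node
```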

Hi @npandey 

 

I've checked "hdfs dfsadmin -report" and Non DFS Used is 0 on all of my DataNodes. The DFS Used% values are not all the same: Host 1 has 47%, Host 2 has 56%, Host 3 has 54%, and Host 4 has 53%.

 

After running the disk balancer, I checked the disk usage on the mount points of all my hosts, and they are still not as balanced as I would hope.

 

$ du -h results:

Host 1

55G ./sdb

60G ./sdc

61G ./sdd

60G ./sde

63G ./sdf

59G ./sdg

64G ./sdh

418G .

Host 2

94G ./sdb

65G ./sdc

65G ./sdd

68G ./sde

69G ./sdf

67G ./sdg

65G ./sdh

489G .


Host 3

111G ./sdb

59G ./sdc

58G ./sdd

59G ./sde

59G ./sdf

65G ./sdg

59G ./sdh

467G .

Host 4

110G ./sdb

58G ./sdc

59G ./sdd

64G ./sde

60G ./sdf

59G ./sdg

59G ./sdh

465G .

I am unsure whether this level of balance is acceptable, or how else to proceed.

 

Regards,

Jan 

Cloudera Employee

Hi @TheBroMeister, do you have a NameNode, JournalNode, or any other component on these DataNode hosts?
Can you please provide the output of the commands below?

ls -lrt /cmdisk/sdb
du -sh /cmdisk/sdb/*

 

Hi @npandey, I have no access to the cluster at the moment, but I will get back to you once I do.

Expert Contributor

Hi @TheBroMeister 

 

Which command did you use to balance HDFS previously?

Can you try running the HDFS balancer as below?

 

$ sudo -u hdfs hdfs balancer -threshold 1

While it runs, you can check the HDFS logs, which show how much data is being moved.

Let me know if it doesn't work.
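The `-threshold` argument is a percentage: the balancer treats a DataNode as balanced when its utilization is within that many percentage points of the cluster's overall utilization. The sketch below applies that rule to the DFS Used% figures reported earlier in this thread (47%, 56%, 54%, 53%). It is a simplification that uses the plain mean of the per-node percentages, which assumes all nodes have equal capacity, and `outside_threshold` is a hypothetical helper, not an HDFS command.

```shell
# outside_threshold: $1 = threshold in percentage points, remaining
# arguments are name=percent pairs. Prints the mean utilization and whether
# each node lies within or outside the threshold band around that mean.
outside_threshold() {
  threshold="$1"; shift
  [ "$#" -gt 0 ] || return 0
  printf '%s\n' "$@" | awk -F'=' -v t="$threshold" '
    { name[NR] = $1; pct[NR] = $2; sum += $2 }
    END {
      avg = sum / NR
      printf "average=%.1f%%\n", avg
      for (i = 1; i <= NR; i++) {
        d = pct[i] - avg
        if (d < 0) d = -d
        printf "%s %s\n", name[i], (d > t ? "outside" : "within")
      }
    }'
}

outside_threshold 1 host1=47 host2=56 host3=54 host4=53
```

With a threshold of 1, hosts 1 to 3 fall outside the band around the 52.5% mean and host 4 is within it; with the balancer's default threshold of 10, all four would count as balanced. Note that this balancer moves blocks between DataNodes, whereas the uneven /cmdisk/sdb usage discussed above is between disks on the same node.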
