
HDFS data disk usage is exceeding the 90% threshold while the rest of the disks (on the same server) are at about 55%

Expert Contributor

Hi all,

Can you please guide me on how to troubleshoot why one of the disks in a data node is so much more heavily used than the others?

Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  1.8T  3.1G  1.7T   1% /
tmpfs                            127G  264K  127G   1% /dev/shm
/dev/sda1                        485M   38M  422M   9% /boot
/dev/sdb1                        1.8T  956G  785G  55% /data01
/dev/sdc1                        1.8T  964G  777G  56% /data02
/dev/sdd1                        1.8T  960G  781G  56% /data03
/dev/sde1                        1.8T  931G  810G  54% /data04
/dev/sdf1                        1.8T  962G  779G  56% /data05
/dev/sdg1                        1.8T  944G  796G  55% /data06
/dev/sdh1                        1.8T  945G  796G  55% /data07
/dev/sdi1                        1.8T  1.6T  192G  90% /data08
/dev/sdj1                        1.8T  934G  806G  54% /data09
/dev/sdk1                        1.8T  930G  811G  54% /data10
/dev/sdl1                        1.8T  940G  800G  55% /data11
/dev/mapper/VolGroup00-LogVol04  5.0G  696M  4.0G  15% /home
/dev/mapper/VolGroup00-LogVol03  9.9G  470M  8.9G   5% /opt
/dev/mapper/VolGroup00-LogVol05  5.0G  150M  4.6G   4% /tmp
/dev/mapper/VolGroup00-LogVol02   20G  1.9G   17G  11% /var
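For reference, one way to see what is actually occupying the nearly-full disk; the datanode data-directory path below is only an assumption, check dfs.datanode.data.dir in your config for the real layout:

  # Show what is consuming space on /data08, then drill into the
  # datanode's block storage directory (path is illustrative).
  du -sh /data08/*
  du -sh /data08/hadoop/hdfs/data/current/BP-*/current/finalized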

1 ACCEPTED SOLUTION

Expert Contributor

@PJ Depending on your situation, there are several possible solutions.

This was a fundamental issue in HDFS for a long time. Very recently we fixed it: there is a new tool called DiskBalancer, which ships with trunk. If you want to see the details of the design and the fix, please look at https://issues.apache.org/jira/browse/HDFS-1312 .

It essentially allows you to create a plan file that describes how data will be moved from disk to disk; you can then ask a datanode to execute that plan.
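A rough sketch of the workflow (the hostname and plan path are illustrative, and the datanode must have dfs.disk.balancer.enabled set to true):

  # 1. Compute a plan for one datanode; this writes a <hostname>.plan.json file.
  hdfs diskbalancer -plan dn1.example.com

  # 2. Ask the datanode to execute the generated plan.
  hdfs diskbalancer -execute /system/diskbalancer/<timestamp>/dn1.example.com.plan.json

  # 3. Check how far the data move has progressed.
  hdfs diskbalancer -query dn1.example.com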

Unfortunately, this tool has not yet shipped as part of HDP. We will be shipping it soon.

Presuming you are running an older version of HDFS and you have many datanodes in the cluster, you can decommission this full node and re-add it. However, the speed of HDFS replication is throttled, so if you want this to happen faster, you may have to set these parameters in your cluster (a sample hdfs-site.xml snippet follows the list):

  1. dfs.namenode.replication.work.multiplier.per.iteration = 10
  2. dfs.namenode.replication.max-streams = 50
  3. dfs.namenode.replication.max-streams-hard-limit = 100
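A minimal hdfs-site.xml sketch with those settings (the values above are a suggestion, not a universal default; tune them for your cluster):

  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.namenode.replication.max-streams</name>
    <value>50</value>
  </property>
  <property>
    <name>dfs.namenode.replication.max-streams-hard-limit</name>
    <value>100</value>
  </property>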

Last, the option I would least advise (I am writing it down only for the sake of completeness) is to follow what the Apache documentation suggests:

https://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_th...

Please note that this is a dangerous action, and unless you really know what you are doing it can lead to data loss. So please, please make sure that you can restore the machine to its earlier state if you really decide to go this route. A sketch of what that FAQ entry describes follows.
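A hedged sketch of the manual approach, assuming the default datanode directory layout (all paths and block names here are illustrative):

  # DANGEROUS: do this only with the DataNode stopped and a verified backup.
  SRC=/data08/hadoop/hdfs/data/current/BP-XXXX/current/finalized/subdir0/subdir12
  DST=/data04/hadoop/hdfs/data/current/BP-XXXX/current/finalized/subdir0/subdir12

  # Keep the same subdir path on the destination disk so the datanode
  # can still locate the block after it restarts.
  mkdir -p "$DST"

  # Always move a block replica together with its .meta file, never one
  # without the other.
  mv "$SRC/blk_1073741825" "$SRC/blk_1073741825_1001.meta" "$DST/"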


12 REPLIES

Expert Contributor

Hi aengineer!

Thanks so much for the information. Is the fix available in any HDP release? I want to use it on HDP 2.1 and 2.4.2.

Also, if I need to rebalance the disks, do I really need to decommission the node, or can I just stop the DataNode and start it again after a while?

Thanks in advance.

Expert Contributor

As far as I know, we have not backported this change to HDP 2.1 or 2.4.2. There is nothing technically preventing us from doing so; Disk balancer does not depend on any of the newer 3.0 features.

> If I need to rebalance the disks, do I really need to decommission the node, or can I just stop the DataNode and start it again after a while?

Stopping a datanode does not change its disk usage. If one disk is over-utilized, some writes will fail whenever the datanode picks that disk (it generally uses a round-robin allocation scheme across volumes). So you need to make sure data is distributed similarly across the disks, and that is what DiskBalancer does for you: it computes how much to move based on each disk's type (disk, SSD, etc.).

When you decommission a node, all the blocks on that machine are moved to other nodes. Then you can go onto the node, make sure all the data disks are empty, and add it back. Finally, you will need to run the balancer (the cluster balancer, not DiskBalancer) to move data back onto this node; a one-line sketch follows. Sorry it is so painful.
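A minimal sketch of that last step (the threshold value is just an example):

  # Rebalance data across datanodes; -threshold is the allowed deviation,
  # in percentage points, of each node's utilization from the cluster average.
  hdfs balancer -threshold 10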

Expert Contributor

Hi, thanks so much for the information. For now, I deleted the contents of the HDFS /tmp folder, which had been lying there for a long time. This freed up about 500 GB of space on HDFS in total, and that particular disk went down from 90% to 82%. How is that possible? Another disk that had the same issue also went down to 82%. My question is: did the disk usage go down just because I deleted the /tmp folder, or does the disk usage also fluctuate because of other running jobs?

Also, I thought MapReduce uses the local disk for storing intermediate data, so what is actually stored in the HDFS /tmp directory? I assumed that is where the intermediate data goes and that it was what was using up the HDFS space.
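For reference, per-directory HDFS usage can be checked with commands like these (/ and /tmp are just the obvious starting points):

  # Size of each top-level HDFS directory.
  hdfs dfs -du -h /

  # Break down /tmp itself.
  hdfs dfs -du -h /tmp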

Thanks again in advance.