
hdfs: max utilization on a single disk

Contributor

Hi! Our cluster is filling up; around 80% of the total space is used. This causes several disks to be overutilized - they have passed the critical threshold of 95% usage. We are running the rebalancer, but the results are small and rather slow to arrive. I wonder if there is any threshold above which no more data is written to a disk, or will the DataNode keep writing data until it takes up 100% of the available disk space? Thank you in advance!

1 ACCEPTED SOLUTION

Mentor
To be precise, the issues will appear on the DataNode due to parallel use
of the disks by NodeManager and other daemons sharing the host (and disk
mount paths).

The NameNode by itself keeps track of how much space each DataNode has and
avoids full DNs if they cannot accommodate an entire block, along with a
host of other checks (such as load average, recency of heartbeats, etc.):
https://github.com/cloudera/hadoop-common/blob/cdh5.14.0-release/hadoop-hdfs-project/hadoop-hdfs/src...
and
https://github.com/cloudera/hadoop-common/blob/cdh5.14.0-release/hadoop-hdfs-project/hadoop-hdfs/src...

Additionally, the DataNodes "hide" the configured reserved space, which
HDFS never considers available (so the NameNode hits its "deselect DN"
criteria well before the disk actually fills up).

However, keep in mind that you may be running YARN's NodeManagers on the
same set of disks (in different directories). These carry their own usage
and selection policies. A rogue app can fill up a disk temporarily very
quickly, compared to your regular HDFS write rates. This can cause the DNs
to suddenly find themselves lacking space in the middle of a write, even
though the disk may look fine again later. Ideally you need to ensure that
YARN, too, marks its NodeManagers as 'unhealthy' and stops assigning tasks
to them when such things happen - this can be done with the NodeManager
disk and health check script features (see the sketch below).
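For illustration, both knobs live in yarn-site.xml; the snippet below is only
a sketch (the threshold value is YARN's usual default, and the script path is
a hypothetical example):

  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>90.0</value>   <!-- NM marks a local/log dir as bad above this usage -->
  </property>
  <property>
    <name>yarn.nodemanager.health-checker.script.path</name>
    <value>/etc/hadoop/conf/nm-health-check.sh</value>   <!-- hypothetical script location -->
  </property>

If the health script prints a line starting with ERROR to its standard
output, the NodeManager reports itself as unhealthy and the ResourceManager
stops scheduling new containers on it until the script passes again.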

All this said, HDFS clients will optimistically retry their work on the
same DN or on another DN (or even on the remaining DNs in the immediate
replication pipeline) should they run into a space-related issue, and will
try not to let the writing application fail unless it's a very extreme
scenario.


8 REPLIES

Master Collaborator

Hi @lizard

By default a DataNode writes new block replicas to disk volumes solely on a round-robin basis. You can configure a volume-choosing policy that causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.
source: https://www.cloudera.com/documentation/enterprise/latest/topics/admin_dn_storage_balancing.html
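For example, switching to the available-space policy is a small hdfs-site.xml
change; the two tuning values below are the documented defaults, shown only as
a starting point:

  <property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
  </property>
  <property>
    <!-- volumes whose free space differs by less than this many bytes are treated as balanced -->
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
    <value>10737418240</value>
  </property>
  <property>
    <!-- fraction of new replicas steered to the volumes with more free space -->
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
    <value>0.75</value>
  </property>

The DataNodes need a restart to pick up the new policy, and it only affects
newly written blocks, not blocks already on disk.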

NB: Have you removed all the HDFS Trash files in paths like /user/impala/Trash/*, /user/hdfs/Trash/*, ...?
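If you are on the default trash layout (a .Trash directory under each user's
home), a quick way to see how much space trash is holding and to force an
early cleanup is roughly:

  hdfs dfs -du -s -h /user/*/.Trash   # per-user trash usage
  hdfs dfs -expunge                   # checkpoint and purge trash older than fs.trash.interval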

Good luck man.

Contributor

Hi Lizard,

The linked documentation has good information; however, if you need to rebalance the disks inside a DataNode right away, you can also run the disk balancer (note that this is different from the HDFS Balancer).

Disk balancer info is here:

https://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apach...
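The workflow it describes is roughly the following (the hostname and the plan
path are placeholders, and dfs.disk.balancer.enabled has to be set to true in
hdfs-site.xml on versions where it is off by default):

  hdfs diskbalancer -plan dn1.example.com
      # computes a move plan and writes it as a JSON file under /system/diskbalancer/
  hdfs diskbalancer -execute /system/diskbalancer/<date>/dn1.example.com.plan.json
  hdfs diskbalancer -query dn1.example.com
      # shows the status of the plan currently running on that DataNode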

 

Cheers,

Pifta

Explorer

Contributor

Hi!

 

Many thanks to all for your answers. We managed to delete some data that turned out not to be needed and mitigated the problem that way. Yes, we use Storage Balancing on the DNs and also have the disk balancer enabled (fortunately we didn't run into the problems described by @koc).

I was just wondering whether there is any rule that would 'turn off' writing on a DataNode once it exceeds a threshold, e.g. when there is less than 100 GB left across all of its disks together. I didn't find anything like that in the hdfs-default.xml file. So I wonder what would happen if we kept writing data until 100% of the HDFS space on some nodes is used - would they really reach 100%, or would other nodes start to be preferred over the filled-up ones? That would probably interfere with replica placement based on the cluster topology, so I can imagine there is no such mechanism. Let me know what you think.

Thanks!

Contributor

Hi @lizard,

 

If an HDFS DataNode reaches the maximum capacity on a disk, it will not use that disk, as the allocation of a new block checks the available space on the disk.

This check also takes the dfs.datanode.du.reserved setting into account, so if you reserve, for example, 10 GB of space and a disk has less than 10 GB + block size of free space, no block will be allocated on that disk.
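To make that concrete (the numbers are only an example): with the reservation below and a 128 MB block size, a volume reporting about 10.1 GB free is already skipped for new replicas, because after subtracting the 10 GB reservation less than one block's worth of space remains.

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>   <!-- 10 GB per volume kept out of HDFS's view -->
  </property>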

 

If a DataNode is completely full and there are no further disks where at least one block can be allocated, that can cause block allocation issues at the HDFS level. Also, if no disk space is available at all, internal DataNode operations can run into problems as well, which is why we suggest sizing a cluster so that about 25% of the space stays free, as a good minimum.

Cheers,

Pifta

Contributor

Hi @pifta,

 

Many thanks for your explanations. So the conclusion is: the disks would keep filling up even after they reach a critical level of, let's say, 90% utilization, and only when there is barely any space left would new blocks stop being assigned to them. However, nodes with less data would be preferred the whole time, for the sake of maintaining a balanced cluster. So theoretically it is possible to drive the cluster to the point of 100% utilization, where operations are no longer possible for the reasons you mentioned in your post. Good to know, thank you for sharing your knowledge.


Contributor
Hi @Harsh J, thank you for an even more thorough answer; the placement policy is clear now. I hadn't seen the risk of rogue YARN apps before - that's very helpful. Many thanks!