Member since 02-17-2016
20 Posts · 2 Kudos Received · 0 Solutions
07-22-2020
05:45 AM
Unfortunately, the given solution is not accurate. In HDFS, a block that is open for write does reserve the full 128MB, but as soon as the file is closed, the last block of the file is accounted for only by the actual length of the data in it. So a 1KB file consumes 3KB of disk space with replication factor 3, and a 129MB file consumes 387MB of disk space, again with replication factor 3 (one full 128MB block plus a 1MB last block, times three replicas). The behaviour seen in the output was most likely caused by other, non-DFS disk usage that reduced the space available to HDFS, and had nothing to do with the file sizes. To demonstrate this with a 1KB test file:

# hdfs dfs -df -h
Filesystem          Size     Used  Available  Use%
hdfs://<nn>:8020  27.1 T    120 K     27.1 T    0%
# fallocate -l 1024 test.txt
# hdfs dfs -put test.txt /tmp
# hdfs dfs -df -h
Filesystem          Size     Used  Available  Use%
hdfs://<nn>:8020  27.1 T  123.0 K     27.1 T    0%

The Used value grew by roughly 3KB (1KB times 3 replicas), not by a full block. I hope this helps to clarify and correct this answer.
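A complementary way to verify the per-file accounting (a quick sketch, assuming the test file from above is still at /tmp/test.txt and a Hadoop release recent enough for -du to print the disk-space-consumed column; the values shown are illustrative):

# hdfs dfs -du -h /tmp/test.txt
1.0 K  3.0 K  /tmp/test.txt

The first column is the file length, the second is the disk space consumed by all replicas, i.e. the length multiplied by the replication factor.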
03-12-2018
05:35 AM
1 Kudo
Hi @lizard, if an HDFS DataNode reaches maximum capacity on a disk, it will stop using that disk, because the allocation of a new block first checks the available space on the disk. This check also takes the dfs.datanode.du.reserved setting into account, so if you reserve for example 10GB, and a disk has less than 10GB + block size of free space, no block will be allocated on that disk. If a DataNode is completely full and there is no disk left where at least one block can be allocated, that causes block allocation issues at the HDFS level. A DataNode with no free disk space can also run into problems during its internal operations, which is why we suggest sizing the cluster so that roughly 25% of the space stays free as a good minimum. Cheers, Pifta
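For reference, a quick way to check the configured reserve and what HDFS sees as remaining space per DataNode (the commands are standard, but the hostname, port and values below are only illustrative):

# hdfs getconf -confKey dfs.datanode.du.reserved
10737418240
# hdfs dfsadmin -report | grep -E "^Name|DFS Remaining"
Name: 10.0.0.11:50010 (dn1.example.com)
DFS Remaining: 52428800 (50 MB)
DFS Remaining%: 0.18%

A disk is only eligible for a new block while capacity minus DFS used minus the reserved space still leaves room for at least one block.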
03-07-2018
07:55 AM
1 Kudo
Hi Koc, this is quite interesting, because based on the code the exception looks like the result of a race condition. The getDiskBalancerStatus call looks like this in the code:

@Override // DataNodeMXBean
public String getDiskBalancerStatus() {
  try {
    return this.diskBalancer.queryWorkStatus().toJsonString();
  } catch (IOException ex) {
    LOG.debug("Reading diskbalancer Status failed. ex:{}", ex);
    return "";
  }
}

So the NullPointerException can happen when diskBalancer is null, or when queryWorkStatus() returns null. queryWorkStatus() throws an IOException when the disk balancer is not enabled, which is why disabling the disk balancer fixes the issue; otherwise queryWorkStatus() seems to always return a reference. That is why I suspect a race condition that leaves the diskBalancer reference null in the DataNode object at the moment getDiskBalancerStatus is called. Since getDiskBalancerStatus is exposed through the JMX interface, it is only invoked when the DataNode's JMX interface is being queried, and it should not prevent the DataNode startup. So this looks like something that should not fail the DataNode startup. Do you still have the startup logs from when the DataNode failed to start? Is anything else reported as an error or fatal? If you have the DataNode standard error output for a failed start (on a CDH cluster it is in the /var/run/cloudera-scm-agent/process/xxx-DATANODE/logs folders), it might contain further traces about the problem. Would you please check it? It would be nice to track this down and, if it is a bug, fix it. Thanks! Istvan
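If you want to check what the JMX side returns while the DataNode is running, the MXBean can be queried through the DataNode's HTTP port; a quick sketch, assuming the default CDH 5 web port of 50075 (adjust if yours differs):

# curl -s 'http://<datanode-host>:50075/jmx?qry=Hadoop:service=DataNode,name=DataNodeInfo'

In the JSON that comes back, the DiskBalancerStatus attribute corresponds to the method above; it is an empty string whenever queryWorkStatus() throws the IOException, and the NullPointerException from your report would surface during this call.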
03-07-2018
02:23 AM
Hi Lizard, the linked documentation contains good information, but if you need to rebalance the disks inside a DataNode right away, you can also run the intra-DataNode disk balancer (note that this is different from the HDFS Balancer). Details about the disk balancer are here: https://blog.cloudera.com/blog/2016/10/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/ A typical invocation is sketched below. Cheers, Pifta
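For reference, the disk balancer is driven by the hdfs diskbalancer CLI and has to be enabled with dfs.disk.balancer.enabled=true on the DataNodes. A typical run against a single DataNode looks roughly like this (the hostname is just a placeholder, and the -plan step prints the HDFS path of the plan file it generated, which you then pass to -execute):

# hdfs diskbalancer -plan <datanode-host>
# hdfs diskbalancer -execute <plan-file.json>
# hdfs diskbalancer -query <datanode-host>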