Support Questions

yves_name · ‎01-22-2018

HDP 2.6.3.0-235

One of my Hadoop data directories is full (100% usage) on every instance in the cluster, always on the same drive.

I have deleted almost all the data in HDFS with -skipTrash and expunge. I even tried rebooting all the boxes, but the directory is still full on every cluster member.

When I dive into the directory structure, I can see that the space is used by the HDFS block pool area.

>hdfs dfs -du /

45641 /app-logs 

247478401 /apps

92202 /ats

950726849 /hdp

0 /livy-recovery

0 /livy2-recovery

0 /mapred

0 /mr-history

0 /project

5922 /spark-history

0 /spark2-history

2 /system

98729320 /tmp

981081678 /user

0 /webhdfs

>hdfs dfs -df /
Filesystem                             Size          Used     Available  Use%
hdfs://X:8020  412794792448  186773504950  149000060339   45%
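
To make the mismatch explicit, the per-directory sizes from hdfs dfs -du can be added up and set against the Used figure from hdfs dfs -df. A minimal sketch, assuming the two-column "size path" output shown above; the function name sum_du_bytes is made up for illustration:

```shell
#!/bin/sh
# Sum the first column (bytes) of `hdfs dfs -du`-style output read from stdin.
# Lines whose first field is not a plain number are ignored.
sum_du_bytes() {
    # %.0f avoids the 32-bit limit some awk builds apply to %d.
    awk 'NF >= 2 && $1 ~ /^[0-9]+$/ { total += $1 } END { printf "%.0f\n", total }'
}

# On a live cluster (not run here):
#   hdfs dfs -du / | sum_du_bytes
```

For the listing above this gives 2278160015 bytes (about 2.3 GB) of file data, while -df reports 186773504950 bytes (about 187 GB) used. -du counts file sizes before replication, so some gap is expected, but replication alone would not normally explain a gap this large.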

====

If I go down into the data directory, I find block files that are not known to the NameNode when I fsck them by blockId, while others are.

>cd /hadoop/hdfs/data/current/BP-1356934633-X.X.X.X-1513618933915/current/finalized/subdir0/subdir150/ 

>ls 

blk_1073780387             blk_1073780392             blk_1073780395             blk_1073780463             blk_1073780475
blk_1073780387_39569.meta  blk_1073780392_39574.meta  blk_1073780395_39577.meta  blk_1073780463_39645.meta  blk_1073780475_39657.meta

>hdfs fsck -locations -files -blockId  blk_1073780463

Connecting to namenode via http://X.X.X.X:50070/fsck?ugi=hdfs&locations=1&files=1&blockId=blk_1073780463+&path=%2F
FSCK started by hdfs (auth:X) from /X.X.X.X at Mon Jan 22 14:30:02 GMT 2018
Block blk_1073780463 does not exist

=====

Has anyone seen something like this? It sounds as if the files were deleted in the NameNode but not on the local file system. Is there a command to check that integrity, and can I delete any blk_nnnnn file that fsck does not know about?
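
The manual fsck check above can be repeated for every block file in a directory. The following is a rough sketch, not a vetted tool: the function names list_block_ids and report_unknown_blocks are invented here, and it assumes that fsck prints "does not exist" for unknown blocks, as in the output above. It only reports and never deletes anything.

```shell
#!/bin/sh
# Print the unique block IDs (blk_<number>) found in one DataNode directory.
# Both the data file blk_N and its blk_N_<genstamp>.meta map to blk_N.
list_block_ids() {
    dir=$1
    for f in "$dir"/blk_*; do
        [ -e "$f" ] || continue          # directory without block files
        name=${f##*/}                    # strip the leading path
        echo "$name" | sed 's/^\(blk_[0-9]*\).*/\1/'
    done | sort -u
}

# Print the block IDs that the NameNode does not know about.
# Assumption: fsck answers "Block <id> does not exist" for such blocks.
report_unknown_blocks() {
    list_block_ids "$1" | while IFS= read -r id; do
        # </dev/null keeps hdfs from swallowing the loop's input.
        if hdfs fsck / -blockId "$id" </dev/null 2>/dev/null | grep -q 'does not exist'; then
            echo "$id"
        fi
    done
}

# Example call (run as a user allowed to fsck, not executed here):
#   report_unknown_blocks /hadoop/hdfs/data/current/BP-1356934633-X.X.X.X-1513618933915/current/finalized/subdir0/subdir150
```

Each fsck call starts a JVM and queries the NameNode, so this is slow on large directories. Whether a reported file is safe to remove is not something this sketch can decide.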

Thanks in advance for your help

yves_name · ‎01-23-2018

In case this is useful for others.

HDFS became corrupted at some stage. I ran fsck -delete, but ended up in an unstable situation: the same data directory became completely full on all the nodes. This is related to the block scanner, a facility that scans all blocks and performs the necessary verification.

By default this scan only occurs every 3 weeks, because of the intensity of the disk scan and I/O.

So to reclaim the space used by those block pools, you have to trigger the block scanner, which is not possible from the command line.

One option is to set dfs.datanode.scan.period.hours to 1.

You may also consider deleting the scanner.cursor files (rm -rf `locate scanner.cursor`) and then restarting the DataNode.
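
A more cautious variant of the rm -rf `locate scanner.cursor` one-liner is to search only below the DataNode data directory and to list the files before removing them. This is a sketch under assumptions: /hadoop/hdfs/data is the data directory from the question, and find_scanner_cursors and remove_scanner_cursors are hypothetical helper names.

```shell
#!/bin/sh
# List every scanner.cursor file below the given DataNode data directory.
find_scanner_cursors() {
    find "$1" -type f -name 'scanner.cursor' 2>/dev/null | sort
}

# Dry run by default: only prints what would be removed.
# Pass "yes" as the second argument to actually delete the files.
remove_scanner_cursors() {
    data_dir=$1
    confirm=${2:-no}
    find_scanner_cursors "$data_dir" | while IFS= read -r f; do
        if [ "$confirm" = "yes" ]; then
            rm -f -- "$f" && echo "removed $f"
        else
            echo "would remove $f"
        fi
    done
}

# Example (not executed here):
#   remove_scanner_cursors /hadoop/hdfs/data        # dry run
#   remove_scanner_cursors /hadoop/hdfs/data yes    # delete, then restart the DataNode
```

Unlike locate, find does not depend on an up-to-date file index and cannot match a scanner.cursor file elsewhere on the box.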

http://hadoopinrealworld.com/datanode-block-scanner/

https://community.hortonworks.com/questions/6931/in-hdfs-why-corrupted-blocks-happens.html

https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/

Cloudera Community

Local File system full, due to hadoop data directory not cleaning the deleted blockpool