HDP 2.6.3.0-235
One of my Hadoop data directories is full on every cluster node (the same drive each time), at 100% usage.
I have deleted almost all data in HDFS with -skipTrash plus -expunge. I even tried rebooting all the boxes, but the directory is still full on every cluster member.
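For reference, the cleanup was done roughly like this (the path here is only an example, not the exact one I used):
>hdfs dfs -rm -r -skipTrash /tmp/old_data
>hdfs dfs -expunge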
When I dive into the directory structure, I can see that it is the HDFS block pool area.
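On the local file system I am measuring the usage roughly like this (my data dir is /hadoop/hdfs/data, and the block pool ID is the one from my cluster):
>df -h /hadoop/hdfs/data
>du -sh /hadoop/hdfs/data/current/BP-1356934633-X.X.X.X-1513618933915/current/finalized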
>hdfs dfs -du /
45641 /app-logs
247478401 /apps
92202 /ats
950726849 /hdp
0 /livy-recovery
0 /livy2-recovery
0 /mapred
0 /mr-history
0 /project
5922 /spark-history
0 /spark2-history
2 /system
98729320 /tmp
981081678 /user
0 /webhdfs
>hdfs dfs -df /
Filesystem Size Used Available Use%
hdfs://X:8020 412794792448 186773504950 149000060339 45%
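I also pulled the per-datanode numbers, in case the breakdown between DFS Used and Non DFS Used helps:
>hdfs dfsadmin -report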
====
If I go down into the data directory, I end up finding block pool files that are not known when you try to fsck them by block ID, while others are.
>cd /hadoop/hdfs/data/current/BP-1356934633-X.X.X.X-1513618933915/current/finalized/subdir0/subdir150/
>ls
blk_1073780387 blk_1073780392 blk_1073780395 blk_1073780463 blk_1073780475
blk_1073780387_39569.meta blk_1073780392_39574.meta blk_1073780395_39577.meta blk_1073780463_39645.meta blk_1073780475_39657.meta
>hdfs fsck -locations -files -blockId blk_1073780463
Connecting to namenode via http://X.X.X.X:50070/fsck?ugi=hdfs&locations=1&files=1&blockId=blk_1073780463+&path=%2F
FSCK started by hdfs (auth:X) from /X.X.X.X at Mon Jan 22 14:30:02 GMT 2018
Block blk_1073780463 does not exist
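In case it helps, this is the rough loop I am using to spot-check which block files in a subdir are unknown to the namenode (same subdir as above; the "does not exist" string is just what fsck prints, as shown above):
>cd /hadoop/hdfs/data/current/BP-1356934633-X.X.X.X-1513618933915/current/finalized/subdir0/subdir150
>for b in blk_*; do case "$b" in *.meta) continue;; esac; hdfs fsck -blockId "$b" | grep -q "does not exist" && echo "unknown to namenode: $b"; done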
=====
Has anyone ever seen something like this? It sounds like the file is deleted in the NameNode but not on the file system. Is there a command to run to check that integrity, and/or can I delete any blk_nnnnn file that is not known when running fsck?
Thanks in advance for your help