If I go down into the DataNode data directory, I end up finding block pool files that are not known when I fsck them by blockId, while others are:
blk_1073780387  blk_1073780392  blk_1073780395  blk_1073780463  blk_1073780475
blk_1073780387_39569.meta  blk_1073780392_39574.meta  blk_1073780395_39577.meta  blk_1073780463_39645.meta  blk_1073780475_39657.meta

> hdfs fsck -locations -files -blockId blk_1073780463
Connecting to namenode via http://X.X.X.X:50070/fsck?ugi=hdfs&locations=1&files=1&blockId=blk_1073780463+&path=%2F
FSCK started by hdfs (auth:X) from /X.X.X.X at Mon Jan 22 14:30:02 GMT 2018
Block blk_1073780463 does not exist
Has anyone ever seen something like this? It sounds like the file was deleted on the NameNode but not on the local file system. Is there a command to check that integrity, and/or can I safely delete any blk_nnnnn file that fsck does not know about?
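For reference, a minimal sketch of checking every local block file against the NameNode this way. The `block_id` helper and the loop are my own illustration (they assume the usual `blk_<id>` / `blk_<id>_<genstamp>.meta` naming and a running cluster with `hdfs` on the PATH):

```shell
# Strip the optional generation stamp / .meta suffix to recover the block ID.
block_id() {
  basename "$1" | sed -E 's/^(blk_[0-9]+).*$/\1/'
}

# For each local block file, ask fsck whether the NameNode knows it
# (uncomment on a live cluster; flags as used in the question above):
# for f in blk_*; do
#   hdfs fsck -blockId "$(block_id "$f")" | grep -q "does not exist" \
#     && echo "orphan: $f"
# done

block_id blk_1073780463_39645.meta
```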
The HDFS got corrupted at some stage. I ran an fsck -delete, but ended up in an unstable situation: the data directories became completely full on all the nodes.
This is related to the block scanner, a facility that scans all blocks and performs the necessary verification.
By default this only runs every 3 weeks, because of the intensity of the disk scanning and I/O it causes.
So to reclaim those block pool files you have to trigger the block scanner, which is not possible through the command line.
One option is to set dfs.datanode.scan.period.hours to 1.
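In hdfs-site.xml that would look like the fragment below (a DataNode restart is assumed to be needed for the change to take effect):

```xml
<!-- Run the DataNode block scanner every hour instead of the
     default of 504 hours (3 weeks). Revert once cleanup is done. -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>1</value>
</property>
```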
You may also consider deleting the scanner.cursor files (rm -rf `locate scanner.cursor`) and then restarting the DataNode.
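Since `locate` depends on an up-to-date database, a sketch using `find` against the data directory instead; the `reset_block_scanner` function name and the example path are mine, and the path must match one of your dfs.datanode.data.dir entries:

```shell
# Delete the block scanner's saved cursor files under a given DataNode
# data directory, so the scanner starts over after the DataNode restarts.
reset_block_scanner() {
  data_dir="$1"   # e.g. /hadoop/hdfs/data (hypothetical path)
  find "$data_dir" -name 'scanner.cursor*' -type f -delete
}
```

Usage: `reset_block_scanner /hadoop/hdfs/data`, then restart the DataNode.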