Created 05-30-2022 08:36 AM
Hi,
After deleting terabytes of data from HDFS (1/4 of the total capacity), the block count on the DataNodes did not decrease as expected. It is still over the critical threshold.
How can this be solved?
Thank you
Created 06-06-2022 05:07 AM
Hi,
I'm still analyzing the output: running the "fsck" command on the path where the deletions were made reports just 1 block.
Looking at the attached chart, you can see that on May 19th a lot of data (60 TB) was removed from HDFS, and the number of blocks decreased on a single DataNode only (bda1node02).
That corresponds to about 600.000 blocks (1 block -> 256 MB).
On the other DataNodes, the number of blocks stayed the same (or increased slightly).
Created 06-08-2022 01:04 AM
Please remember that 1 block is not necessarily 256 MB; it can be smaller. Also, not all files have a replication factor of 3. Some might have only 1 replica, so this can be totally fine if all of the deleted files were single-replica files.
600.000 * 256 MB = 153.6 TB as a maximum, but since blocks can be smaller than 256 MB, the 60 TB freed up is reasonable.
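As a rough sketch of that arithmetic in Python: it uses decimal units (1 TB = 1,000,000 MB), takes the 600.000 blocks and 60 TB figures from this thread at face value, and ignores replication, so treat the result as an estimate only. The variable names are just illustrative.

```python
# Decimal units, matching the figures quoted in this thread.
MB = 10**6
TB = 10**12

blocks_removed = 600_000      # block drop reported for bda1node02
max_block_size = 256 * MB     # a block can hold at most this much
freed_bytes = 60 * TB         # amount of data reported as deleted

# Upper bound: what 600.000 completely full blocks would occupy.
upper_bound_tb = blocks_removed * max_block_size / TB

# Average block size implied by the reported numbers (assumption: one replica).
avg_block_mb = freed_bytes / blocks_removed / MB

print(f"upper bound: {upper_bound_tb} TB")
print(f"implied average block size: {avg_block_mb} MB")
```

An implied average of around 100 MB per block is well below the 256 MB maximum, which is consistent with partially filled blocks.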
Created 06-08-2022 03:50 AM
Hi @mszurap ,
I agree with you about these numbers.
Even though 60-100 TB is a large amount of data, the total number of blocks involved (close to 600k) is not that high compared with the block count of each DataNode.
Each DataNode reports about 9M blocks, and we found that the problem is related to other directories that contain small files, where the block size is about 2-3 MB. Even though the total size of these directories is not so high, we expect that removing them will decrease the number of blocks much more significantly.
We are facing the small files problem, which causes a high number of blocks. The directory we deleted had larger blocks, which is why the decrease in blocks was barely noticeable.
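To illustrate why small files dominate the block count, here is a minimal Python sketch. It assumes a 256 MB block size and counts one replica only; the directory sizes and file sizes are made-up examples, not measurements from our cluster, and `blocks_for_file` is a hypothetical helper.

```python
MB = 10**6
TB = 10**12
BLOCK_SIZE = 256 * MB  # assumed block size


def blocks_for_file(size_bytes, block_size=BLOCK_SIZE):
    """Blocks used by one file: ceil(size / block_size), 0 for an empty file."""
    return -(-size_bytes // block_size)  # integer ceiling division


# Hypothetical directory A: 60 TB stored as 1 GB files.
large_file_size = 1000 * MB
large_file_count = 60 * TB // large_file_size
large_dir_blocks = large_file_count * blocks_for_file(large_file_size)

# Hypothetical directory B: only 5 TB, but stored as 2.5 MB files.
small_file_size = 2_500_000
small_file_count = 5 * TB // small_file_size
small_dir_blocks = small_file_count * blocks_for_file(small_file_size)

print(f"60 TB of 1 GB files  : {large_dir_blocks} blocks")
print(f"5 TB of 2.5 MB files : {small_dir_blocks} blocks")
```

In this example the directory holding twelve times less data uses several times more blocks, because every small file still occupies at least one block.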
Thank you for the support in the analysis!
Created 06-08-2022 04:08 AM
Hi Andrea,
Great to see that it has been found now and thanks for marking the post as answered.
All the best, Miklos