
HDFS block count does not decrease after deleting data

Hi,

after deleting terabytes of data from HDFS (about 1/4 of the total capacity), the block count on the DataNodes did not decrease as expected. It is still over the critical threshold.

How could it be solved?

Thank you

 


13 REPLIES

Expert Contributor

Hello @andrea_pretotto , 

This typically happens if you have snapshots on the system. Even though the "current" files are deleted from HDFS, they may still be held by one or more snapshots (which is exactly what makes snapshots useful against accidental data deletion, as you can recover the data from them if needed).

Please check which HDFS directories are snapshottable:

hdfs lsSnapshottableDir

and then check how many snapshots you have under those directories:

hdfs dfs -ls /snapshottable_path/.snapshot

You can probably also verify this by checking the output of "du", which includes the snapshots' sizes:

hdfs dfs -du -h -v -s /snapshottable_path

versus the same command, which excludes the snapshots from the calculation:

hdfs dfs -du -x -h -v -s /snapshottable_path

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html#du 
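
For completeness: if snapshots do turn out to reference the deleted files, the blocks are only released once those snapshots are removed. A minimal sketch, using a hypothetical snapshottable path /data/archive and a hypothetical snapshot name s20240519:

hdfs dfs -ls /data/archive/.snapshot

hdfs dfs -deleteSnapshot /data/archive s20240519

hdfs dfsadmin -disallowSnapshot /data/archive

The last command only prevents new snapshots on that path, and it can only succeed after all existing snapshots under it have been deleted.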

 

Best regards

 Miklos

Customer Operations Engineer, Cloudera

Hi Miklos,

thank you for the detailed answer.

I found that the parent of the directory I removed has snapshots enabled, but there are no snapshots.

The command:

hdfs dfs -du -x -h -v -s /snapshottable_path

returns no lines. 

Also the output of "du" is the same.

Should I disable snapshots on the parent directory? Are there other configurations I should apply?

Thank you again.

Expert Contributor

Hi, the "hdfs dfs -du" for that path should return the summary of the disk usage (bytes, kbytes, megabytes, etc..) for that given path. Are you sure there are "no lines returned"? Have you checked the "du" output for a smaller subpath (which has less files underneith), does that return results?

Can you also clarify where you checked the block count before and after the deletion? ("the block count among data nodes did not decrease as expected")

Hi Miklos,

sorry for the typo. I executed the command

hdfs dfs -ls /snapshottable_path/.snapshot

and got no output for that directory, so there are no snapshots there.

The "du" commands ("du -x -h" and "du -h") report the same size.

When I click on the block count alerts on the HDFS service, I can see the number of blocks, which does not decrease. 

The DataNode has 8,743,931 blocks. Critical threshold: 8,000,000 block(s).

Thank you again.

Expert Contributor

Hi Andrea,

Oh, I see, I did not consider that you are seeing this from the DataNodes' perspective. Was this cluster recently upgraded? Is the "Finalize upgrade" step for HDFS still pending?

https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-cdp/topics/ug_cdh_upgrade_hdfs_fi...

While an HDFS upgrade is not finalized, the DataNodes keep all the previous blocks (including blocks deleted after the upgrade) in case a rollback is needed.
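
If that were the case, the upgrade state can also be checked and finalized from the command line (a sketch; on CDP the finalization step is normally driven from Cloudera Manager):

hdfs dfsadmin -rollingUpgrade query

hdfs dfsadmin -finalizeUpgrade

The first command reports whether a rolling upgrade is still in progress; the second finalizes a pending (non-rolling) upgrade, after which the DataNodes delete the retained "previous" block copies. Only finalize once you are sure a rollback will not be needed.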

Mentor

@andrea_pretotto 

Did you use the -skipTrash option during the deletion?  
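
For context on why this matters: without -skipTrash a deletion only moves the data into the user's trash, so the blocks are freed only after the trash checkpoint expires (controlled by fs.trash.interval) or the trash is emptied manually. A sketch with a hypothetical path:

hdfs dfs -rm -r /data/old_dataset

hdfs dfs -rm -r -skipTrash /data/old_dataset

hdfs dfs -expunge

The first form moves the files to /user/<username>/.Trash and the blocks stay allocated; the second deletes them immediately; -expunge purges the current user's expired trash checkpoints.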

Hi,

thank you for the replies.

@mszurap no upgrade has been made recently, and there are no pending steps.

@Shelton we kept the files in the Trash, but after 24 hours the files were deleted.

On the HDFS side, the capacity has decreased, but the number of blocks is still high (and does not change).

Thank you again

Expert Contributor

The DataNodes should only keep block files which are still managed and known by the NameNode. After a huge deletion event these "pending deletes" may of course take some time to be sent to the DNs (and for the DNs to delete them), but usually that does not take long. Maybe check the "select pending_deletion_blocks" chart, if applicable.
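
The same counter can also be read from the active NameNode's JMX endpoint, for example (assuming the default NameNode HTTP port 9870; older releases use 50070):

curl -s 'http://<active-namenode-host>:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'

In the returned JSON, look at the "PendingDeletionBlocks" value; a large number that drains only slowly means the NameNode is still instructing the DataNodes to remove the freed blocks.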

 

So if the above are not applicable, then check it more deeply with the following steps (a rough sketch follows the list):

- collect a full "hdfs fsck / -files -blocks -locations" output

- pick a DN which you think has more blocks than it should

- verify how many blocks the hdfs fsck report shows for that DN

- verify on the DN side how many block files it is actually storing - do those numbers match?
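
A rough sketch of that comparison (the IP/port and the data directory below are only examples; use your DataNode's address as it appears in the fsck output and your actual dfs.datanode.data.dir value):

hdfs fsck / -files -blocks -locations > /tmp/fsck_full.txt

grep -c '10.17.0.12:9866' /tmp/fsck_full.txt

find /dfs/dn -type f -name 'blk_*' ! -name '*.meta' | wc -l

The grep gives the number of blocks that fsck places on that DataNode, and the find (run on the DataNode host itself) counts the block files actually on its disks; if the on-disk count is much higher, the DN is holding blocks the NameNode no longer knows about.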

Community Manager

@andrea_pretotto, Has the reply helped resolve your issue? If so, can you please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?  



Regards,

Vidya Sargur,
Community Manager



Hi,

I'm still analyzing the output: the "fsck" command on the path where the delete operations were performed reports just 1 block.

Looking at the attached chart, you can see that on May 19th a lot of data (about 60 TB) was removed from HDFS, and the number of blocks decreased on a single DataNode (bda1node02): roughly 600,000 blocks (assuming 1 block -> 256 MB).

On the other DataNodes, the block count remained the same (or increased slightly).

(attached chart: blocks.PNG)

Expert Contributor

Please remember that a block is not necessarily 256 MB; it can be smaller. Also, not all files have a replication factor of 3; some might have only 1 replica, so it can be totally fine if those were all single-replica files.

600,000 * 256 MB = 153.6 TB would be the maximum, but since blocks can be smaller than 256 MB (60 TB over ~600,000 blocks works out to roughly 100 MB per block on average), the 60 TB freed up is reasonable.

Hi @mszurap ,

I agree with you about these numbers. 

Even if 60-100 TB is a large amount of data, the total number of blocks involved is not that high (close to 600k) compared to the total on each DataNode.

 

Each DataNode reports about 9M blocks, but we found that the problem is related to other directories that contain small files, where the block size is about 2-3 MB. Even though the total size of these directories is not that high, we expect the block count to decrease much more significantly once those are cleaned up.

 

So we are facing the small-files problem, which results in a high number of blocks. The directory we deleted had larger blocks, which is why the decrease in block count was barely noticeable.
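
For reference, a quick way to spot such directories (a sketch with a hypothetical path; the -count output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME):

hdfs dfs -count -v -h /data/small_files_area

A path with a very large FILE_COUNT but a modest CONTENT_SIZE is a small-files hotspot: every tiny file still occupies at least one block (times its replication factor) on the DataNodes, which is what keeps the block count high.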

 

Thank you for the support in the analysis!

 

Expert Contributor

Hi Andrea,

Great to see that the cause has been found, and thanks for marking the post as answered.

All the best, Miklos 
