Created 05-30-2022 08:36 AM
Hi,
after deleting terabytes of data from HDFS (about a quarter of the total capacity), the block count on the DataNodes did not decrease as expected. It is still above the critical threshold.
How can this be resolved?
Thank you
Created 06-08-2022 01:04 AM
Please remember that one block is not necessarily 256 MB; it can be smaller. Also, not all files have a replication factor of 3; some might have only one replica, so the numbers can add up if many of the deleted files were single-replica.
600,000 * 256 MB = 153.6 TB is the theoretical maximum, but since blocks can be smaller than 256 MB, the 60 TB freed up is plausible.
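If you want to sanity-check the total block count and the average block size yourself, the fsck summary reports both (note: scanning the root can take a while on a large namespace):
hdfs fsck / | tail -n 20
The summary contains a line of the form "Total blocks (validated): <count> (avg. block size <bytes> B)".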
Created 05-30-2022 10:37 AM
Hello @andrea_pretotto ,
This typically happens if you have snapshots on the system. Even though the "current" files are deleted from HDFS, they may still be held by one or more snapshots (which is exactly what makes snapshots useful against accidental data deletion: you can recover the data from a snapshot if needed).
Please check which HDFS directories are snapshottable:
hdfs lsSnapshottableDir
and then check how many snapshots you have under those directories:
hdfs dfs -ls /snapshottable_path/.snapshot
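If you find snapshots that are no longer needed, deleting them releases the blocks they hold. A sketch (the snapshot name s20220501 is hypothetical; use the names from the listing above):
hdfs dfs -deleteSnapshot /snapshottable_path s20220501
Once no snapshots remain, you can also disallow further snapshots on the directory:
hdfs dfsadmin -disallowSnapshot /snapshottable_path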
You can probably also verify this by checking the output of "du", which includes the snapshots' sizes:
hdfs dfs -du -h -v -s /snapshottable_path
versus the same command with snapshots excluded from the calculation:
hdfs dfs -du -x -h -v -s /snapshottable_path
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html#du
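To illustrate what a snapshot-held difference can look like (the sizes below are made-up numbers; the header row comes from the -v flag):
$ hdfs dfs -du -h -v -s /snapshottable_path
SIZE    DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS   FULL_PATH_NAME
12.5 T  37.5 T                                  /snapshottable_path
$ hdfs dfs -du -x -h -v -s /snapshottable_path
SIZE    DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS   FULL_PATH_NAME
8.2 T   24.6 T                                  /snapshottable_path
If the -x numbers are noticeably smaller, the difference is data referenced only by snapshots.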
Best regards
Miklos
Customer Operations Engineer, Cloudera
Created 05-31-2022 12:38 AM
Hi Miklos,
thank you for the detailed answer.
I found that the parent of the directory I removed has snapshots enabled, but there are no snapshots.
The command:
hdfs dfs -du -x -h -v -s /snapshottable_path
returns no lines.
Also, the output of "du" with and without -x is the same.
Should I disable snapshots on the parent directory? Are there other configurations I should apply?
Thank you again.
Created 05-31-2022 01:08 AM
Hi, the "hdfs dfs -du" for that path should return a summary of the disk usage (bytes, kilobytes, megabytes, etc.) for the given path. Are you sure there are "no lines returned"? Have you checked the "du" output for a smaller subpath (one with fewer files underneath), and does that return results?
Can you also clarify where you checked the block count before and after the deletion? ("the block count among data nodes did not decrease as expected")
Created 05-31-2022 02:09 AM
Hi Miklos,
sorry for the typo. I executed the command
hdfs dfs -ls /snapshottable_path/.snapshot
and got no output for that directory.
The "du" commands ("du -x -h" and "du -h") report the same size.
When I click on the block count alert on the HDFS service, I can see the number of blocks, which is not decreasing.
The DataNode has 8,743,931 blocks. Critical threshold: 8,000,000 block(s).
Thank you again.
Created 05-31-2022 07:33 AM
Hi Andrea,
Oh, I see; I did not consider that you are looking at this from the DataNodes' perspective. Was this cluster recently upgraded? Is the "Finalize Upgrade" step for HDFS still pending?
While an HDFS upgrade is not finalized, DataNodes keep all the previous blocks (including blocks deleted after the upgrade) in case a rollback is needed.
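If that is the case, you can check and finalize it from the command line (a sketch; -rollingUpgrade query only applies if a rolling upgrade was used, and finalizing removes the ability to roll back):
# check whether a rolling upgrade is still pending finalization
hdfs dfsadmin -rollingUpgrade query
# finalize the upgrade once you are sure no rollback is needed
hdfs dfsadmin -finalizeUpgrade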
Created 05-31-2022 01:00 PM
Did you use the -skipTrash option during the deletion?
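If not, the data was only moved to the user's trash directory, and its blocks are freed when the trash checkpoint expires (fs.trash.interval). For example (the trash path depends on the user who ran the delete):
# deleted files stay here until the trash interval expires
hdfs dfs -ls /user/<user>/.Trash
# remove checkpoints older than the retention window
hdfs dfs -expunge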
Created 06-01-2022 03:21 AM
DNs should only keep blocks that are still managed and known by the NN. After a huge deletion event these "pending deletes" can of course take some time to be sent to the DNs (and for the DNs to actually delete them), but usually that does not take long. If applicable, check the "select pending_deletion_blocks" chart in Cloudera Manager.
If the above does not apply, then check it more deeply (see the sketch after this list):
- collect a full "hdfs fsck -files -blocks -locations" output
- pick a DN which you think has more blocks than it should
- verify how many blocks the fsck report attributes to that DN
- verify on the DN side how many block files it is storing - do those numbers match?
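A minimal sketch of those steps (the DN address 10.0.0.21:9866 and the data directory /data/dfs/dn are assumptions; substitute your own values, and note the DN port is 50010 on Hadoop 2):
# 1. collect the full block report, including replica locations
hdfs fsck / -files -blocks -locations > /tmp/fsck.out
# 2-3. rough count of block lines that list this DataNode as a replica location
grep -c '10.0.0.21:9866' /tmp/fsck.out
# 4. on the DataNode itself, count the block files actually on disk
find /data/dfs/dn -name 'blk_*' ! -name '*.meta' | wc -l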
Created 06-05-2022 11:11 PM
@andrea_pretotto, Has the reply helped resolve your issue? If so, can you please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future?
Regards,
Vidya Sargur,