Created on 03-23-2021 09:17 PM - edited 09-16-2022 07:41 AM
Hello,
We are getting alerts for Block Count on one of our DataNodes because it has crossed the threshold of 10,000 blocks. Since running the HDFS balancer did not fix the issue, the next thing I turned my focus to was whether we are hitting the small-files issue. I was trying to put together a report via a terminal script (hdfs dfs -ls -R /tmp | grep ^- | awk '{if ($5 < 134217728) print $5, $8;}' | head -5 | column -t), but when I compare the script output against the HDFS Report from Cloudera Manager, I see a difference in the size of the same file.
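For reference, here is the same pipeline formatted more readably (it assumes the default block size of 128 MB, i.e. 134217728 bytes):

# List the first five files under /tmp smaller than one 128 MB block.
# In `hdfs dfs -ls -R` output, $5 is the size in bytes and $8 is the path;
# lines starting with '-' are regular files, 'd' marks directories.
hdfs dfs -ls -R /tmp |
  grep '^-' |
  awk '$5 < 134217728 {print $5, $8}' |
  head -5 |
  column -t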
Could anyone provide any guidance or assistance on this, or am I doing something wrong?
Thanks
Amn
Created 03-30-2021 04:23 AM
Hello @Amn_468 Please note that you get the block count alert after hitting the warning/critical threshold value set in the HDFS configuration. It is a monitoring alert and does not impact any HDFS operations as such.
You may increase the monitoring threshold value in CM (CM > HDFS > Configuration > DataNode Block Count Thresholds).
However, CM monitors the block counts on the DataNodes to ensure you are not writing too many small files into HDFS. An increase in block counts on DataNodes is an early warning of small-file accumulation in HDFS. The simplest way to check whether you are hitting the small-files issue is to check the average block size of HDFS files.
Fsck should show the average block size. If the value is very low (e.g., ~1 MB), you might be hitting the small-files problem, which would be worth looking into; otherwise, there is no need to review the number of blocks.
[..]
$ hdfs fsck /
..
...
Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
[..]
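On a large namespace the full fsck output is long, so you can pull out just that line; a simple grep sketch (adjust the pattern if your fsck output format differs):

# Print only the total/average block statistics from fsck.
$ hdfs fsck / 2>/dev/null | grep 'Total blocks'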
Similarly, you can get the average file size in HDFS by running a script as follows:
$ hdfs dfs -ls -R / | grep -v "^d" | awk '{OFMT="%f"; sum+=$5} END {print "AVG File Size =", sum/NR/1024/1024 " MB"}'
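If the cluster-wide average is low, a per-directory variant of the same script can help narrow down where the small files live (the paths below are just examples; substitute your own):

# Average file size per directory; '^-' keeps regular files only.
for d in /tmp /user /data; do
  hdfs dfs -ls -R "$d" | grep '^-' |
    awk -v dir="$d" '{sum+=$5; n++} END {if (n) printf "%s: avg %.2f MB over %d files\n", dir, sum/n/1024/1024, n}'
done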
The file size reported by Reports Manager under "HDFS Reports" in Cloudera Manager can differ because that report is extracted from an FSImage that may be more than an hour old (not the latest one), which would explain the mismatch you saw between the script output and the CM report.
Hope this helps. If you have any further questions, feel free to update the thread; otherwise, please mark it as solved.
Regards,
Pabitra Das
Created 03-30-2021 01:41 AM
Hi @Amn_468 I see you have received an alert for Block Count on one of the DataNodes because it has crossed the threshold of 10,000 blocks. This is controlled by the property "DataNode Block Count Thresholds", which alerts you when any DataNode crosses the specified number of blocks. You can then decide whether to reduce the number of blocks by deleting unwanted files, or to increase the threshold, since the amount of data naturally grows as the cluster grows over time. If you find that all the blocks are legitimate and need to be kept in the cluster, you can simply increase the threshold; this does not require any service restart.
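If most of the blocks turn out to be legitimate small files that must be retained, one common way to reduce the block count is to pack them into a Hadoop Archive (HAR). A sketch with example paths (substitute your own):

# Pack /user/amn/logs/2020 into logs2020.har under /user/amn/archive.
# Many small blocks are replaced by a few large archive blocks.
hadoop archive -archiveName logs2020.har -p /user/amn/logs 2020 /user/amn/archive
# The archived files stay readable through the har:// scheme:
hdfs dfs -ls har:///user/amn/archive/logs2020.har
# Only after verifying the archive, delete the originals to actually free the blocks.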
Let me know if you have any further queries or comments.