
Datanodes report block count more than threshold


Hello,

 

In a few of our clusters, datanodes report block counts above the threshold. So far we have verified that data is distributed evenly across the datanodes and that there are no corrupt blocks. Could the cause be too many small files, each taking up a block (perhaps in Parquet format)? Please suggest what the reason could be, or how we should go about finding one.

 

 

Thanks & Regards

Pravdeep


Re: Datanodes report block count more than threshold

Yes, the block count alert serves as an early warning of a growing
small-files problem. While a DataNode can handle a lot of blocks in
general, going too high will cause performance issues. Small files also
add a lot of processing overhead and generally slow down workloads.

If you use CM Enterprise, you can use the Reports feature to find small
files and identify the top culprits (which users and which directories):
http://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_disk_usage_reports.html
Otherwise, a script that analyses the output of a single hadoop fs -ls -R /
may also suffice.
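
As a rough illustration of that second approach, here is a minimal sketch (not an official tool) that parses `hadoop fs -ls -R /` output and counts files below a size threshold per top-level directory. The 1 MiB cutoff and the assumed eight-column `ls` output format (permissions, replication, owner, group, size, date, time, path) are assumptions you may need to adjust for your Hadoop version:

```python
import sys
from collections import Counter

# Assumption: files under 1 MiB are treated as "small"; tune to your block size.
SMALL_FILE_BYTES = 1 * 1024 * 1024

def count_small_files(lines, threshold=SMALL_FILE_BYTES):
    """Count files smaller than `threshold` per top-level directory.

    Expects lines in the usual `hadoop fs -ls -R` layout:
    perms replication owner group size date time path
    """
    per_dir = Counter()
    for line in lines:
        parts = line.split()
        # Skip directories (perms start with 'd') and malformed lines.
        if len(parts) < 8 or parts[0].startswith('d'):
            continue
        try:
            size = int(parts[4])
        except ValueError:
            continue
        if size < threshold:
            path = parts[7]
            top = '/' + path.lstrip('/').split('/', 1)[0]
            per_dir[top] += 1
    return per_dir

if __name__ == '__main__':
    # Usage: hadoop fs -ls -R / > listing.txt; python small_files.py < listing.txt
    for directory, n in count_small_files(sys.stdin).most_common(20):
        print(f'{n}\t{directory}')
```

Aggregating by top-level directory is just one choice; grouping by owner (column 3) works the same way if you want to find which users are generating the small files.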