In a few of our clusters, datanodes are reporting block counts above the threshold. So far we have verified that data is distributed evenly across datanodes and that there are no corrupt blocks. Could this be caused by too many small files, each occupying its own block (we mostly use Parquet format)? Please suggest what the reason could be, or how we should go about finding it.
Yes, the block count alert is an early warning of a growing small-files problem. While a DataNode can handle a large number of blocks in general, letting the count grow too high will cause performance issues, since every block adds metadata overhead on both the NameNode and the DataNode. Small files also add processing overhead per task and generally slow down workloads.
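One way to confirm the small-files theory is to capture a recursive listing with `hdfs dfs -ls -R /your/path` and count how many files fall well below the block size. The sketch below is a minimal example of parsing that listing; the 128 MB threshold assumes the default `dfs.blocksize`, and the sample lines are illustrative, not from a real cluster.

```python
# Count small files in the text output of `hdfs dfs -ls -R <path>`.
# Assumes the default 128 MB block size; adjust to your dfs.blocksize.
SMALL_FILE_THRESHOLD = 128 * 1024 * 1024

def count_small_files(ls_lines, threshold=SMALL_FILE_THRESHOLD):
    """Return (small_count, total_files) for an `hdfs dfs -ls -R` listing."""
    small = total = 0
    for line in ls_lines:
        # Format: perms, replication, owner, group, size, date, time, path
        parts = line.split(None, 7)
        if len(parts) < 8 or parts[0].startswith("d"):
            continue  # skip directories and malformed lines
        size = int(parts[4])
        total += 1
        if size < threshold:
            small += 1
    return small, total

# Illustrative sample listing (paths and sizes are made up):
sample = [
    "drwxr-xr-x   - hdfs supergroup          0 2024-05-01 10:00 /data",
    "-rw-r--r--   3 hdfs supergroup    1048576 2024-05-01 10:01 /data/part-00000.parquet",
    "-rw-r--r--   3 hdfs supergroup    2097152 2024-05-01 10:02 /data/part-00001.parquet",
    "-rw-r--r--   3 hdfs supergroup  209715200 2024-05-01 10:03 /data/part-00002.parquet",
]
small, total = count_small_files(sample)
print(f"{small} of {total} files are below the block size")
```

If a large fraction of files comes back below the block size, compaction (e.g. rewriting the Parquet data with fewer, larger files) is the usual remedy.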