We are getting block count alerts from our DataNodes. One DataNode currently holds 1,283,487 blocks; the warning threshold is 500,000 blocks.
Replication factor: 3
Block size: 128 MB
NameNode heap size: 4 GB
2227152 files and directories, 2177187 blocks = 4404339 total filesystem object(s).
Heap memory used: 2.38 GB of 3.87 GB (max heap 3.87 GB).
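For context on those heap numbers: a commonly cited rule of thumb is roughly 150 bytes of NameNode heap per filesystem object (file, directory, or block). Applying that to the totals above gives a rough lower bound on the metadata footprint (the measured 2.38 GB also includes RPC buffers, edit log state, and other overhead, so actual usage is always higher):

```shell
# ~150 bytes per filesystem object is a rule-of-thumb estimate,
# not an exact figure. 4,404,339 objects from the summary above:
awk 'BEGIN { printf "%.0f MB\n", 4404339 * 150 / (1024*1024) }'
# prints: 630 MB
```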
2. How can we merge small files using HAR on an existing Hadoop cluster, and how do we identify the small files in the first place?
Please suggest a solution.
Identifying small files can be done from the command line with `hdfs dfs -du <path>` (or `hdfs dfs -ls -R` to see individual file sizes). The full list of HDFS commands is here: https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
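As a sketch of that approach, the `-ls -R` output can be filtered with awk to list only files below a size threshold. The path `/data` and the 1 MB cutoff below are examples; the column layout assumed is the standard `hdfs dfs -ls` format (permissions, replication, owner, group, size, date, time, path):

```shell
# Filter `hdfs dfs -ls -R` output down to files smaller than a threshold.
small_files() {   # $1 = size threshold in bytes
  awk -v max="$1" '$1 !~ /^d/ && $5 < max { print $5, $8 }'
}

# On a real cluster (example path and threshold):
#   hdfs dfs -ls -R /data | small_files 1048576

# Demo on captured sample output:
printf '%s\n' \
  '-rw-r--r--   3 hdfs hdfs     512 2024-01-01 00:00 /data/a.log' \
  'drwxr-xr-x   - hdfs hdfs       0 2024-01-01 00:00 /data/dir' \
  '-rw-r--r--   3 hdfs hdfs 2097152 2024-01-01 00:00 /data/big.parquet' \
  | small_files 1048576
# prints: 512 /data/a.log
```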
Using `hadoop archive` to archive old, small files is a good solution: it packs many small files into a HAR, which cuts the block (and thus NameNode object) count while keeping the files readable in place. See the Hadoop Archives guide in the Hadoop documentation for details.
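A minimal sketch of the archiving workflow, with example paths you would replace with your own (note that `hadoop archive` runs a MapReduce job, and the originals are not deleted automatically):

```shell
# Pack everything under /data/small into one HAR (example paths;
# -r sets the replication factor of the archive's files).
hadoop archive -archiveName small-files.har -p /data/small -r 3 /data/archives

# Verify the contents through the har:// scheme:
hdfs dfs -ls har:///data/archives/small-files.har

# Only after verifying, remove the originals to reclaim block count:
# hdfs dfs -rm -r /data/small
```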
You can also increase the DataNode heap to raise the number of blocks each node can safely serve. I've personally run 8 GB DataNode heaps with no issues; I haven't gone past that, so I can't say how far it scales, and other factors, such as the request load on the cluster, also affect memory use.
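A sketch of what that looks like in `hadoop-env.sh`; the 8 GB figures are example values from my own experience, not a recommendation for every cluster, and given the fsck totals in the question the NameNode heap is worth revisiting at the same time:

```shell
# hadoop-env.sh -- example sizes only; tune to your hardware and load.
export HADOOP_DATANODE_OPTS="-Xms8g -Xmx8g ${HADOOP_DATANODE_OPTS}"
export HADOOP_NAMENODE_OPTS="-Xms8g -Xmx8g ${HADOOP_NAMENODE_OPTS}"
```

Setting `-Xms` equal to `-Xmx` avoids heap resizing pauses; restart the affected daemons for the change to take effect.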