
Datanode_block_count on HDFS


Explorer

We are getting block count alerts from the DataNodes.

 

The DataNode has 1,283,487 blocks. Warning threshold: 500,000 block(s). 

 

Replication factor: 3
Block size: 128 MB
DataNodes: 5
Racks: 1
NameNode heap size: 4 GB

2,227,152 files and directories, 2,177,187 blocks = 4,404,339 total filesystem objects.

Heap memory used: 2.38 GB of 3.87 GB. Max heap memory: 3.87 GB.
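
As a rough sanity check on those numbers (assuming block replicas are spread roughly evenly across the 5 DataNodes):

2,177,187 blocks × 3 replicas = 6,531,561 replicas
6,531,561 replicas / 5 DataNodes ≈ 1.3 million replicas per DataNode

which lines up with the 1,283,487 blocks reported in the alert.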

 

2. How can we merge small files using HAR (Hadoop Archives) on an existing Hadoop cluster? How can we identify the small files?

 

Please suggest a solution.

1 REPLY

Re: Datanode_block_count on HDFS

Expert Contributor

Identifying small files can be done with the command-line option hdfs dfs -du <path>. Here is the admin commands reference: https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
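
A minimal command-line sketch of one way to do that (the /data path and the 1 MB threshold are example values, not something from this thread):

hdfs dfs -ls -R /data | awk '$1 !~ /^d/ && $5 < 1048576 {print $5, $8}'

This walks /data recursively, skips directory entries, and prints the size and path of every file smaller than 1 MB (field 5 of the ls output is the file length in bytes; field 8 is the path).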

 

Using hadoop archive to archive old, small files is a good solution. Here is the reference for that:

https://hadoop.apache.org/docs/r2.7.1/hadoop-archives/HadoopArchives.html
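
A rough sketch of the archive command (the archive name, parent path, source directories, and destination below are all example values):

hadoop archive -archiveName smallfiles.har -p /data/input dir1 dir2 /data/archives

The archived files can then be listed through the har:// filesystem, for example hdfs dfs -ls har:///data/archives/smallfiles.har. Note that creating the archive does not delete the original input files; those have to be removed separately once you have verified the archive.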

 

You can also increase the heap size of the DataNodes to increase the number of blocks they can safely serve. I've personally seen no issues with 8 GB heaps. I haven't run anything past that, so I can't say how far you can go; other factors, such as the number of requests made to the cluster, also affect how much memory is used.
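
A minimal sketch of where that setting lives, assuming the heap is configured through hadoop-env.sh (on a Cloudera Manager cluster you would change the DataNode Java heap size in the service configuration instead); the 8g value is only an example:

export HADOOP_DATANODE_OPTS="$HADOOP_DATANODE_OPTS -Xmx8g"

The DataNodes need to be restarted for the new heap size to take effect.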
