
DATA_NODE_BLOCK_COUNT threshold 200,000 block(s)?

New Contributor

I am using CM 4.8.2 & CDH 4.6.0. All three DataNodes show a health concern with this warning message:
"The health test result for DATA_NODE_BLOCK_COUNT has become concerning: The DataNode has 200,045 blocks. Warning threshold: 200,000 block(s)."
To solve this problem, can I just increase the limit to 300,000 block(s)?
Is there any reason the threshold value is 200,000 block(s)?

1 ACCEPTED SOLUTION

Mentor
Yes, the reason for the 200k default is to warn you that you may be facing a small-files issue in your cluster, or that you may be close to needing to expand further horizontally.

Having more blocks raises the heap requirement at the DataNodes. The threshold warning also exists to notify you about this (that you may soon need to raise the DN heap size to allow it to continue serving blocks at the same performance).

With CM5 we have revised the number to 600k, given memory optimisation improvements for DNs in CDH 4.6+ and CDH 5.0+. You can feel free to raise the threshold via the CM -> HDFS -> Configuration -> Monitoring section fields, but do look into whether your users have begun creating too many tiny files, as it may hamper their job performance with the overhead of too many blocks (and thereby, too many mappers).
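
For example, one quick way to check whether small files are behind the high block count is to compare the average block size fsck reports against your configured dfs.blocksize (a sketch; it assumes you can run commands as the hdfs superuser, and /user is just an example path):

# Namespace summary: total files, total blocks, and average block size
sudo -u hdfs hadoop fsck / | tail -n 20

# Per-directory file and byte counts, useful for spotting directories full of tiny files
hadoop fs -count /user/*

An average block size far below the configured dfs.blocksize (typically 64 MB or 128 MB) usually points to small files rather than genuine data growth.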


4 REPLIES

New Contributor

Thanks for your response.

I deleted useless HDFS files (3 TB) yesterday with hadoop fs -rm -r, but the warning message still persists.

DATA_NODE_BLOCK_COUNT is the same as before deleting the files (the current value is 921,891 blocks).

How can I reduce the current DATA_NODE_BLOCK_COUNT?

Even after a file is deleted, the blocks will remain if HDFS Trash is enabled. Do you have Trash enabled? Is it configured as stated in this URL:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-latest/Cloudera-Manage...
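
For example, a quick way to check whether the deleted data is still sitting in Trash, and to clear it out if so (a sketch; <username> is a placeholder for whoever ran the delete, and -skipTrash removes the data permanently):

# See how much space the user's Trash is still holding
hadoop fs -du -s /user/<username>/.Trash

# Permanently remove the trashed files, bypassing the trash interval
hadoop fs -rm -r -skipTrash /user/<username>/.Trash

Once the Trash is emptied, the NameNode schedules the blocks for deletion and the DataNode block counts should drop over the next few heartbeats.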

Regards,
Gautam Gopalakrishnan

Rising Star

Harsh,

 

In this thread you stated "but do look into whether your users have begun creating too many tiny files, as it may hamper their job performance with the overhead of too many blocks (and thereby, too many mappers)." Too many tiny files is in the eye of the beholder if those files are what get you paid.

I'm also seeing a block issue on two of our nodes, but a rebalance to 10% has no effect. I've rebalanced to 8% and it improves, but I suspect we're running into a small-files issue.
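
For reference, the rebalance mentioned above maps to the HDFS balancer's -threshold argument, roughly like this (a sketch; it assumes the balancer is run as the hdfs superuser, and 8 is just the example value from this thread):

# Rebalance until every DataNode's DFS-used % is within 8% of the cluster average
sudo -u hdfs hdfs balancer -threshold 8

One thing to keep in mind is that the balancer evens out disk usage, not block counts, so a node holding lots of tiny blocks can still trip DATA_NODE_BLOCK_COUNT even when usage looks well balanced.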