Created 04-21-2017 04:42 PM
Hi,
I have a 3 node cluster running on CentOS 6.7. For about a week I have been seeing a warning on all 3 nodes that the block count is more than the threshold. My NameNode is also used as a DataNode.
It's more or less like this on all 3 nodes:
Concerning: The DataNode has 1,823,093 blocks. Warning threshold: 500,000 block(s).
I know this means the problem of growing small files. I have website data (unstructured) on HDFS; it contains jpg, mpeg, css, js, xml, and html types of files.
I don't know how to deal with this problem. Please help.
The output of the following command on the NameNode is:
[hdfs@XXXXNode01 ~]$ hadoop fs -ls -R / |wc -l
3925529
Thanks,
Shilpa
Created 05-11-2017 11:58 AM
What all I did:
1. Increased the memory of the NameNode
2. Increased the disk capacity of the overall cluster
3. Increased dfs.blocksize from 64 MB to 128 MB (quick check below)
4. Increased the block count threshold
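For what it's worth, a quick way to double-check that the change in point 3 actually took effect is the standard getconf call (the value is reported in bytes; 134217728 = 128 MB):

$ hdfs getconf -confKey dfs.blocksize
134217728

Keep in mind that only files written after the change use the new block size, and each small file still counts as at least one block toward this warning regardless of dfs.blocksize.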
Created 04-30-2017 07:19 AM
If you have Cloudera Manager, you can easily find which job is putting a lot of stress on the storage. Please take a peek at the link below:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_disk_usage_reports.html
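If Cloudera Manager is not available, a rough first pass from the command line is to see which top-level directories hold the most data (standard HDFS shell, run as the hdfs user):

$ hdfs dfs -du -h /

and then repeat it on the largest entries to drill down to the workload that is generating the files.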
Created 12-06-2018 01:44 PM
Hi All,
I recommend checking which application team is causing it, by using:
# hdfs dfs -count -v -h /project/*
If the FILE_COUNT is more than 10M, then it's a problem for a mid-sized cluster.
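With -v the command prints a header row, so the output looks roughly like this (the paths and numbers below are made up, purely to show which column is FILE_COUNT):

   DIR_COUNT   FILE_COUNT    CONTENT_SIZE PATHNAME
        1245      1650000           1.2 T /project/website
          80        12000          55.8 G /project/logs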
Please check the below link to reduce the block count.
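One common way to reduce the block count when the culprit is lots of small files is to pack them into a Hadoop Archive (HAR). A minimal sketch with hypothetical paths (the archive is built by a MapReduce job, and the original files have to be deleted afterwards to actually reclaim the namespace):

# pack everything under /project/website into a single archive
hadoop archive -archiveName website.har -p /project/website /project/archives
# the content stays readable through the har:// scheme
hdfs dfs -ls har:///project/archives/website.har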
Reg,
Sandeep Kolli
Created 12-06-2018 01:50 PM
To add,
Sizing a DataNode heap is similar to sizing the NameNode heap: the recommendation is roughly 1 GB of heap per 1 million blocks. Since a block can be as small as 1 byte or as large as 128 MB, the heap requirement is the same regardless of block size.
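As a rough worked example against the numbers earlier in this thread: 1,823,093 blocks is about 1.8M, so each DataNode would want roughly 2 GB of heap by that rule of thumb. On a plain Apache install that could be set in hadoop-env.sh along these lines (a sketch only; in Cloudera Manager the DataNode heap is set through the HDFS service configuration instead):

# hadoop-env.sh: give the DataNode JVM ~2 GB of heap (~1 GB per 1M blocks)
export HADOOP_DATANODE_OPTS="-Xmx2g ${HADOOP_DATANODE_OPTS}"

Adjust the -Xmx value to your own block count.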