Support Questions

ravi1 · ‎12-03-2015

The default for dfs.datanode.handler.count is 10. I have seen it set to 128 in a large cluster (over 1000 nodes), and I think this is causing load issues. What is the recommended value, and when should it be increased from the default of 10?

bsaini · ‎12-03-2015

Hi Ravi, the property dfs.datanode.handler.count defines the number of server threads for the datanode, so it applies at the datanode level. In other words, its value should be driven by the I/O requests each datanode receives rather than by the size of the cluster.

So, hypothetically speaking, if you have a cluster (large or small) used for an online archiving use case, where the data is not read very often, you do not need a large number of parallel threads. As the traffic / I/O goes up, there may be a benefit in increasing the number of parallel threads in the datanode. Here is the code that uses this property.

If there is a way to isolate the heavy workers from the light workers, you can create Ambari configuration groups to give each group a different value for this property.
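
For reference, here is a minimal sketch of how this property is set in hdfs-site.xml on the datanode hosts. The value 32 is only an illustrative assumption for a hypothetical group of heavily loaded datanodes, not a recommendation; a lightly loaded group could simply stay at the default of 10.

```xml
<!-- hdfs-site.xml on the DataNode hosts (sketch; 32 is an assumed, illustrative value) -->
<property>
  <name>dfs.datanode.handler.count</name>
  <!-- Number of server threads for the DataNode; the default is 10 -->
  <value>32</value>
</property>
```

With Ambari configuration groups, the same property would be overridden per host group, so the right value for each group should come from the I/O load you actually observe on those datanodes.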

ajay_kumar · ‎12-05-2015

As @bsaini explained, this property determines the number of handler threads open at a given time on a datanode.

Two factors to look at before changing this property are:

1. Use Cases

2. HDP services being used

For example, if you are using HBase extensively, then increasing this property (to match the cores or spindles in the datanode) may help in getting better throughput, especially for bulk writes/reads. However, increasing it beyond a point will not help and may even affect performance negatively.
