What is the recommended value for dfs.datanode.handler.count in a large cluster?

The default is 10. I have seen it set to 128 in a large cluster (over 1,000 nodes), and I think this is causing load issues. What is the recommended value, and when should it be increased from the default of 10?

1 ACCEPTED SOLUTION

Hi Ravi, as you know, the property dfs.datanode.handler.count defines the number of server threads for the datanode, so it is set at the datanode level. In other words, its value is driven more by the I/O requests to the datanode than by the size of the cluster.

So, hypothetically speaking, if you have a cluster (large or small) used for an online archiving use case, where the data is not read very often, you do not need a large number of parallel threads. As the traffic / I/O goes up, there may be a benefit in increasing the number of parallel threads in the datanode. Here is the code that uses this property.

If there is a way to isolate the heavy workers from the light workers, you can create Ambari configuration groups so that each group gets a different value for this property.
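Before changing anything, it can help to check what a given datanode is actually configured with. Below is a minimal sketch, not an official tool, that reads the property from the text of an hdfs-site.xml file. The function name and the sample XML are made up for illustration, and the fallback of 10 is simply the default mentioned in the question.

```python
import xml.etree.ElementTree as ET

# Default cited in the question above; used when the property is not set.
DEFAULT_HANDLER_COUNT = 10


def read_handler_count(hdfs_site_xml: str) -> int:
    """Return dfs.datanode.handler.count from hdfs-site.xml text, or the default."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", "").strip()
        if name == "dfs.datanode.handler.count":
            return int(prop.findtext("value", "").strip())
    return DEFAULT_HANDLER_COUNT


# Illustrative sample only; a real file would be read from the datanode's config directory.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>128</value>
  </property>
</configuration>"""

print(read_handler_count(SAMPLE))
```

Running the same check on a host from each configuration group is a quick way to confirm that the heavy and light workers really ended up with different values.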

2 REPLIES

As @bsaini explained, this property determines the number of handlers open at a given time for a datanode.

Two factors to look at before changing this property are:

1. Use Cases

2. HDP services being used

For example, if you are using HBase extensively, then increasing this property (to match the cores or spindles in the datanode) may help you get better throughput, especially for bulk writes/reads. However, increasing it beyond a certain point will not help and may even affect performance negatively.
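As a rough illustration of the "match cores or spindles" idea, here is a hypothetical helper that proposes a starting value. Two things are my own assumptions rather than anything stated above: taking the smaller of the two counts, and never going below the default of 10. Treat the result as a value to test under real load, not a recommendation.

```python
def suggest_handler_count(cores: int, spindles: int, default: int = 10) -> int:
    """Hypothetical starting point for dfs.datanode.handler.count.

    Assumption: use the smaller of cores and spindles, but never less
    than the default. Validate any value against actual I/O load.
    """
    return max(default, min(cores, spindles))


# Example: a datanode with 32 cores and 24 data disks.
print(suggest_handler_count(cores=32, spindles=24))
```

Whatever number comes out, raise it in steps and watch datanode load, since going too high can hurt rather than help.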