Created 12-03-2015 06:56 PM
The default for dfs.datanode.handler.count is 10. I have seen it set to 128 in a large cluster (over 1000 nodes), and I think this is causing load issues. What is the recommended value, and when should it be increased from the default of 10?
Created 12-03-2015 09:13 PM
Hi Ravi, as you know, the property dfs.datanode.handler.count defines the number of server threads for the datanode, so it applies at the datanode level. In other words, its value is driven more by the I/O requests reaching each datanode than by the size of the cluster.
So, hypothetically speaking, if you have a cluster (large or small) used for an online archiving use case where the data is not read very often, you do not need a large number of parallel threads. As the traffic / I/O goes up, there may be a benefit in increasing the number of parallel threads in the datanode. Here is the code that uses this property.
If there is a way to isolate the heavy workers from light workers then you can create Ambari configuration groups to have different values for these properties.
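To make the per-group idea concrete, here is a minimal Python sketch that reads the effective dfs.datanode.handler.count from the text of an hdfs-site.xml file and falls back to the default of 10 when the property is not set. The function name, the two sample configurations, and the value 32 are hypothetical and only illustrate how a heavy group and an archive group could end up with different values; this is not how Ambari or HDFS resolves the setting internally.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "dfs.datanode.handler.count"
DEFAULT_HANDLER_COUNT = 10  # default mentioned in the question


def datanode_handler_count(hdfs_site_xml: str) -> int:
    """Return the handler count set in hdfs-site.xml text, or the default."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            return int(prop.findtext("value", "").strip())
    return DEFAULT_HANDLER_COUNT


# Hypothetical config group for heavily loaded datanodes (32 is illustrative).
HEAVY_GROUP = """<configuration>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>32</value>
  </property>
</configuration>"""

# Hypothetical config group for archive datanodes: property left unset.
ARCHIVE_GROUP = "<configuration></configuration>"

print(datanode_handler_count(HEAVY_GROUP))
print(datanode_handler_count(ARCHIVE_GROUP))
```

A check like this can help confirm which datanodes actually carry an override before comparing their load.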
Created 12-05-2015 08:43 PM
As @bsaini explained, this property determines the number of open handlers at a given time for a datanode.
Two factors to look at before changing this property are:
1. Use Cases
2. HDP services being used
For example, if you are using HBase extensively, then increasing this property (to match the cores or spindles in the datanode) may help in getting better throughput, especially for bulk writes/reads. However, increasing it beyond a certain point will not help and may even affect performance negatively.
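As a rough illustration of that rule of thumb, the Python sketch below picks a starting value that matches the larger of cores or spindles, never goes below the default of 10, and stops at a ceiling you choose yourself. The function name, the ceiling parameter, and the sample numbers are all assumptions made for illustration; the thread does not give a formula, and any value should be validated against real load on the datanodes.

```python
DEFAULT_HANDLER_COUNT = 10  # default mentioned in the question


def suggest_handler_count(cores: int, spindles: int, ceiling: int) -> int:
    """Hypothetical starting point: match cores or spindles, within bounds.

    `ceiling` is a caller-chosen cap reflecting the point beyond which
    more handler threads are not expected to help.
    """
    candidate = max(cores, spindles)
    return max(DEFAULT_HANDLER_COUNT, min(candidate, ceiling))


# Illustrative hardware profiles, not measurements from any real cluster.
print(suggest_handler_count(cores=16, spindles=12, ceiling=32))
print(suggest_handler_count(cores=8, spindles=6, ceiling=32))
print(suggest_handler_count(cores=48, spindles=24, ceiling=32))
```

Treat the result as a first guess to test, not as a recommended setting.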