What is the recommended value for dfs.datanode.handler.count in a large cluster?

The default for this is 10. I have seen it set to 128 in a large cluster (over 1000 nodes), and I think this is causing load issues. What is the recommended value, and when should it be increased from the default of 10?

1 ACCEPTED SOLUTION

Re: What is the recommended value for dfs.datanode.handler.count in a large cluster?

Hi Ravi, As you know, the property dfs.datanode.handler.count defines the number of server threads for the datanode, and it is set at the datanode level. In other words, its value is driven more by the I/O requests each datanode receives than by the size of the cluster.

So, hypothetically speaking, if you have a cluster (large or small) used for an online archiving use case, where the data is not read very often, you do not need a large number of parallel threads. As the traffic / I/O goes up, there may be a benefit in increasing the number of parallel threads in the datanode. Here is the code that uses this property.

If there is a way to isolate the heavy workers from the light workers, then you can create Ambari configuration groups to give these properties different values on each set of hosts.
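To make the per-group idea concrete, here is a minimal sketch that renders the hdfs-site.xml property entry each group of datanodes would carry. The group names and the value 32 are hypothetical and only for illustration; this does not call Ambari, it just shows what the differing override would look like.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "dfs.datanode.handler.count"

# Hypothetical groups and values, for illustration only.
# 10 is the default mentioned in the question; 32 is an arbitrary example.
GROUP_HANDLER_COUNTS = {
    "archive-datanodes": 10,
    "heavy-io-datanodes": 32,
}


def render_property(name: str, value: int) -> str:
    """Render one <property> element in hdfs-site.xml form."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(prop, encoding="unicode")


def render_group_overrides(groups: dict) -> dict:
    """Map each group name to the XML override for its datanodes."""
    return {
        group: render_property(PROPERTY_NAME, count)
        for group, count in groups.items()
    }


if __name__ == "__main__":
    for group, xml in render_group_overrides(GROUP_HANDLER_COUNTS).items():
        print(f"{group}: {xml}")
```

Datanodes generally need a restart to pick up a changed handler count, so it is worth trying a new value on one group before rolling it out more widely.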

Re: What is the recommended value for dfs.datanode.handler.count in a large cluster?

As @bsaini explained, this property determines the number of handlers open at a given time on a datanode.

Two factors to look at before changing this property are:

1. Use Cases

2. HDP services being used

For example, if you are using HBase extensively, then increasing this property (to match the cores or spindles in the datanode) may help in getting better throughput, especially for bulk writes/reads. However, increasing it beyond a point will not help and may even affect performance negatively.
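As a rough sketch of that rule of thumb, the function below picks a starting value from the datanode's core and spindle counts. Taking the larger of the two, never going below the default of 10, and capping at a ceiling of 64 are all assumptions made for illustration, not recommended values; the right number still has to come from measuring the actual workload.

```python
DEFAULT_HANDLER_COUNT = 10  # default mentioned in the question


def suggest_handler_count(cores: int, spindles: int, ceiling: int = 64) -> int:
    """Suggest a starting dfs.datanode.handler.count for one datanode.

    Assumptions (illustrative only):
      - match the larger of cores and spindles,
      - never go below the default,
      - never exceed `ceiling`, a hypothetical cap reflecting that
        raising the value beyond a point stops helping.
    """
    if cores < 1 or spindles < 1:
        raise ValueError("cores and spindles must be positive")
    candidate = max(cores, spindles)
    return max(DEFAULT_HANDLER_COUNT, min(candidate, ceiling))


if __name__ == "__main__":
    print(suggest_handler_count(cores=32, spindles=12))
```

Treat the result as a first value to test under load, then adjust up or down based on observed throughput.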
