- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
What is the recommended value for dfs.datanode.handler.count in a large cluster?
- Labels:
-
Apache Hadoop
Created ‎12-03-2015 06:56 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Default for this is 10. I have seen it at 128 in a large (over 1000 nodes) cluster and I think this is causing load issues. What is the recommended value for this and when should this be increased from default 10.
Created ‎12-03-2015 09:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ravi, As you know that property dfs.datanode.handler.count defines the number of server threads for the datanode, this property is at the datanode level. In other words, this property value is driven more by the I/O requests to the datanode rather than the size of the cluster.
So, hypothetically speaking, if you have a cluster (large or small) being used for online archiving use case such that the data is not read very often, you do not need a large number of parallel threads. As the traffic / I/O goes up, there may be benefit in increasing the number of parallel threads in datanode. Here is the code that uses this property.
If there is a way to isolate the heavy workers from light workers then you can create Ambari configuration groups to have different values for these properties.
Created ‎12-03-2015 09:13 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Ravi, As you know that property dfs.datanode.handler.count defines the number of server threads for the datanode, this property is at the datanode level. In other words, this property value is driven more by the I/O requests to the datanode rather than the size of the cluster.
So, hypothetically speaking, if you have a cluster (large or small) being used for online archiving use case such that the data is not read very often, you do not need a large number of parallel threads. As the traffic / I/O goes up, there may be benefit in increasing the number of parallel threads in datanode. Here is the code that uses this property.
If there is a way to isolate the heavy workers from light workers then you can create Ambari configuration groups to have different values for these properties.
Created ‎12-05-2015 08:43 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As @bsaini explained this property determines no of open handlers at given time for a datanode.
Two factors which one can look before changing this property are:
1. Use Cases
2. HDP services being used
For example if you are using HBase extensively then increasing this property(to match cores or spindles in datanode) may help in getting better throughput specially for bulk writes/reads. However increasing it beyond a point will not help or even may effect performance negatively.
