Enable HDFS Storage-Balancer and Role Restart

Explorer

Hello, 

 

I am planning to enable the HDFS Storage-Balancer as per this article: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/admin_dn_storage_balancing.html. I plan to use the defaults, but change "dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction" to 1.0 (understanding that this could result in some bottlenecks).

 

My questions about this process are: 

 

1) We are also running HBase in this cluster, with RegionServers colocated with the Datanodes. Are there any gotchas that should be considered given this? 

 

2) We don't have CM Enterprise, so the official rolling restart is not available to us. Is it possible to restart the Datanode role on nodes individually? 

 

Thank you. 

1 ACCEPTED SOLUTION

Mentor
> 1) We are also running HBase in this cluster, with RegionServers colocated with the Datanodes. Are there any gotchas that should be considered given this?

The DN typically writes blocks round-robin across its disk list. In your configuration, however, once a disk's free space exceeds that of the others by more than the threshold, the DN will select that disk over and over until the difference falls back within the threshold.

If the threshold is very large (many thousands of full blocks are needed to close the gap), RS performance will likely suffer a bit when it is trying to flush, replay, or compact in parallel. This goes away once disk selection falls back to round-robin because no volume is in violation of the space threshold any longer.

If you're OK with some small slowness (assuming a small difference threshold, and disks not being cleaned and reinserted too often), you should be good. If, however, your HBase usage is very latency-bound, consider using a smaller preference fraction so that the DN does not pump all work onto a single disk or a specific set of disks when the threshold is crossed.
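
To make the effect of the preference fraction concrete, here is a rough simulation. It is a simplified model of the behaviour described above, not the actual Hadoop code: the function names, the disk sizes (in MB), the threshold, and the block size are all made up for illustration, and it ignores details such as whether a block still fits on a volume.

```python
import random


def choose_volume(free, threshold, preference, rr, rng):
    """Pick the volume index for the next block (simplified model)."""
    least, most = min(free), max(free)
    if most - least <= threshold:
        # All volumes are within the threshold: plain round-robin.
        idx = rr["all"] % len(free)
        rr["all"] += 1
        return idx
    # Otherwise split volumes into a "high free space" and a "low" group.
    high = [i for i, f in enumerate(free) if f > least + threshold]
    low = [i for i, f in enumerate(free) if f <= least + threshold]
    # Scale the preference by group size so it acts per volume.
    weight_high = len(high) * preference
    weight_low = len(low) * (1.0 - preference)
    p_high = weight_high / (weight_high + weight_low)
    if rng.random() < p_high:
        group, key = high, "high"
    else:
        group, key = low, "low"
    idx = group[rr[key] % len(group)]  # round-robin inside the chosen group
    rr[key] += 1
    return idx


def simulate(free, threshold, preference, block, n_blocks, seed=0):
    """Write n_blocks blocks; return per-volume block counts and free space."""
    free = list(free)
    counts = [0] * len(free)
    rr = {"all": 0, "high": 0, "low": 0}
    rng = random.Random(seed)
    for _ in range(n_blocks):
        idx = choose_volume(free, threshold, preference, rr, rng)
        free[idx] -= block
        counts[idx] += 1
    return counts, free


if __name__ == "__main__":
    # Hypothetical node: two fairly full disks and one freshly replaced disk.
    for pref in (0.75, 1.0):
        counts, _ = simulate([1000, 1000, 2000], threshold=100,
                             preference=pref, block=10, n_blocks=90)
        print(pref, counts)
```

In this toy run a fraction of 1.0 sends every one of the 90 blocks to the emptier disk, while a smaller fraction still lets some writes reach the other disks. That concentration is what can hurt a latency-sensitive RS.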

> 2) We don't have CM Enterprise, so the official rolling restart is not available to us. Is it possible to restart the Datanode role on nodes individually?

You can do the restarts one by one from the HDFS -> Instances page or through the API, but you'll need to manually ensure that each DN has come back up in a functional, connected state before moving on to the next (by checking the DN's metrics or its logs). The Enterprise rolling restart does that check automatically as it progresses.

CM API is documented at http://cloudera.github.io/cm_api/
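
If you go the API route, a one-at-a-time restart loop could look roughly like the sketch below. Treat all of it as assumptions to verify against the API docs for your CM version: the host, port, API version (v10), cluster and service names, and credentials are placeholders, and the endpoint paths and field names (roleCommands/restart, roleState, healthSummary) are as I understand the CM REST API. The health check here uses CM's view of the role only as a proxy; it does not replace confirming in the DN's metrics or logs that it has reconnected.

```python
import base64
import json
import time
import urllib.request

# Placeholders (assumptions): adjust to your environment and CM version.
BASE = "http://cm-host.example.com:7180/api/v10"
CLUSTER, SERVICE = "cluster", "hdfs"  # URL-encode these if they contain spaces
AUTH = base64.b64encode(b"admin:admin").decode()


def call(method, path, body=None):
    """Send one JSON request to the CM API and return the decoded reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    req.add_header("Authorization", "Basic " + AUTH)
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_datanodes():
    """Return the role names of all DataNode roles in the HDFS service."""
    roles = call("GET", f"/clusters/{CLUSTER}/services/{SERVICE}/roles")
    return [r["name"] for r in roles["items"] if r["type"] == "DATANODE"]


def restart_role(name):
    """Ask CM to restart a single role."""
    path = f"/clusters/{CLUSTER}/services/{SERVICE}/roleCommands/restart"
    call("POST", path, {"items": [name]})


def is_healthy(name):
    """True if CM reports the role as started and in good health."""
    role = call("GET", f"/clusters/{CLUSTER}/services/{SERVICE}/roles/{name}")
    return (role.get("roleState") == "STARTED"
            and role.get("healthSummary") == "GOOD")


def restart_one_by_one(names, restart=restart_role, healthy=is_healthy,
                       sleep=time.sleep, poll_seconds=15, max_polls=40):
    """Restart each role in turn; stop at the first one that stays unhealthy."""
    for name in names:
        restart(name)
        for _ in range(max_polls):
            sleep(poll_seconds)  # give the role time to stop and come back
            if healthy(name):
                break
        else:
            raise RuntimeError(f"{name} did not report healthy; stopping")


if __name__ == "__main__":
    restart_one_by_one(list_datanodes())
```

The loop stops at the first DN that does not come back, so a bad config change does not roll across the whole cluster.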

Explorer

Thank you!