Created 02-14-2016 12:32 PM
Hi,
I noticed that after performing a rolling restart, data locality for the entire cluster drops to 20%, which is bad; for real-time applications this can be a nightmare.
I've read here that we should switch off the balancer before performing a manual rolling restart on HBase. However, I used the Ambari rolling restart and didn't see any reference to the balancer in the documentation. Maybe the balancer is not the issue. What is the safest way to perform a rolling restart on all region servers while keeping data locality at least above 75%? Is there any option in Ambari to take care of that before a RS rolling restart?
Another issue I noticed is that some regions split during the rolling restart even though they are still far from being full.
Any insights?
Thank you,
Cheers
Pedro
Created 02-14-2016 01:16 PM
@Pedro Gandola Splitting occurs when your regions grow to the max size (hbase.hregion.max.filesize) as defined in your hbase-site.xml.
http://hbase.apache.org/book.html#disable.splitting
When you run a major compaction, data locality is restored. On a busy system, run major compactions during off-peak hours.
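To make the locality point concrete, here is a minimal sketch of how a locality figure can be computed for one region: the fraction of its HFile bytes that have an HDFS replica on the region server hosting it. The host names, block sizes and the `locality_index` helper are all hypothetical and only illustrate the idea; this is not the HBase implementation. It assumes that a major compaction rewrites the store files from the hosting region server, so HDFS places one replica of each new block on the local DataNode.

```python
from typing import List, Set


def locality_index(replica_hosts: List[Set[str]], block_sizes: List[int], server: str) -> float:
    """Fraction of bytes that have at least one replica on `server`."""
    total = sum(block_sizes)
    if total == 0:
        return 0.0
    local = sum(
        size for hosts, size in zip(replica_hosts, block_sizes) if server in hosts
    )
    return local / total


MB = 1024 * 1024
sizes = [128 * MB] * 5  # hypothetical region with five 128 MB blocks

# After a restart the region was reopened on "rs3", but the blocks were
# written earlier by other servers, so only one of them has a replica there.
before_compaction = [
    {"rs1", "rs2", "rs4"},
    {"rs1", "rs3", "rs5"},
    {"rs2", "rs4", "rs5"},
    {"rs1", "rs2", "rs5"},
    {"rs1", "rs4", "rs5"},
]

# Assumed state after a major compaction run by "rs3": every rewritten
# block has one replica on the local DataNode.
after_compaction = [
    {"rs3", "rs1", "rs2"},
    {"rs3", "rs2", "rs4"},
    {"rs3", "rs4", "rs5"},
    {"rs3", "rs1", "rs5"},
    {"rs3", "rs2", "rs5"},
]

print(locality_index(before_compaction, sizes, "rs3"))
print(locality_index(after_compaction, sizes, "rs3"))
```

In this made-up example the region goes from one local block out of five to all five local, which is the kind of recovery a major compaction is expected to give.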
The balancer distributes regions across the cluster and runs every 5 minutes by default; do not turn it off. You can implement your own balancer and replace the default StochasticLoadBalancer class, but this is not recommended unless you know what you're doing.
Another option is to enable read replicas, so essentially you're duplicating data on a different region server. The secondary replicas are read-only and maximize your data availability.
All in all, it's more art than science, and you need to experiment with many HBase properties to get the best result.
Created 06-27-2016 12:03 PM
@Pedro Gandola Hi, did you solve this issue using ConstantSizeRegionSplitPolicy?
Created 06-27-2016 12:17 PM
Hi @Minwoo Kang, Yes, that solved the problem.
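A possible explanation for why this helped, sketched under assumptions: the thread does not say which split policy was active before, but if it was IncreasingToUpperBoundRegionSplitPolicy (the default in HBase 1.x, as far as I know), the split threshold depends on how many regions of the same table the region server currently hosts. Right after a restart a server may hold only one or two regions of a table, so the threshold can be far below hbase.hregion.max.filesize. The formula below reflects my understanding of that policy and should be checked against the source of your HBase version; the 10 GB and 128 MB values are illustrative.

```python
MB = 1024 * 1024
GB = 1024 * MB


def constant_size_threshold(max_filesize: int) -> int:
    """ConstantSizeRegionSplitPolicy: split only at the configured max size."""
    return max_filesize


def increasing_to_upper_bound_threshold(
    regions_on_server: int, max_filesize: int, flush_size: int
) -> int:
    """Assumed behaviour of IncreasingToUpperBoundRegionSplitPolicy.

    regions_on_server: regions of the same table hosted by this server.
    The threshold grows with the cube of that count, capped at max_filesize.
    """
    if regions_on_server == 0 or regions_on_server > 100:
        return max_filesize
    initial_size = 2 * flush_size  # assumed default initial size
    return min(max_filesize, initial_size * regions_on_server ** 3)


max_filesize = 10 * GB  # illustrative hbase.hregion.max.filesize
flush_size = 128 * MB   # illustrative hbase.hregion.memstore.flush.size

for count in (1, 2, 3, 4):
    size = increasing_to_upper_bound_threshold(count, max_filesize, flush_size)
    print(count, "region(s) on server -> split at", size // MB, "MB")

print("constant policy -> split at", constant_size_threshold(max_filesize) // MB, "MB")
```

Under these assumptions, a server that hosts a single region of the table right after coming back would split it at 256 MB, which matches regions splitting during the rolling restart while still far from full. With ConstantSizeRegionSplitPolicy the threshold stays at the configured maximum regardless of how regions are distributed.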