I'm the administrator of an HBase cluster with over 100 RegionServers and 50+ tables that receive bulk loads and batch puts from Spark batch jobs, Spark Streaming jobs, and MapReduce applications. Every time we do maintenance on the RegionServers (RSs), we notice that most tables get a new split after the RSs are restarted, which adds to the overall region count. From HBase blogs, I understand that having more than 200 regions per RS is not recommended. We run HDP 2.6.5, which ships HBase 1.1.2. So, my questions are:
1. What can I, as an admin, do to avoid these costly splits?
2. Should I address this at the table-property level, for example through the split policy or compression? [Note: even with the constant-size split policy, I see regions split before they reach the max HFile size limit.]
3. Is this addressed in HBase 2.0 or later? Our capacity planning for adding or removing RSs depends on the number of regions in the cluster.
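For context on the capacity-planning point, the arithmetic behind the "regions per RS" concern can be sketched as below. This is a rough sketch only, assuming the ~200 regions/RS guideline quoted from the HBase blogs and that the balancer spreads regions evenly across RSs; the function names and the sample region counts are hypothetical, not figures from our cluster.

```python
import math

# Assumption: the ~200 regions per RegionServer guideline mentioned above.
REGIONS_PER_RS_CEILING = 200


def region_budget(region_servers: int, ceiling: int = REGIONS_PER_RS_CEILING) -> int:
    """Total regions the cluster can hold before exceeding the per-RS ceiling."""
    if region_servers < 0 or ceiling <= 0:
        raise ValueError("region_servers must be >= 0 and ceiling > 0")
    return region_servers * ceiling


def min_region_servers(total_regions: int, ceiling: int = REGIONS_PER_RS_CEILING) -> int:
    """Smallest RS count that keeps the average regions per RS at or below the ceiling.

    Assumes the balancer distributes regions evenly (hypothetical simplification).
    """
    if total_regions < 0 or ceiling <= 0:
        raise ValueError("total_regions must be >= 0 and ceiling > 0")
    return math.ceil(total_regions / ceiling)


if __name__ == "__main__":
    # 100 RSs at 200 regions each gives a budget of 20,000 regions.
    print(region_budget(100))
    # Hypothetical: if post-maintenance splits pushed us to 25,000 regions,
    # this is the RS count needed to stay under the ceiling.
    print(min_region_servers(25_000))
```

Every extra split after an RS restart eats into that budget, which is why the split count feeds directly into how many RSs we plan for.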