HBase balancer returns 1 and does not balance

New Contributor

Hi all,

HBase version: CDH 6.2.0

Balancer config:

 

<property>
<name>hbase.master.loadbalancer.class</name>
<value>org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer</value>
<final>false</final>
<source>hbase-default.xml</source>
</property>
 
<property>
<name>hbase.balancer.period</name>
<value>300000</value>
<final>false</final>
<source>hbase-default.xml</source>
</property>
 
However, the balancer does not run at the scheduled 5-minute interval, there are no error logs, and running the `balancer` command in the shell returns 1:
 
hbase(main):007:0> balancer
true
Took 0.0746 seconds
=> 1
 
Is there any way to fix this?

Super Collaborator

Hello @mingtian 

 

Thanks for using Cloudera Community. Based on your post, we suggest enabling DEBUG logging for the HMaster (via the HMaster UI, to avoid a restart) and then triggering the balancer. The balancer algorithm may be deciding against moving any regions because of the cost factors [1]. The HMaster debug log prints this balancer information for your review, after which the parameters discussed in [1] can be tuned to force balancing, although the default parameters are kept for most use cases.

 

Note that the balancer's job is not merely to place an equal number of regions on each RegionServer. The balancer weighs the various costs defined in [1] before it moves any region.

 

Regards, Smarak

 

[1] StochasticLoadBalancer (Apache HBase 3.0.0-alpha-4-SNAPSHOT API)

New Contributor

Thanks for the reply, I'll try it

Super Collaborator

Hello @mingtian 

 

Hope you are doing well. We wanted to follow up and check whether the DEBUG logging helped confirm why the balancer algorithm decided against moving regions. If so, please let us know whether the question in your post has been answered or whether any further questions remain.

 

Regards, Smarak

New Contributor

Hello,

I set the log level to DEBUG and ran the `balancer` command in the shell.

The debug log contained only:

 

2023-01-18 06:15:18,142 DEBUG org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: RegionReplicaHostCostFunction not needed
2023-01-18 06:15:18,142 DEBUG org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: RegionReplicaRackCostFunction not needed

 

The balancer still did not run.

Super Collaborator

Hello @mingtian 

 

Note that DEBUG logging does not make the balancer move regions. It only confirms whether the balancer is running but declining to move any region because of the cost factors. Example: I used region movement to leave one RegionServer without any regions and then triggered the balancer, which logged [1] and moved a region (note "Found a solution that moves 1 regions"). After the first balancer run completed, I triggered a second run, which logged [2], where the log reports "skipping load balancing".
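
To tell these two outcomes apart in an HMaster log, a small filter can help. What follows is a minimal Python sketch, not an HBase tool: the function name `balancer_decision` is made up for illustration, and the matched phrases come only from the log lines [1] and [2] quoted in this reply, so other HBase versions may word them differently.

```python
def balancer_decision(line: str) -> str:
    """Classify one HMaster log line by what the balancer reported.

    The phrases are assumptions copied from logs [1] and [2] in this
    thread; they are not a documented, stable log format.
    """
    if "StochasticLoadBalancer" not in line:
        return "unrelated"
    lowered = line.lower()
    if "skipping load balancing" in lowered:
        return "skipped"
    if "found a solution that moves" in lowered:
        return "moved"
    return "other"


if __name__ == "__main__":
    # Sample lines are truncated copies of lines quoted in this thread.
    samples = [
        "2023-01-18 06:38:33,290 INFO org.apache.hadoop.hbase.master.balancer."
        "StochasticLoadBalancer: Finished computing new moving plan. "
        "Computation took 95 ms to try 7200 different iterations.  "
        "Found a solution that moves 1 regions;",
        "2023-01-18 06:39:05,365 INFO org.apache.hadoop.hbase.master.balancer."
        "StochasticLoadBalancer: Cluster wide - skipping load balancing because "
        "weighted average imbalance=0.013858568086431631 <= threshold(0.025).",
        "2023-01-18 06:15:18,142 DEBUG org.apache.hadoop.hbase.master.balancer."
        "StochasticLoadBalancer: RegionReplicaHostCostFunction not needed",
    ]
    for sample in samples:
        print(balancer_decision(sample))
```

A line such as "RegionReplicaHostCostFunction not needed" falls into the "other" bucket: it shows the balancer class was active but does not by itself say whether a plan was computed or skipped.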

 

I believe your team would see [2], i.e. the balancer is skipping load balancing because of the cost factors: as the message in [2] states, the weighted average imbalance is at or below the threshold set by "hbase.master.balancer.stochastic.minCostNeedBalance". In other words, HBase is rejecting region movement because any new region movement is "costlier" than the current region placement. Tuning the cost parameters in [3], including setting "hbase.master.loadbalance.bytable" to "true", should help trigger the balancer for your team.

 

Regards, Smarak

 

[1] 

2023-01-18 06:38:33,290 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new moving plan. Computation took 95 ms to try 7200 different iterations.  Found a solution that moves 1 regions; Going from a computed imbalance of 0.4961380973335763 to a new imbalance of 0.020487264673311183. funtionCost=RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.0); PrimaryRegionCountSkewCostFunction : (not needed); MoveCostFunction : (multiplier=7.0, imbalance=0.3333333333333333, need balance); ServerLocalityCostFunction : (multiplier=25.0, imbalance=0.0); RackLocalityCostFunction : (multiplier=15.0, imbalance=0.0); TableSkewCostFunction : (multiplier=35.0, imbalance=0.0); RegionReplicaHostCostFunction : (not needed); RegionReplicaRackCostFunction : (not needed); ReadRequestCostFunction : (multiplier=5.0, imbalance=1.0, need balance); WriteRequestCostFunction : (multiplier=5.0, imbalance=1.0, need balance); MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.0); StoreFileCostFunction : (multiplier=5.0, imbalance=0.0); 

[2]

2023-01-18 06:39:05,365 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Cluster wide - skipping load balancing because weighted average imbalance=0.013858568086431631 <= threshold(0.025). If you want more aggressive balancing, either lower hbase.master.balancer.stochastic.minCostNeedBalance from 0.025 or increase the relative multiplier(s) of the specific cost function(s). functionCost=RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.0); PrimaryRegionCountSkewCostFunction : (not needed); MoveCostFunction : (multiplier=7.0, imbalance=0.0); ServerLocalityCostFunction : (multiplier=25.0, imbalance=0.0); RackLocalityCostFunction : (multiplier=15.0, imbalance=0.0); TableSkewCostFunction : (multiplier=35.0, imbalance=0.0); RegionReplicaHostCostFunction : (not needed); RegionReplicaRackCostFunction : (not needed); ReadRequestCostFunction : (multiplier=5.0, imbalance=0.6685715976063684, need balance); WriteRequestCostFunction : (multiplier=5.0, imbalance=1.0, need balance); MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.0); StoreFileCostFunction : (multiplier=5.0, imbalance=0.0); 

 

[3] StochasticLoadBalancer (Apache HBase 3.0.0-alpha-4-SNAPSHOT API)
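
The threshold check reported in [2] can be reproduced by hand. The Python sketch below assumes that the weighted average imbalance is the multiplier-weighted mean of the per-function imbalance values, leaving out functions marked "not needed". That assumption is inferred from the numbers in the log, which it reproduces, and is not taken from the HBase source. The helper name `weighted_average_imbalance` is hypothetical.

```python
import re

# functionCost portion of log line [2], copied from this thread.
FUNCTION_COST = (
    "RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.0); "
    "PrimaryRegionCountSkewCostFunction : (not needed); "
    "MoveCostFunction : (multiplier=7.0, imbalance=0.0); "
    "ServerLocalityCostFunction : (multiplier=25.0, imbalance=0.0); "
    "RackLocalityCostFunction : (multiplier=15.0, imbalance=0.0); "
    "TableSkewCostFunction : (multiplier=35.0, imbalance=0.0); "
    "RegionReplicaHostCostFunction : (not needed); "
    "RegionReplicaRackCostFunction : (not needed); "
    "ReadRequestCostFunction : (multiplier=5.0, imbalance=0.6685715976063684, need balance); "
    "WriteRequestCostFunction : (multiplier=5.0, imbalance=1.0, need balance); "
    "MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.0); "
    "StoreFileCostFunction : (multiplier=5.0, imbalance=0.0);"
)

# Matches only entries that carry numbers; "(not needed)" entries are skipped.
PAIR = re.compile(r"multiplier=([-+.\deE]+), imbalance=([-+.\deE]+)")


def weighted_average_imbalance(function_cost: str) -> float:
    """Multiplier-weighted mean of imbalance (assumed formula, see lead-in)."""
    pairs = [(float(m), float(i)) for m, i in PAIR.findall(function_cost)]
    total_weight = sum(m for m, _ in pairs)
    return sum(m * i for m, i in pairs) / total_weight


if __name__ == "__main__":
    threshold = 0.025  # threshold value printed in log [2]
    value = weighted_average_imbalance(FUNCTION_COST)
    print(value)               # about 0.01386, the figure shown in log [2]
    print(value <= threshold)  # True: this is the "skipping" case
```

Under the same assumption, the read and write request functions are the only contributors here, and their multipliers (5.0 each) are small next to the total weight of 602, which is why the log suggests either lowering the threshold or raising the multipliers of the functions marked "need balance".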