HBase compaction while adding region servers

Expert Contributor

Hi,

What is the procedure to follow when adding additional Region Servers to an existing cluster? We use CDH 5.14.

Mentor
Adding the Region Servers to the cluster is itself fairly simple: use the Add Role Instances button on the HBase -> Instances page, or the Add New Hosts button on the Hosts page of Cloudera Manager. No configuration changes are required beyond this wizard process.

In most situations, adding a new Region Server host also means adding a new DataNode host. So before you start the newly added Region Servers, ensure you've first run the HDFS Balancer to completion so that the new DataNodes are populated (up to an acceptable threshold of equality, say, ±10%).
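
To make the ±10% threshold concrete: the HDFS Balancer treats a DataNode as balanced when its utilization (used space as a percentage of its own capacity) is within the threshold of the cluster-wide utilization. The following is a minimal Python sketch of that check, not the balancer's actual implementation. The helper names `is_balanced` and `balancer_command` and the sample figures are assumptions for illustration; the only real CLI piece is `hdfs balancer -threshold`.

```python
def is_balanced(datanodes, threshold_pct=10.0):
    """Return True if every DataNode is within threshold_pct of cluster utilization.

    datanodes: list of (used_bytes, capacity_bytes) tuples (hypothetical input,
    e.g. copied by hand from the NameNode Web UI).
    """
    total_used = sum(used for used, _ in datanodes)
    total_capacity = sum(capacity for _, capacity in datanodes)
    cluster_pct = 100.0 * total_used / total_capacity
    return all(
        abs(100.0 * used / capacity - cluster_pct) <= threshold_pct
        for used, capacity in datanodes
    )


def balancer_command(threshold_pct=10):
    """Build the argument list for an HDFS Balancer run with the given threshold."""
    return ["hdfs", "balancer", "-threshold", str(threshold_pct)]


if __name__ == "__main__":
    # Two existing DataNodes at 60% and one freshly added, empty DataNode.
    nodes = [(60, 100), (60, 100), (0, 100)]
    if not is_balanced(nodes):
        # Run this as the HDFS superuser on a cluster host.
        print(" ".join(balancer_command(10)))
```

With the sample figures the cluster-wide utilization is 40%, so the empty node is 40 points away from it and a balancer run is indicated.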

When you start the new Region Servers, they begin empty and wait for the Master to assign them regions. For this to happen, the existing regions in the cluster need to be rebalanced across the larger cluster.

The HBase Master runs the region balancer regularly, but only if there are no regions stuck in assignment (no regions in transition). Ensure via HBCK or via the active HMaster Web UI that there are no regions in transition before you start the newly added Region Servers.

Once you observe the HBase region balancer kick in and complete (it typically takes ~5 minutes to begin, and you can follow along in the active HMaster Web UI), you should start seeing some regions being served by your new Region Servers, and the average number of regions on the older Region Servers should drop slightly.
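
As a rough sanity check on what to expect in the Web UI, the per-server average can be estimated from region and server counts. This is a sketch under the assumption that the balancer ends up spreading regions roughly evenly by count; the real balancer may weigh other factors, so treat the result as a ballpark. The function name and the figures are hypothetical; `balance_switch` and `balancer` are HBase shell commands for enabling the balancer and requesting a run without waiting for the periodic one.

```python
def expected_average(total_regions, old_servers, new_servers):
    """Return (average regions per server before, after) adding new servers."""
    before = total_regions / old_servers
    after = total_regions / (old_servers + new_servers)
    return before, after


# HBase shell input to enable the balancer and request a run immediately.
# The balancer will not run while regions are in transition.
BALANCER_SCRIPT = "balance_switch true\nbalancer\n"


if __name__ == "__main__":
    before, after = expected_average(total_regions=600, old_servers=6, new_servers=2)
    print(f"average regions per server: {before:.0f} -> {after:.0f}")
    print("To trigger the balancer by hand, paste into `hbase shell`:")
    print(BALANCER_SCRIPT, end="")
```

For 600 regions, going from 6 to 8 Region Servers moves the average from 100 to 75 regions per server.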

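At this point the new servers' regions will each have poor data locality (a value under 90-95% can be considered poor, especially for tables used by low-latency applications), so it is worth running the major_compact HBase shell command on at least the most important tables in your environment. A major compaction rewrites each region's store files through the Region Server that now hosts it, which places a replica on the local DataNode and restores locality.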
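
One way to turn that advice into a repeatable step is to generate the `major_compact` lines for only the tables whose locality is below the chosen cut-off. This is a minimal sketch: the table names, the locality figures, and the `compaction_script` helper are assumptions for illustration, and the locality numbers are expected to be collected by hand (for example from the Region Server Web UI). Only the `major_compact 'table'` shell syntax comes from HBase itself.

```python
def compaction_script(locality_by_table, threshold_pct=90.0):
    """Return HBase shell lines that major-compact tables below threshold_pct.

    locality_by_table: dict mapping table name -> observed locality percentage
    (hypothetical, operator-supplied values).
    """
    lines = [
        f"major_compact '{table}'"
        for table, pct in sorted(locality_by_table.items())
        if pct < threshold_pct
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    observed = {"orders": 42.0, "audit": 97.5, "events": 88.0}
    # Paste the printed lines into `hbase shell`, ideally during a quiet period,
    # since major compactions are I/O heavy.
    print(compaction_script(observed))
```

Keeping the threshold a parameter lets you start with the most latency-sensitive tables at 95% and leave batch-only tables for later.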

Expert Contributor

@Harsh J Thanks for the detailed answer. Our cluster holds very little data: none of the DataNodes has used even 10% of its storage yet, but usage is expected to reach at least 60% within a week. Would running the HDFS Balancer and a compaction make any difference in this case?

Mentor (accepted solution)
In that case the balancer can be run later, but the compaction may still
help with keeping the HBase application request latency low, if that is an
immediate concern in the cluster.