
I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

When I try to import data using the command:

 hbase org.apache.hadoop.hbase.mapreduce.Import -Dhbase.import.version=0.94 'tablename' /hdfs/exported-data

After a while I get a RegionTooBusyException. This is the MapReduce job summary:

Total maps: 14711  Complete: 526  Failed: 130  Killed: 380  Successful: 526

What I see in the console is:

2016-06-02 11:05:19,632 INFO  [main] mapreduce.Job: Task Id : attempt_1464792187762_0003_m_000347_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: RegionTooBusyException: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)

...

What is causing this? I suspect that HBase compactions might be making the region servers unresponsive. How do I solve it?

1 ACCEPTED SOLUTION

New Contributor

It wasn't a problem with compactions but with the number of map tasks.

The solution was to change the YARN scheduler's resource calculation from memory (the default) to CPU. In the Ambari interface (I'm using Apache Ambari 2.2.2.0): YARN -> Configs -> Enable CPU Node Scheduling.

The same setting can also be found in Hadoop's capacity-scheduler.xml:

 <property>
   <name>yarn.scheduler.capacity.resource-calculator</name>
   <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
 </property>

What really happened:

The cluster consists of 20 nodes with more than 1 TB of RAM. YARN has 800 GB of RAM available for jobs. Because YARN uses memory to calculate the number of containers, it assigned about 320 containers to map tasks (800 GB / 2.5 GB per MapReduce2 task = 320 map tasks!). This was like flooding our own servers with processes and requests.

After changing to CPU-based capacity scheduling, YARN's formula for the number of containers became 20 nodes * 6 virtual cores = 120 processes, which is much more manageable (and works fine for now).
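For reference, the knobs involved look roughly like this (a minimal sketch: the property names are the standard YARN/MapReduce2 ones, but the values are illustrative rather than our exact configuration):

 <!-- yarn-site.xml: per-node resources advertised to YARN (illustrative values) -->
 <property>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>40960</value>  <!-- ~40 GB per node x 20 nodes = ~800 GB for YARN -->
 </property>
 <property>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>6</value>  <!-- 6 vcores per node x 20 nodes = 120 containers under the CPU-aware calculator -->
 </property>

 <!-- mapred-site.xml: per-map-task memory request (illustrative value) -->
 <property>
   <name>mapreduce.map.memory.mb</name>
   <value>2560</value>  <!-- 2.5 GB per map, so 800 GB / 2.5 GB = 320 maps under the memory-only calculator -->
 </property>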


11 REPLIES


Can you share the region server logs so we can check why the RegionTooBusyException is occurring? If you feel major compaction is the reason, you can disable automatic major compactions by configuring the property below.

 <property>
   <name>hbase.hregion.majorcompaction</name>
   <value>0</value>
   <description>The time (in milliseconds) between 'major' compactions of all
   HStoreFiles in a region. Default: 1 day.
   Set to 0 to disable automated major compactions.
   </description>
 </property>
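If automatic major compactions are disabled this way, they can still be triggered manually during off-peak hours. A minimal sketch from the HBase shell, reusing the table name from the question:

 # HBase shell: run a major compaction on demand for a single table
 major_compact 'tablename'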

Master Collaborator

Which version of HDP are you using?

I am currently porting over this JIRA which would show us more information:

HBASE-15931 Add log for long-running tasks in AsyncProcess

How large is your region size?

Did you monitor your region servers to see which ones were the hot spot during the import ?

Please pastebin more of the error / stack trace.

Thanks

Master Collaborator

Please verify that regions of your table are evenly distributed across the servers.
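One quick way to check this is from the HBase shell (a minimal sketch; the Master web UI's table page shows the same per-server region counts):

 # per-regionserver load summary, including how many regions each server holds
 status 'simple'
 # run the balancer if the distribution looks skewed
 balancer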

Rising Star

Two major reasons for RegionTooBusyException

  1. Failure to acquire the region lock (look for " failed to get a lock in " in the map task log)
  2. The region memstore is above its limit and flushes cannot keep up with the load (look for "Above memstore limit")

To mitigate 1, you can directly increase the maximum busy wait timeout, hbase.ipc.client.call.purge.timeout, in milliseconds (the default is 120000), but do not forget to increase hbase.rpc.timeout accordingly (set it to the same value).

To mitigate 2, you can increase hbase.hregion.memstore.block.multiplier from its default (4) to a higher value.
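In hbase-site.xml, those two mitigations would look roughly like this (a minimal sketch using the property names mentioned above; the values are examples only, not recommendations):

 <property>
   <name>hbase.ipc.client.call.purge.timeout</name>
   <value>300000</value>  <!-- raise the busy wait timeout above the 120000 ms default -->
 </property>
 <property>
   <name>hbase.rpc.timeout</name>
   <value>300000</value>  <!-- keep the RPC timeout in step with the value above -->
 </property>
 <property>
   <name>hbase.hregion.memstore.block.multiplier</name>
   <value>8</value>  <!-- allow the memstore to grow further before writes are blocked (default 4) -->
 </property>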

But the best option for you is the bulk import option:

-Dimport.bulk.output=/path/for/output

followed by the completebulkload tool.

See: https://hbase.apache.org/book.html#arch.bulk.load.complete
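End to end, the two steps would look roughly like this (a minimal sketch: the table name, paths, and hbase.import.version flag are taken from the question, and the completebulkload invocation assumes the HBase 1.x class name):

 # step 1: run the Import job, but write HFiles instead of issuing puts
 hbase org.apache.hadoop.hbase.mapreduce.Import \
   -Dhbase.import.version=0.94 \
   -Dimport.bulk.output=/path/for/output \
   'tablename' /hdfs/exported-data

 # step 2: move the generated HFiles into the table's regions (completebulkload)
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /path/for/output 'tablename'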

Contributor

Totally agree re: bulk import. One additional point: you need to ensure the hbase user has access to read/write the files created by the -Dimport.bulk.output step. If it doesn't, the completebulkload step will appear to hang.

The simplest way to achieve this is to do:

 hdfs dfs -chmod -R 777 <dir containing export files>

as the owner of those files. completebulkload, running as hbase, simply moves these files to the relevant HBase directories. With the permissions correctly set, this takes fractions of a second.

New Contributor

Is it possible to do the bulk import if the HBase versions differ? The old cluster has HBase 0.94 while the new one has 1.1.2.

Contributor

I believe so, yes. The -Dimport.bulk.output step can be performed on the target cluster. This will prepare the HBase files according to the target version, number of region servers, etc.


Contributor

I have hit the exact same problem before and it took me a long time to solve it.

Basically this error means the HBase region servers are overloaded by too many parallel writing threads.

A bulk load can also saturate the memstore. Since HBase does not have good back pressure, applications that write into HBase need to control their QPS.

In my scenario, I was using Spark to bulk load into HBase, and it overloaded the HBase region servers.

There are a few ways that can potentially solve this problem:

1. Pre-split the HBase table so multiple region servers can handle the writes (see the sketch after this list)

2. Tune down the number of RDD partitions in Spark right before calling the bulk load. This reduces the number of parallel writer threads from the Spark executors.
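For point 1, pre-splitting can be done at table-creation time in the HBase shell. A minimal sketch, where the table name, column family, and split points are hypothetical and should be chosen to match your own row-key distribution:

 # HBase shell: create a table pre-split into four regions
 create 'tablename', 'cf', SPLITS => ['row1000', 'row2000', 'row3000']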