
I'm getting RegionTooBusyException when trying to import data into hbase

Solved

I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

When I try to import data using the command:

 hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import 'tablename' /hdfs/exported-data

After a while I get a RegionTooBusyException. This is the MapReduce job summary:

Total maps: 14711 Complete: 526 Failed: 130 Killed: 380 Successful: 526

What I see in the console is

2016-06-02 11:05:19,632 INFO  [main] mapreduce.Job: Task Id : attempt_1464792187762_0003_m_000347_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: RegionTooBusyException: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)

...

What is causing this? I suspect that HBase compactions might be making the region servers unresponsive. How do I solve it?

1 ACCEPTED SOLUTION


Re: I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

It wasn't a problem with compactions but with the number of map tasks.

The solution was to change the YARN scheduler from Memory (default) to CPU, in the Ambari interface (I'm using Apache Ambari 2.2.2.0): YARN -> Configs -> Enable CPU Node Scheduling.

The same setting can also be found in Hadoop's capacity-scheduler.xml:

 <property>
   <name>yarn.scheduler.capacity.resource-calculator</name>
   <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
 </property>

What really happened:

The cluster consists of 20 nodes with more than 1 TB of RAM. YARN has 800 GB of RAM available for jobs. Because YARN used memory to calculate the number of containers, it assigned about 320 containers for map tasks (800 GB / 2.5 GB per MapReduce2 container = 320 containers!!!). This was like flooding our own servers with processes and requests.

After changing that to CPU-based YARN capacity scheduling, it changed its formula for the number of containers to 20 nodes * 6 virtual cores = 120 processes, which is much more manageable (and works fine for now).
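
For reference, the per-node virtual core count that feeds that formula comes from the NodeManager configuration. A minimal yarn-site.xml sketch, assuming 6 vcores per node as in the numbers above (adjust to your hardware):

 <property>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>6</value>
   <description>Number of virtual cores a NodeManager advertises to YARN for containers.</description>
 </property>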

11 REPLIES

Re: I'm getting RegionTooBusyException when trying to import data into hbase

Can you share the region server logs so we can check why the RegionTooBusyException was thrown? If you feel major compaction is the reason, you can disable automatic major compactions by configuring the property below.

 <property>
   <name>hbase.hregion.majorcompaction</name>
   <value>0</value>
   <description>The time (in milliseconds) between 'major' compactions of all
   HStoreFiles in a region. Default: 1 day.
   Set to 0 to disable automated major compactions.
   </description>
 </property>
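
If automatic major compactions are disabled this way, they can still be run manually during off-peak hours from the hbase shell; a minimal sketch (the table name is just a placeholder):

 major_compact 'tablename'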

Re: I'm getting RegionTooBusyException when trying to import data into hbase

Super Collaborator

Which version of HDP are you using?

I am currently porting over this JIRA which would show us more information:

HBASE-15931 Add log for long-running tasks in AsyncProcess

What is your region size?

Did you monitor your region servers to see which ones were the hot spot during the import ?

Please pastebin more of the error / stack trace.

Thanks

Re: I'm getting RegionTooBusyException when trying to import data into hbase

Super Collaborator

Please verify that regions of your table are evenly distributed across the servers.
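
For reference, the master web UI shows per-server region counts, and the hbase shell can enable and trigger the balancer if the distribution is skewed; a minimal sketch:

 balance_switch true   # ensure the balancer is enabled (returns the previous state)
 balancer              # ask the master to run a balancing pass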

Re: I'm getting RegionTooBusyException when trying to import data into hbase

Contributor

Two major reasons for RegionTooBusyException:

  1. Failure to acquire the region lock (look for "failed to get a lock in" in the map task log)
  2. Region memstore is above its limit and flushes cannot keep up with the load (look for "Above memstore limit")

To mitigate 1, you can increase the maximum busy-wait timeout hbase.ipc.client.call.purge.timeout (in ms, default 120000), but do not forget to increase hbase.rpc.timeout accordingly (set it to the same value).

To mitigate 2, you can increase hbase.hregion.memstore.block.multiplier from the default (4) to some higher value.
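
A minimal hbase-site.xml sketch combining both mitigations (the values here are only illustrative, not recommendations):

 <property>
   <name>hbase.ipc.client.call.purge.timeout</name>
   <value>600000</value>
 </property>
 <property>
   <name>hbase.rpc.timeout</name>
   <value>600000</value>
 </property>
 <property>
   <name>hbase.hregion.memstore.block.multiplier</name>
   <value>8</value>
 </property>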

But the best option for you is to use the bulk import option:

-Dimport.bulk.output=/path/for/output

followed by the completebulkload tool.

See: https://hbase.apache.org/book.html#arch.bulk.load.complete
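
A rough sketch of that workflow for the command in the question (paths and table name are placeholders, and the completebulkload class name can differ between HBase versions): Import writes HFiles instead of going through the region servers, then completebulkload moves them into the table:

 hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/path/for/output 'tablename' /hdfs/exported-data
 hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /path/for/output 'tablename'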

Re: I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

Totally agree re bulk import. One additional point. You need to ensure the hbase user has access to read/write the files created by the -Dimport.bulk.output step. If it doesn't, the completebulkload step will appear to hang.

The simplest way to achieve this is to do:

 hdfs dfs -chmod -R 777 <dir containing export files>

as the owner of those files. completebulkload, running as hbase, simply moves these files to the relevant HBase directories. With the permissions correctly set, this takes fractions of a second.

Re: I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

Is it possible to do the bulk import if the versions of HBase differ? The old cluster has HBase 0.94 while the new one has 1.1.2.

Re: I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

I believe so, yes. The -Dimport.bulk.output step can be performed on the target cluster. This will prep the HBase files according to the target version, number of region servers, etc.


Re: I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

I have hit the exact same problem before and it took me a long time to solve it.

Basically this error means the HBase region server is overloaded due to too many parallel writing threads.

Also, bulk loading can cause the memstore to saturate. Since HBase does not have good back pressure, applications that write into HBase need to control their QPS.

In my scenario, I was using Spark bulk load to write into HBase, and it overloaded the HBase region servers.

There are a few ways that can potentially solve this problem:

1. Pre-split the HBase table so multiple region servers can handle the writes (see the sketch after this list)

2. Tune down the RDD partitions in Spark right before calling bulk load. This can reduce the number of parallel writer threads from the Spark executors.
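
For option 1, a pre-split can be done when creating the table in the hbase shell; a minimal sketch with a made-up column family and split points (choose split keys that match your row key distribution):

 create 'tablename', 'cf', SPLITS => ['row_25000000', 'row_50000000', 'row_75000000']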