
I'm getting RegionTooBusyException when trying to import data into HBase

New Contributor

When I try to import data using the command:

 hbase org.apache.hadoop.hbase.mapreduce.Import -Dhbase.import.version=0.94 'tablename' /hdfs/exported-data

After a while I get a RegionTooBusyException. This is the MapReduce job summary:

Total maps: 14711  Complete: 526  Failed: 130  Killed: 380  Successful: 526

What I see in the console is

2016-06-02 11:05:19,632 INFO  [main] mapreduce.Job: Task Id : attempt_1464792187762_0003_m_000347_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: RegionTooBusyException: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)

...

What is causing this? I suspect that HBase compactions might be making the region servers unresponsive. How do I solve it?

1 ACCEPTED SOLUTION

New Contributor

It wasn't a problem with compactions but with the number of map tasks.

The solution was to change the YARN scheduler from memory-based (the default) to CPU-based: in the Ambari interface (I'm using Apache Ambari 2.2.2.0), go to YARN -> Configs -> Enable CPU Node Scheduling.

It's also possible to set this directly in Hadoop's capacity-scheduler.xml:

 <property>
   <name>yarn.scheduler.capacity.resource-calculator</name>
   <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
 </property>

What really happened:

The cluster consists of 20 nodes that together have more than 1 TB of RAM. YARN has 800 GB of RAM available for jobs. Because YARN used memory alone to calculate the number of containers, it assigned about 320 containers to map tasks (800 GB / 2.5 GB per MapReduce2 container = 320 containers!). This effectively flooded our own servers with processes and requests.

After switching to CPU-based YARN capacity scheduling, the formula for the number of containers became 20 nodes * 6 virtual cores = 120 processes, which is much more manageable (and works fine for now).
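The arithmetic above can be sketched as follows. The figures (800 GB of YARN memory, 2.5 GB per map container, 20 nodes with 6 vcores each) come from this cluster; the formulas are a deliberate simplification of what the CapacityScheduler's resource calculators actually do:

```python
# Simplified sketch of how the two YARN resource calculators bound the
# number of concurrent containers. Numbers are taken from this post.

def containers_by_memory(yarn_memory_gb, container_memory_gb):
    """DefaultResourceCalculator: only memory limits the container count."""
    return int(yarn_memory_gb // container_memory_gb)

def containers_by_cpu(nodes, vcores_per_node):
    """DominantResourceCalculator, CPU-bound case: vcores limit the count."""
    return nodes * vcores_per_node

print(containers_by_memory(800, 2.5))  # 320 concurrent map containers
print(containers_by_cpu(20, 6))        # 120 concurrent map containers
```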


11 REPLIES

Can you share the region server logs so we can check why the RegionTooBusyException occurred? If you suspect major compaction is the cause, you can disable automatic major compactions by configuring the property below.

 <property>
   <name>hbase.hregion.majorcompaction</name>
   <value>0</value>
   <description>The time (in milliseconds) between 'major' compactions of all
   HStoreFiles in a region. Default: 1 day.
   Set to 0 to disable automated major compactions.</description>
 </property>

Super Collaborator

Which version of HDP are you using?

I am currently porting over this JIRA, which would give us more information:

HBASE-15931 Add log for long-running tasks in AsyncProcess

How large is your region size?

Did you monitor your region servers to see which ones were hot spots during the import?

Please pastebin more of the error / stack trace.

Thanks

Super Collaborator

Please verify that regions of your table are evenly distributed across the servers.

Contributor

Two major reasons for RegionTooBusyException:

  1. Failure to acquire the region lock (look for "failed to get a lock in" in the map task log)
  2. The region memstore is above its limit and flushes cannot keep up with the load (look for "Above memstore limit")

To mitigate 1, you can directly increase the maximum busy wait timeout, hbase.ipc.client.call.purge.timeout, in milliseconds (the default is 120000), but do not forget to increase hbase.rpc.timeout accordingly (set it to the same value).

To mitigate 2, you can increase hbase.hregion.memstore.block.multiplier from the default (4) to a higher value.
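For context on point 2: HBase blocks writes to a region (surfacing as RegionTooBusyException with "Above memstore limit") once the region's memstore grows past hbase.hregion.memstore.flush.size multiplied by hbase.hregion.memstore.block.multiplier. A minimal sketch of that arithmetic, assuming the common 128 MB default flush size (verify against your own hbase-site.xml):

```python
# Sketch of the memstore blocking threshold. The 128 MB flush size is the
# usual default in these HBase versions; check hbase-site.xml on your cluster.

def memstore_block_threshold_mb(flush_size_mb=128, block_multiplier=4):
    """Memstore size (MB) at which HBase starts blocking writes to a region."""
    return flush_size_mb * block_multiplier

print(memstore_block_threshold_mb())                    # 512 MB with defaults
print(memstore_block_threshold_mb(block_multiplier=8))  # 1024 MB
```

Raising the multiplier gives flushes more headroom under a write burst, at the cost of more heap held in memstores.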

But the best option for you is to use the bulk import option:

-Dimport.bulk.output=/path/for/output

followed by the completebulkload tool.

See: https://hbase.apache.org/book.html#arch.bulk.load.complete

Totally agree re bulk import. One additional point: you need to ensure the hbase user has read/write access to the files created by the -Dimport.bulk.output step. If it doesn't, the completebulkload step will appear to hang.

The simplest way to achieve this is to do:

 hdfs dfs -chmod -R 777 <dir containing export files>

as the owner of those files. completebulkload, running as the hbase user, simply moves these files into the relevant HBase directories. With the permissions correctly set, this takes fractions of a second.

New Contributor

Is it possible to do the bulk import if the HBase versions differ? The old cluster has HBase 0.94 while the new one has 1.1.2.

I believe so, yes. The -Dimport.bulk.output step can be performed on the target cluster. This will prepare the HBase files according to the target version, number of region servers, etc.


Explorer

I have hit the exact same problem before, and it took me a long time to solve it.

Basically this error means the HBase region server is overloaded by too many parallel writing threads.

Bulk loads can also saturate the memstore. Since HBase does not have good back pressure, applications that write into HBase need to control their QPS.

In my scenario, I was using Spark bulk load to write into HBase, which overloaded the region servers.

There are a few approaches that can potentially solve this problem:

1. Pre-split the HBase table so multiple region servers can handle the writes.

2. Tune down the number of RDD partitions in Spark right before calling bulk load. This reduces the number of parallel writer threads from the Spark executors.
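As a rough illustration of point 2 (the numbers here are hypothetical, not from this thread): the number of simultaneous HBase writer tasks is bounded by both the RDD partition count and the total executor cores, so coalescing to fewer partitions directly caps write parallelism:

```python
# Hypothetical illustration: concurrent Spark writer tasks are bounded by
# min(RDD partitions, total executor cores).

def parallel_writers(rdd_partitions, executors, cores_per_executor):
    """Upper bound on tasks writing to HBase at the same time."""
    return min(rdd_partitions, executors * cores_per_executor)

# 10 executors x 8 cores: coalescing from 2000 to 40 partitions
# caps concurrent writers at 40 instead of 80.
print(parallel_writers(2000, 10, 8))  # 80
print(parallel_writers(40, 10, 8))    # 40
```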

New Contributor

I have faced the exact same issue when trying to import around 2 TB of data into HBase.

The following approaches can solve the issue:

1. Increase hbase.hregion.memstore.block.multiplier to 8.

2. Increase the % of RegionServer memory allocated to write buffers from 40% to 60%.

3. Pre-split the HBase table using the start keys of the same table as it exists on the other cluster, with the command below:

create '<HbaseTableName>', { NAME => '<ColumnFamily>', COMPRESSION => '<Compression>' }, SPLITS => ['<startkey1>','<startkey2>','<startkey3>','<startkey4>','<startkey5>']

Pre-splitting enables multiple region servers to handle writes concurrently.

Note: this issue basically appears due to bulk writes into HBase.
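As a small aid for step 3, here is a hypothetical helper (not from this thread) for generating evenly spaced split points for the SPLITS => [...] clause, assuming row keys with a uniform 2-hex-digit prefix; adapt it to your actual key distribution:

```python
# Hypothetical helper: evenly spaced split keys over a 2-hex-digit key
# prefix (00..ff). Only sensible if your row keys are uniformly
# distributed over that prefix.

def hex_prefix_splits(n_regions):
    """Return n_regions - 1 split points dividing the 00..ff prefix space."""
    step = 256 // n_regions
    return ["%02x" % (i * step) for i in range(1, n_regions)]

print(hex_prefix_splits(4))  # ['40', '80', 'c0']
```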

New Contributor

Really helpful. Worked for my production system.
