Support Questions

I'm getting RegionTooBusyException when trying to import data into hbase

New Contributor

When I try to import data using the command:

 hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import 'tablename' /hdfs/exported-data

After a while I get a RegionTooBusyException. This is the MapReduce job summary:

Total maps: 14711 Complete: 526 Failed: 130 Killed: 380 Successful: 526

What I see in the console is:

2016-06-02 11:05:19,632 INFO  [main] mapreduce.Job: Task Id : attempt_1464792187762_0003_m_000347_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: RegionTooBusyException: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228)

...

What is causing this? I suspect that HBase compactions might be making the region servers unresponsive. How do I solve it?

1 ACCEPTED SOLUTION

New Contributor

It wasn't a problem with compactions but with the number of concurrent map tasks.

The solution was to change YARN scheduling from memory (the default) to CPU: in the Ambari interface (I'm using Apache Ambari 2.2.2.0), go to YARN -> Configs -> Enable CPU Node Scheduling.

It's also possible to set this directly in Hadoop's capacity-scheduler.xml:

 <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

What really happened:

The cluster consists of 20 nodes with more than 1 TB of RAM, of which YARN has 800 GB available for jobs. Because YARN was using only memory to calculate the number of containers, it allotted about 320 containers for map tasks (800 GB / 2.5 GB per MapReduce2 container = 320 containers!). This effectively flooded our own servers with processes and requests.

After switching to CPU-based capacity scheduling, YARN changed its formula for the number of containers to 20 nodes * 6 virtual cores = 120 containers, which is much more manageable (and works fine for now).
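
For context, the 2.5 GB per map container in the calculation above is the MapReduce2 memory request, normally set through mapreduce.map.memory.mb, and the per-map vcore request is mapreduce.map.cpu.vcores. The values below are only an illustrative sketch of how the two formulas arise, not the exact configuration of this cluster:

 <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2560</value>
    <!-- roughly 2.5 GB per map container: 800 GB / 2.5 GB = about 320 containers under memory-only scheduling -->
  </property>
  <property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>1</value>
    <!-- with DominantResourceCalculator: 20 nodes * 6 vcores / 1 vcore per map = 120 containers -->
  </property>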

11 REPLIES

New Contributor

I have faced the exact same issue when trying to import around 2 TB of data into HBase.

The following approaches can solve the issue:

1. Increase hbase.hregion.memstore.block.multiplier to 8.

2. Increase the % of RegionServer Allocated to Write Buffers from 40% to 60% (see the hbase-site.xml sketch after this list for settings 1 and 2).

3. Pre-split the HBase table using the start keys of the same table that may already exist on another cluster, using the command below:

create '<HbaseTableName>',{ NAME => '<ColumnFamily>', COMPRESSION => '<Compression>'}, SPLITS=> ['<startkey1>','<startkey2>','<startkey3>','<startkey4>','<startkey5>']

Pre-splitting the HBase table enables multiple region servers to handle writes concurrently.

Note: Basically, this issue appears due to bulk writes to HBase.
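
For reference, here is a minimal hbase-site.xml sketch of settings 1 and 2 above, assuming the Ambari "% of RegionServer Allocated to Write Buffers" setting maps to hbase.regionserver.global.memstore.size (on older HBase releases the equivalent property is hbase.regionserver.global.memstore.upperLimit):

 <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>8</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.size</name>
    <value>0.6</value>
    <!-- 60% of the RegionServer heap reserved for memstores (write buffers) -->
  </property>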

New Contributor

Really helpful. Worked for my production system.