Created 06-02-2016 09:25 AM
When I try to import data using the command:
hbase org.apache.hadoop.hbase.mapreduce.Import -Dhbase.import.version=0.94 'tablename' /hdfs/exported-data
After a while I get a RegionTooBusyException. This is the MapReduce job summary:
Total maps: 14711 Complete: 526 Failed: 130 Killed: 380 Successful: 526
What I see in the console is:
2016-06-02 11:05:19,632 INFO [main] mapreduce.Job: Task Id : attempt_1464792187762_0003_m_000347_0, Status : FAILED Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: RegionTooBusyException: 1 time, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) ...
What is causing this? I suspect that HBase compactions might be making the region servers unresponsive. How do I solve it?
Created 06-09-2016 12:47 PM
It wasn't a problem with compactions but with the number of map tasks.
The solution was to change YARN scheduling from memory-based (the default) to CPU-based. In the Ambari interface (I'm using Apache Ambari 2.2.2.0): YARN -> Configs -> Enable CPU Node Scheduling.
The same setting can also be found in Hadoop's capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
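A quick way to confirm which resource calculator the CapacityScheduler is actually using (a minimal sketch, assuming the usual Ambari/HDP client-config path /etc/hadoop/conf; adjust for your layout):

# Assumption: capacity-scheduler.xml lives under /etc/hadoop/conf (typical for Ambari/HDP).
grep -A 1 'yarn.scheduler.capacity.resource-calculator' /etc/hadoop/conf/capacity-scheduler.xml
# DominantResourceCalculator takes both memory and CPU into account;
# the default DefaultResourceCalculator looks at memory only.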
What really happened:
The cluster consists of 20 nodes with more than 1 TB of RAM in total, of which YARN has 800 GB available for jobs. Because YARN was using only memory to calculate the number of containers, it allocated about 320 containers for map tasks (800 GB / 2.5 GB per MapReduce2 container = 320 containers!). This was like flooding our own servers with processes and requests.
After changing to CPU-based capacity scheduling, YARN changed its formula for the number of containers to 20 nodes * 6 virtual cores = 120 processes, which is much more manageable (and works fine for now).
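The same back-of-the-envelope math as a quick shell snippet (the 800 GB, 2.5 GB and 6 vcore figures are this cluster's values, so treat them as placeholders for your own numbers):

# Container count estimates under memory-based vs. CPU-based scheduling.
awk 'BEGIN {
  mem_total_gb = 800; mem_per_map_gb = 2.5    # RAM given to YARN, RAM per map container
  nodes = 20; vcores_per_node = 6             # node count, vcores YARN sees per node
  print "memory-based containers:", mem_total_gb / mem_per_map_gb   # 320
  print "cpu-based containers:   ", nodes * vcores_per_node         # 120
}'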
Created 05-29-2019 01:25 PM
I have faced the exact same issue when trying to import around 2 TB of data into HBase.
The following steps can solve the issue.
1. Increase hbase.hregion.memstore.block.multiplier to 8.
2. Increase the % of RegionServer allocated to write buffers from 40% to 60% (both settings are shown in the sketch at the end of this post).
3. Pre-split the HBase table using the start keys of the same table, which might already exist on another cluster, with the command below.
create '<HbaseTableName>',{ NAME => '<ColumnFamily>', COMPRESSION => '<Compression>'}, SPLITS=> ['<startkey1>','<startkey2>','<startkey3>','<startkey4>','<startkey5>']
Pre-splitting the HBase table enables multiple region servers to handle writes concurrently; a concrete example follows.
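As a concrete example of step 3, the create statement can be piped straight into the HBase shell; the table name, column family, compression codec and split keys below are placeholders, not values from the original post:

# Hypothetical table/family/keys -- replace with your own (ideally the start keys
# taken from the existing table on the source cluster).
echo "create 'mytable', { NAME => 'cf', COMPRESSION => 'SNAPPY' }, SPLITS => ['row10000','row20000','row30000','row40000']" | hbase shell

# If there is no source table to copy start keys from, the RegionSplitter utility
# bundled with HBase can create a pre-split table from a split algorithm instead:
hbase org.apache.hadoop.hbase.util.RegionSplitter mytable HexStringSplit -c 10 -f cf

HexStringSplit assumes fairly uniform, hex-like row keys; if your keys are skewed, explicit SPLITS taken from the source cluster are the safer choice.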
Note: this issue basically appears because of bulk writes to HBase.
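For steps 1 and 2, here is a quick way to see what a RegionServer is currently configured with. In an Ambari-managed cluster you would change these under HBase -> Configs rather than editing hbase-site.xml by hand; the path below is the usual HDP default, and the write-buffer percentage maps to hbase.regionserver.global.memstore.size in recent HBase versions (hbase.regionserver.global.memstore.upperLimit in older ones):

# Assumption: HBase client configs live under /etc/hbase/conf (typical HDP layout).
# Step 1: how many times the flush size a memstore may reach before writes are blocked.
grep -A 1 'hbase.hregion.memstore.block.multiplier' /etc/hbase/conf/hbase-site.xml
# Step 2: share of RegionServer heap reserved for memstores ("write buffers" in Ambari).
grep -A 1 'hbase.regionserver.global.memstore' /etc/hbase/conf/hbase-site.xml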
Created 05-31-2019 08:27 AM
Really helpful. Worked for my production system.