Created 06-02-2016 09:25 AM
When I try to import data using the command:
hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import 'tablename' /hdfs/exported-data
After a while I get a RegionTooBusyException. This is the MapReduce job summary:
Total maps: 14711 Complete: 526 Failed: 130 Killed: 380 Successful: 526
What I see in the console is
2016-06-02 11:05:19,632 INFO [main] mapreduce.Job: Task Id : attempt_1464792187762_0003_m_000347_0, Status : FAILED Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: RegionTooBusyException: 1 time, at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:228) ...
What is causing this? I suspect that HBase compactions might be making the region servers unresponsive. How do I solve it?
Created 06-09-2016 12:47 PM
The problem wasn't compactions but the number of map tasks.
The solution was to change YARN scheduling from memory-based (the default) to CPU-based. In the Ambari interface (I'm using Apache Ambari 2.2.2.0): YARN -> Configs -> Enable CPU Node Scheduling.
The same setting can also be changed directly in Hadoop's capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
What really happened:
The cluster consists of 20 nodes with more than 1 TB of RAM; YARN has 800 GB of RAM available for jobs. Because YARN used only memory to calculate the number of containers, it assigned about 320 containers for map tasks (800 GB / 2.5 GB per MapReduce2 container = 320 tasks!!!). This was like flooding our own servers with processes and requests.
After switching to CPU-based capacity scheduling, YARN changed its formula for the number of containers to 20 nodes * 6 virtual cores = 120 processes, which is much more manageable (and works fine for now).
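As a quick sanity check that the change took effect, you can look at the live configuration on a node; a sketch assuming an HDP-style layout where the active config lives under /etc/hadoop/conf:
# Config path is an assumption for HDP; adjust for your distribution
grep -A1 "yarn.scheduler.capacity.resource-calculator" /etc/hadoop/conf/capacity-scheduler.xml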
Created 06-02-2016 09:41 AM
Can you share the region server logs so we can check why the RegionTooBusyException was occurring? If you feel major compaction is the reason, you can disable automatic major compactions by configuring the property below.
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
  <description>The time (in milliseconds) between 'major' compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to disable automated major compactions.</description>
</property>
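Note that if you disable automatic major compactions, you will generally want to trigger them yourself during a quiet window; a minimal sketch from the command line ('tablename' is a placeholder):
# Manually kick off a major compaction for one table
echo "major_compact 'tablename'" | hbase shell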
Created 06-02-2016 03:52 PM
Which version of HDP are you using?
I am currently porting over this JIRA which would show us more information:
HBASE-15931 Add log for long-running tasks in AsyncProcess
How large is your region size?
Did you monitor your region servers to see which ones were the hot spot during the import ?
Please pastebin more of the error / stack trace.
Thanks
Created 06-02-2016 03:53 PM
Please verify that regions of your table are evenly distributed across the servers.
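One quick way to eyeball the distribution is the HBase shell status command, which lists the regions hosted by each region server (a sketch; the output can be large):
echo "status 'detailed'" | hbase shell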
Created 06-02-2016 07:54 PM
There are two major reasons for a RegionTooBusyException:
1. The write could not obtain the region lock within the busy wait timeout, typically because the region is held up by an internal operation such as a split, flush, or compaction.
2. The region's memstore has grown past its blocking limit (flush size * hbase.hregion.memstore.block.multiplier), so further writes are blocked until the flush completes.
To mitigate 1, you can increase the maximum busy wait timeout hbase.ipc.client.call.purge.timeout (in ms, default 120000), but do not forget to increase hbase.rpc.timeout accordingly (set it to the same value).
To mitigate 2, you can increase hbase.hregion.memstore.block.multiplier from the default (4) to a higher value.
But the best option for you is to use the bulk import option:
-Dimport.bulk.output=/path/for/output
followed by the completebulkload tool.
See: https://hbase.apache.org/book.html#arch.bulk.load.complete
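For example, a minimal sketch of the two-step flow (output path and table name are placeholders; the loader class shown is the one I'd expect on an HBase 1.x target):
# Step 1: run Import but write HFiles instead of issuing puts
hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/tmp/import-hfiles 'tablename' /hdfs/exported-data
# Step 2: hand the generated HFiles over to the region servers
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/import-hfiles 'tablename'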
Created 06-03-2016 12:43 AM
Totally agree re bulk import. One additional point. You need to ensure the hbase user has access to read/write the files created by the -Dimport.bulk.output step. If it doesn't, the completebulkload step will appear to hang.
The simplest way to achieve this is to do:
hdfs dfs -chmod -R 777 <dir containing export files> as the owner of those files. completebulkload, running as hbase, simply moves these files into the relevant HBase directories. With the permissions correctly set, this takes fractions of a second.
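If you prefer not to open the directory to everyone, a tighter alternative sketch is to hand ownership to the hbase user (assumes you can act as the HDFS superuser; the path and group are placeholders typical of HDP):
# Alternative to chmod 777: make hbase the owner of the bulk output
sudo -u hdfs hdfs dfs -chown -R hbase:hbase /path/for/output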
Created 06-09-2016 01:36 PM
Is it possible to do the bulk import if the versions of HBase differ? The old cluster has HBase 0.94 while the new one has 1.1.2.
Created 06-09-2016 10:03 PM
I believe so, yes. The -Dimport.bulk.output step can be performed on the target cluster. This will prep the HFiles according to the target version, number of region servers, etc.
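Concretely, a sketch of that step on the 1.1.2 target, combining the version flag from the original command with the bulk output option (output path and table name are placeholders):
hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/tmp/import-hfiles 'tablename' /hdfs/exported-data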
Created 03-22-2017 04:26 PM
I have hit the exact same problem before and it took me a long time to solve it.
Basically this error means the HBase region server is overloaded due to too many parallel writing threads.
Bulk writing can also saturate the memstore. Since HBase does not have good back pressure, applications that write into HBase need to control their QPS.
In my scenario, I was using Spark to bulk-write into HBase, which overloaded the region servers.
There are a few ways that can potentially solve this problem:
1. Pre-split the HBase table so multiple region servers can share the writes (see the sketch after this list)
2. Tune down the number of RDD partitions in Spark right before calling the bulk load. This reduces the number of parallel writer threads coming from the Spark executors
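For point 1, a minimal pre-split sketch from the HBase shell (table name, column family, and split keys are placeholders picked purely for illustration):
echo "create 'tablename', 'cf', SPLITS => ['1','2','3','4','5','6','7','8','9']" | hbase shell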