HBase region servers giving org.apache.hadoop.hbase.RegionTooBusyException when inserting data
Labels: Apache HBase
Created on 09-05-2017 09:12 AM - edited 09-16-2022 05:11 AM
Hello,
I have a daily data-loading process where we insert data into HBase. Recently it started throwing 'org.apache.hadoop.hbase.RegionTooBusyException'. The job still progresses and I can see data in the table, but the error makes the daily load very slow.
2017-09-01 07:38:10,285 INFO [hconnection-0x338b180b-shared--pool1-t72] org.apache.hadoop.hbase.client.AsyncProcess: #2, table=tweetTable-2017-08, attempt=10/35 failed=1455ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=tweetTable-2017-08,,1504119689670.257e69e222e3577c8b96ec34572f4aa8., server=moe-cn05,60020,1504207500995, memstoreSize=2222782950, blockingMemStoreSize=2147483648
at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:3657)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2867)
at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2818)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:751)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:713)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2142)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
on moe-cn05,60020,1504207500995, tracking started null, retrying after=10054ms, replay=1455ops
Any help is really appreciated.
Thanks,
Chathuri
Created 09-06-2017 02:57 AM
Usually, when the MemStore for a region nears its configured limit (such as 256 MB), it triggers an HDFS flush. Flushing ~256 MB should be quick enough that the MemStore can be trimmed down again. However, in your case the flush is likely blocked (waiting in a queue, or waiting on HDFS I/O) or is taking a very long time.
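To make the limit in your error concrete: the RegionServer starts rejecting writes to a region once its MemStore exceeds the flush size multiplied by the block multiplier. Below is a tiny sketch of that arithmetic; the property names are the stock HBase ones, but the values are only assumptions I picked so that the product matches the blockingMemStoreSize=2147483648 in your log (I don't know your actual settings).

// Sketch of how the RegionServer derives the blocking limit in the error above.
// The values below are assumptions, chosen only so the product matches the log.
public class BlockingLimit {
    public static void main(String[] args) {
        long flushSize = 512L * 1024 * 1024; // hbase.hregion.memstore.flush.size (assumed 512 MB)
        int blockMultiplier = 4;             // hbase.hregion.memstore.block.multiplier (assumed 4)
        long blockingMemStoreSize = flushSize * blockMultiplier;
        System.out.println(blockingMemStoreSize); // 2147483648 bytes = 2 GiB, as in the log
        // Writes to the region fail with RegionTooBusyException while the MemStore
        // stays above this value, i.e. while flushes cannot keep up with the ingest.
    }
}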
Some ideas:
Look in your RegionServer logs (moe-cn05, for example) for "[Ff]lush"-related messages around the time of the issue (2017-09-01, ~07:00). If small flushes are taking a long time to complete, the issue may lie in HDFS I/O (investigate NameNode response times, DataNode connectivity, and network and disk I/O).
If flushes are completing in normal time, the bottleneck may be the flush request queue (CM has an alert for this). Check this RegionServer's metrics to find out how many flush requests were waiting in the queue at that point; a sketch of reading that metric over JMX follows these ideas. Increasing the number of parallel flusher worker threads can help drain the request queue faster.
If no flushes complete at all, it could be a bug or a hang caused by custom logic (if you use coprocessors). Take a jstack output (or visit /stacks on the RS Web UI) to see where the flusher threads are stuck, or whether they are waiting on a lock held by another hung thread.
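If you want to check the queue outside of CM, here is a minimal sketch of reading the RegionServer's flush queue length over JMX. It assumes JMX is enabled for the RegionServer (the port 10102 below is only an example) and the usual HBase 1.x metrics bean names; adjust both to your cluster.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class FlushQueueCheck {
    public static void main(String[] args) throws Exception {
        // Assumed host/port; point this at the RegionServer's JMX endpoint.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://moe-cn05:10102/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName rs = new ObjectName("Hadoop:service=HBase,name=RegionServer,sub=Server");
            // A persistently non-zero value means flush requests are queuing up
            // faster than the flusher threads can service them.
            System.out.println("flushQueueLength = " + mbs.getAttribute(rs, "flushQueueLength"));
        }
    }
}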
Created 09-06-2017 08:35 AM
Hi Harsh,
Thank you so much for the suggestions. I looked at the RegionServer logs around that time and could not find any errors, but there were some warnings around 07:37.
2017-09-01 07:37:19,347 WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region tweetTable-2017-08,,1504119689670.257e69e222e3577c8b96ec34572f4aa8. has too many store files; delaying flush up to 90000ms
2017-09-01 07:38:49,358 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Waited 90011ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of tweetTable-2017-08,,1504119689670.257e69e222e3577c8b96ec34572f4aa8.
2017-09-01 07:38:49,358 INFO org.apache.hadoop.hbase.regionserver.HRegion: Flushing 1/1 column families, memstore=2.07 GB
2017-09-01 07:39:25,296 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=55001, memsize=2.1 G, hasBloomFilter=true, into tmp file hdfs://nameservice1/hbase/data/default/tweetTable-2017-08/257e69e222e3577c8b96ec34572f4aa8/.tmp/39d43e930f454644a34f2899ba7ec49e
2017-09-01 07:39:25,320 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nameservice1/hbase/data/default/tweetTable-2017-08/257e69e222e3577c8b96ec34572f4aa8/d/39d43e930f454644a34f2899ba7ec49e, entries=7607196, sequenceid=55001, filesize=133.8 M
2017-09-01 07:39:25,322 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~2.07 GB/2222782950, currentsize=0 B/0 for region tweetTable-2017-08,,1504119689670.257e69e222e3577c8b96ec34572f4aa8. in 35964ms, sequenceid=55001, compaction requested=true
It seems the region server was able to flush the memstore in the end, although the flush was delayed for 90 seconds by the 'too many store files' check. I also checked the NameNode logs for the same period but could not find any warnings or errors.
Created 09-14-2017 10:27 AM
I made some configuration changes and they fixed the issue.
Thanks,
Chathuri
Created 09-14-2017 10:34 AM
In the spirit of https://xkcd.com/979/ 🙂
Created 09-14-2017 10:43 AM
I followed this article (http://gbif.blogspot.com/2012/07/optimizing-writes-in-hbase.html) and changed the parameters below (a rough sketch of the equivalent plain HBase properties follows the list).
- Maximum Number of HStoreFiles Compaction : 20
- HStore Blocking Store Files : 200
- HBase Memstore Block Multiplier : 4
- HBase Memstore Flush Size : 256
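For anyone not using Cloudera Manager, my understanding is that these display names correspond to the plain HBase properties in the sketch below (please double-check the mapping for your version). The code only sets them on a Configuration object to show the names and values; on a real cluster they belong in the RegionServer's hbase-site.xml, followed by a rolling restart.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WriteTuning {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.hstore.compaction.max", 20);                // Maximum Number of HStoreFiles Compaction
        conf.setInt("hbase.hstore.blockingStoreFiles", 200);           // HStore Blocking Store Files
        conf.setInt("hbase.hregion.memstore.block.multiplier", 4);     // HBase Memstore Block Multiplier
        conf.setLong("hbase.hregion.memstore.flush.size", 256L << 20); // HBase Memstore Flush Size (assuming 256 means MB)
        // Print one back just to confirm the values are set as intended.
        System.out.println("blockingStoreFiles = " + conf.getInt("hbase.hstore.blockingStoreFiles", -1));
    }
}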
Hope this helps.
Thanks,
Chathuri
