Intermittent Timeout Error When Querying HBase
Labels: Apache HBase
Created ‎12-15-2016 04:30 PM
I'm having an intermittent timeout error when running some example code against HBase. The basic Java application creates a Scanner and queries a particular HBase table. For small queries, my code works fine. But, when I increase the TimeRange of my query, I get intermittent timeout errors, as seen below. Googling and searching the forum has not yielded any plausible solutions. Does anyone have any idea what the source of this error might be, and how to mitigate it?
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
[java] Thu Dec 15 10:33:43 EST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'my-table-name' at region=my-table-name,,1481665174391.d068f4be09585cf831dbcd3a04664caf., hostname=hostname-007.localdomain.local,16020,1481747196976, seqNum=34514987
[java]
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
[java]     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
[java]     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
[java]     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
[java]     at test.MyTestClass.main(MyTestClass.java:70)
[java] Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'my-table-name' at region=my-table-name,,1481665174391.d068f4be09585cf831dbcd3a04664caf., hostname=hostname-007.localdomain.local,16020,1481747196976, seqNum=34514987
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
[java]     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
[java]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[java]     at java.lang.Thread.run(Thread.java:745)
[java] Caused by: java.io.IOException: Call to hostname-007.localdomain.local/10.0.0.106:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
[java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
[java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
[java]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
[java]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
[java]     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
[java]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
[java]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
[java]     ... 4 more
[java] Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
[java]     at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
[java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
[java]     ... 13 more
[java] Java Result: 1
Created ‎12-15-2016 04:33 PM
When you increase your time range, the scan has to read more data. HBase caps the duration of any single RPC with the hbase.rpc.timeout property in hbase-site.xml. This defaults to 60s, and this limit is what you're hitting.
If you want to run a query that will scan over more data or generally take a long time (such as server-side filtering), you will have to increase hbase.rpc.timeout commensurately.
Created ‎12-15-2016 04:38 PM
I've checked this, but I've already got this timeout value set to 180000 (3 mins), so I don't see why I'm getting a 60s timeout.
cat /etc/hbase/conf/hbase-site.xml | grep -2 rpc.timeout
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value>
</property>
Created ‎12-15-2016 04:40 PM
Make sure that /etc/hbase/conf is included on your client's classpath.
Created ‎12-15-2016 04:52 PM
I've confirmed that /etc/hbase/conf is on the classpath, and I've added the following code to my test script:
Configuration conf = HBaseConfiguration.create();
System.out.println("Timeout: " + conf.get("hbase.rpc.timeout"));
The above outputs 180000 as expected.
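For anyone comparing the configured value against what the failing call actually used, here is a minimal sketch that pulls the callTimeout and callDuration fields out of the exception message from the stack trace above. The class name TimeoutCheck and the helper extract are made up for illustration and are not part of HBase; the 180000 figure is the value from the hbase-site.xml excerpt earlier in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutCheck {
    // Hypothetical helper: returns the number following "key=" in the message,
    // or -1 if the key does not appear.
    static long extract(String message, String key) {
        Matcher m = Pattern.compile(Pattern.quote(key) + "=(\\d+)").matcher(message);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // Message text copied from the stack trace in the original question.
        String message = "java.net.SocketTimeoutException: callTimeout=60000, "
                + "callDuration=60304: row '' on table 'my-table-name'";
        // Value of hbase.rpc.timeout shown in the hbase-site.xml excerpt.
        long configured = 180000L;

        long callTimeout = extract(message, "callTimeout");
        long callDuration = extract(message, "callDuration");

        System.out.println("configured hbase.rpc.timeout: " + configured + " ms");
        System.out.println("callTimeout in trace:         " + callTimeout + " ms");
        System.out.println("callDuration in trace:        " + callDuration + " ms");
        if (callTimeout != configured) {
            System.out.println("The failing call was not bounded by the configured value.");
        }
    }
}
```

With the values from this thread the two numbers differ, which suggests the 60s limit on the failing scan call comes from somewhere other than the hbase.rpc.timeout value printed above.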
Created ‎12-15-2016 05:22 PM
Ok, the last check would be to verify that all of your RegionServers also have that configuration value. The easiest way is to open the HBase UI for each RegionServer (the Master UI links to each of them), click "HBase Configuration" at the top of the page, and verify that the value is set.
Created ‎12-15-2016 05:53 PM
I've confirmed that this setting is the same across all machines in the cluster using the same command as above.
Created ‎12-15-2016 05:20 PM
Is port 16020 open on the nodes, especially on hostname-007.localdomain.local?
Created ‎12-15-2016 05:23 PM
He would not be seeing a SocketTimeoutException if the socket was unable to make a connection on that host+port. The SocketTimeoutException implies that the socket is connected.
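To illustrate the distinction with plain java.net sockets (no HBase involved), here is a rough sketch. It covers only two cases on the loopback interface: a server that accepts the connection but never replies, and a port with nothing listening. The class and method names are invented for this example, and it says nothing about firewalled ports, where a connect attempt can behave differently.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketTimeoutDemo {
    // Connects to a local listener that never sends data, then reads with a timeout.
    static String readFromSilentServer(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // The TCP handshake completes even though accept() is never called.
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()), timeoutMillis);
            client.setSoTimeout(timeoutMillis);
            client.getInputStream().read(); // blocks until the read timeout fires
            return "read returned";
        } catch (SocketTimeoutException e) {
            return "connected, then SocketTimeoutException";
        } catch (IOException e) {
            return "other IOException";
        }
    }

    // Tries to connect to a loopback port that was just released, so nothing listens on it.
    static String connectToClosedPort(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        int port;
        try (ServerSocket probe = new ServerSocket(0, 1, loopback)) {
            port = probe.getLocalPort();
        } catch (IOException e) {
            return "could not pick a port";
        }
        try (Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, port), timeoutMillis);
            return "connected";
        } catch (ConnectException e) {
            return "ConnectException";
        } catch (IOException e) {
            return "other IOException";
        }
    }

    public static void main(String[] args) {
        System.out.println("silent server: " + readFromSilentServer(200));
        System.out.println("closed port:   " + connectToClosedPort(200));
    }
}
```

In the trace from the question, the CallTimeoutException reports a call id and a waitTime, which is consistent with the first case: the request was sent and the client gave up waiting for the response.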
Created ‎12-15-2016 05:53 PM
Yes, I've confirmed the port is open.
