Created 12-15-2016 04:30 PM
I'm getting an intermittent timeout error when running some example code against HBase. The basic Java application creates a Scanner and queries a particular HBase table. For small queries, my code works fine. However, when I widen the TimeRange of my query, I get intermittent timeout errors, as seen below. Googling and searching the forum has not turned up any plausible solutions. Does anyone have any idea what the source of this error might be, and how to mitigate it?
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
[java] Thu Dec 15 10:33:43 EST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'my-table-name' at region=my-table-name,,1481665174391.d068f4be09585cf831dbcd3a04664caf., hostname=hostname-007.localdomain.local,16020,1481747196976, seqNum=34514987
[java]
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
[java]     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
[java]     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
[java]     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
[java]     at test.MyTestClass.main(MyTestClass.java:70)
[java] Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'my-table-name' at region=my-table-name,,1481665174391.d068f4be09585cf831dbcd3a04664caf., hostname=hostname-007.localdomain.local,16020,1481747196976, seqNum=34514987
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
[java]     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
[java]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[java]     at java.lang.Thread.run(Thread.java:745)
[java] Caused by: java.io.IOException: Call to hostname-007.localdomain.local/10.0.0.106:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
[java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
[java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
[java]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
[java]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
[java]     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
[java]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
[java]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
[java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
[java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
[java]     ... 4 more
[java] Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
[java]     at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
[java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
[java]     ... 13 more
[java] Java Result: 1
Created 12-15-2016 04:33 PM
When you increase your time range, you have to read more data. HBase caps the duration of any single RPC with the hbase.rpc.timeout property in hbase-site.xml. This defaults to 60s (60000 ms), and this limit is what you're hitting.
If you want to run a query that will scan over more data or generally take a long time (such as server-side filtering), you will have to increase hbase.rpc.timeout commensurately.
Created 12-15-2016 04:38 PM
I've checked this, but I've already got these timeout values set to 180000 (3 mins), so I don't see why I'm getting a 60s timeout.
cat /etc/hbase/conf/hbase-site.xml | grep -2 rpc.timeout
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value>
</property>
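A grep for one key only shows that key. As a rough sketch, the same file could be parsed with the JDK's XML parser to list every property whose name contains "timeout", which may reveal whether another timeout setting is in play (HBase also defines hbase.client.scanner.timeout.period for scans, for example). The class name TimeoutProps and the default path are assumptions for illustration, and properties that are not overridden in hbase-site.xml will not appear here because they come from the defaults bundled with HBase.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of HBase: lists timeout-related overrides in a site file.
class TimeoutProps {

    /** Returns name -> value for every <property> whose <name> contains "timeout". */
    public static Map<String, String> timeoutProperties(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> found = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = childText(p, "name");
            if (name != null && name.contains("timeout")) {
                found.put(name, childText(p, "value"));
            }
        }
        return found;
    }

    /** Text of the first child element with the given tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Default path taken from the command above; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/etc/hbase/conf/hbase-site.xml";
        try (InputStream in = new FileInputStream(path)) {
            timeoutProperties(in).forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

Running it on each node gives a quick side-by-side view of the timeout overrides that node actually has on disk.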
Created 12-15-2016 04:40 PM
Make sure that /etc/hbase/conf is included on your client's classpath.
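One way to check this from inside the client JVM is to ask the class loader which hbase-site.xml it resolves, since HBaseConfiguration.create() loads hbase-default.xml and hbase-site.xml as classpath resources. The sketch below uses only the JDK; the class name ClasspathCheck is a hypothetical helper, and it assumes the client uses the thread context class loader.

```java
import java.net.URL;

// Hypothetical helper, not part of HBase: shows where a classpath resource resolves from.
class ClasspathCheck {

    /** Returns the URL the class loader resolves for the resource, or null if it is not found. */
    public static URL locate(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resource);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"hbase-site.xml", "hbase-default.xml"}) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND on classpath" : url));
        }
    }
}
```

If hbase-site.xml resolves to a location other than /etc/hbase/conf (a copy bundled inside a jar, say), the client may be reading a different file than the one that was edited.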
Created 12-15-2016 04:52 PM
I've confirmed that /etc/hbase/conf is on the classpath, and I've added the following code to my test script:
Configuration conf = HBaseConfiguration.create();
System.out.println("Timeout: " + conf.get("hbase.rpc.timeout"));
The above outputs 180000 as expected.
Created 12-15-2016 05:22 PM
Ok, the last check would be to verify that all of your RegionServers also have that configuration value. The easiest way is to open the HBase UI for each RegionServer (reachable from the Master UI), click "HBase Configuration" at the top of the page, and verify that the value is set.
Created 12-15-2016 05:53 PM
I've confirmed that this setting is the same across all machines in the cluster using the same command as above.
Created 12-15-2016 05:20 PM
Is port 16020 open on the nodes, especially hostname-007.localdomain.local?
Created 12-15-2016 05:23 PM
He would not be seeing a SocketTimeoutException if the socket was unable to make a connection on that host+port. The SocketTimeoutException implies that the socket is connected.
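The distinction can be seen with plain java.net sockets. This is only a sketch of ordinary socket behaviour, not of HBase's RPC client, and it covers just the two cases mentioned here: a refused connection and a connected but silent peer. The class name SocketFailureModes and the 200 ms read timeout are assumptions for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: contrasts a refused connection with a read timeout.
class SocketFailureModes {

    /** Connects to host:port, tries to read one byte, and returns a label for what happened. */
    public static String probe(InetAddress host, int port, int readTimeoutMs) {
        try (Socket s = new Socket(host, port)) {
            s.setSoTimeout(readTimeoutMs);
            int b = s.getInputStream().read();
            return b < 0 ? "closed by peer" : "data received";
        } catch (ConnectException e) {
            return "connect failed";            // nothing is listening on that port
        } catch (SocketTimeoutException e) {
            return "connected, read timed out"; // connection was made, peer did not answer in time
        } catch (IOException e) {
            return "other I/O error";
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress local = InetAddress.getLoopbackAddress();

        // A listening socket that never replies: the connection succeeds, the read times out.
        try (ServerSocket silent = new ServerSocket(0, 1, local)) {
            System.out.println("silent server: " + probe(local, silent.getLocalPort(), 200));
        }

        // A port that was just released: the connection attempt itself is expected to fail.
        int closedPort;
        try (ServerSocket tmp = new ServerSocket(0, 1, local)) {
            closedPort = tmp.getLocalPort();
        }
        System.out.println("closed port: " + probe(local, closedPort, 200));
    }
}
```

A closed port fails during the connection attempt, whereas the timeout in the second case only occurs after the connection has been established.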
Created 12-15-2016 05:53 PM
Yes, I've confirmed the port is open.