Support Questions


Intermittent Timeout Error When Querying HBase

Explorer

I'm hitting an intermittent timeout error when running some example code against HBase. The basic Java application creates a Scanner and queries a particular HBase table. For small queries, my code works fine. But when I increase the TimeRange of my query, I get intermittent timeout errors, as seen below. Googling and searching the forum has not yielded any plausible solutions. Does anyone have any idea what the source of this error might be, and how to mitigate it?


 Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
     [java] Thu Dec 15 10:33:43 EST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'my-table-name' at region=my-table-name,,1481665174391.d068f4be09585cf831dbcd3a04664caf., hostname=hostname-007.localdomain.local,16020,1481747196976, seqNum=34514987
     [java]
     [java]     at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
     [java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
     [java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
     [java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
     [java]     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
     [java]     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
     [java]     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
     [java]     at test.MyTestClass.main(MyTestClass.java:70)
     [java] Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=60304: row '' on table 'my-table-name' at region=my-table-name,,1481665174391.d068f4be09585cf831dbcd3a04664caf., hostname=hostname-007.localdomain.local,16020,1481747196976, seqNum=34514987
     [java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
     [java]     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
     [java]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     [java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     [java]     at java.lang.Thread.run(Thread.java:745)
     [java] Caused by: java.io.IOException: Call to hostname-007.localdomain.local/10.0.0.106:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
     [java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
     [java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
     [java]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
     [java]     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
     [java]     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
     [java]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
     [java]     at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
     [java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
     [java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
     [java]     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
     [java]     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
     [java]     ... 4 more
     [java] Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=60001, operationTimeout=60000 expired.
     [java]     at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
     [java]     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
     [java]     ... 13 more
     [java] Java Result: 1
11 Replies

Super Guru

When you increase your time range, you have to read more data. HBase caps the duration of any single RPC via the hbase.rpc.timeout property in hbase-site.xml. This defaults to 60 seconds, and that limit is what you're hitting.

If you want to run a query that will scan over more data or generally take a long time (such as one with server-side filtering), you will have to increase hbase.rpc.timeout commensurately.
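For example, a sketch of the relevant hbase-site.xml entries (values illustrative; depending on your HBase version, scanner RPCs may also be capped separately by hbase.client.scanner.timeout.period, which likewise defaults to 60000 ms):

```xml
<!-- Illustrative values; tune to your workload. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>180000</value>
</property>
<!-- In newer HBase versions, scanner RPCs honor this separate limit. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>180000</value>
</property>
```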

Explorer

I've checked this, but I already have this timeout set to 180000 ms (3 minutes), so I don't see why I'm hitting a 60-second timeout:

 grep -2 rpc.timeout /etc/hbase/conf/hbase-site.xml

    <property>
      <name>hbase.rpc.timeout</name>
      <value>180000</value>
    </property>

Super Guru

Make sure that /etc/hbase/conf is included on your client's classpath.

Explorer

I've confirmed that /etc/hbase/conf is on the classpath, and I've added the following code to my test script:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration conf = HBaseConfiguration.create();
    System.out.println("Timeout: " + conf.get("hbase.rpc.timeout"));

The above outputs 180000 as expected.

Super Guru

OK, the last check would be to verify that all of your RegionServers also have that configuration value. The easiest way is to open the HBase UI for each RegionServer (reachable from the Master UI), click "HBase Configuration" at the top of the page, and verify that the value is set there.
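If clicking through the UI on every node is tedious, a possible shell-based alternative (assuming the default RegionServer info port of 16030, and that your build exposes the standard /conf servlet on the embedded web server):

```shell
# Pull the live configuration from one RegionServer's info server and
# look for the rpc timeout; hostname taken from the stack trace above.
curl -s http://hostname-007.localdomain.local:16030/conf | grep -A1 'hbase.rpc.timeout'
```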

Explorer

I've confirmed that this setting is the same across all machines in the cluster using the same command as above.

Super Collaborator

Is port 16020 open on the nodes, especially hostname-007.localdomain.local?

Super Guru

He would not be seeing a SocketTimeoutException if the socket were unable to make a connection to that host and port. The SocketTimeoutException implies that the socket is connected.
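A minimal stdlib sketch of that distinction (class and method names hypothetical): the connect succeeds even though the server never sends a byte, so it is the read, not the connect, that times out.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Connects to a local server that never responds, then reads with a
    // short timeout. Returns which phase failed.
    public static String demo() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {          // never accepts or writes
            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.setSoTimeout(200);                          // read timeout, not connect
                try {
                    client.getInputStream().read();
                    return "read-returned";
                } catch (SocketTimeoutException e) {
                    return "read-timeout";                         // socket was connected
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Connecting to a closed or firewalled port fails differently (ConnectException or a connect-phase timeout), which is why a SocketTimeoutException during a scan points at a slow response rather than a blocked port.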

Explorer

Yes, I've confirmed the port is open.