Created 07-22-2023 05:59 AM
Hello.
I am trying to use CopyTable to copy 5 minutes or 1 hour of data from a 15 TB HBase table to another cluster.
However, the job keeps failing with hbase.ipc.CallTimeoutException.
#1688313600000 ('2023-07-03 01:00:00') - 1688313900000 ('2023-07-03 01:05:00') : 5 minutes
hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.zookeeper.quorum=A01,B01,C02 --peer.adr=dev01dev02,dev03:2181:/hbase-unsecure --starttime=1688313600000 --endtime=1688313900000 'BIG_EMP'
#1688313600000 ('2023-07-03 01:00:00') - 1688317200000 ('2023-07-03 02:00:00') : 1 hour
hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.zookeeper.quorum=A01,B01,C02 --peer.adr=dev01dev02,dev03:2181:/hbase-unsecure --starttime=1688313600000 --endtime=1688317200000 'BIG_EMP'
2023-07-20 15:56:05,731 INFO [main] mapreduce.Job: map 1% reduce 0%
2023-07-20 15:56:08,787 INFO [main] mapreduce.Job: map 2% reduce 0%
2023-07-20 15:56:12,843 INFO [main] mapreduce.Job: map 3% reduce 0%
2023-07-20 15:58:05,093 INFO [main] mapreduce.Job: Task Id : attempt_1670552835275_0148_m_000003_0, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=16, exceptions:
2023-07-20T06:58:04.295Z, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60105: Call to A02/17.91.172.237:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=8,methodName=Scan], waitTime=60004, rpcTimeout=60000 row 'U022|11|1|230519-STL-131Ah-B-spl-30oC-WLTP-8th-cycle-B22-11-12-15-re' on table 'LDSM_CDGS_DATA' at region=xjfdjsfdjksdfkjfsd,1686849006313.1e8adfc0af2cf791361126e24822e63c., hostname=A02,16020,1689816082571, seqNum=860075
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:299)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:251)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:267)
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:435)
at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:310)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:595)
The current timeout-related settings are as follows.
hbase.rpc.timeout=120000
hbase.client.scanner.timeout.period=120000
In this case, should I increase the timeouts and test again?
Or is there another approach I should take?
Created 07-25-2023 01:46 AM
Hi @lukepoo. Yes, you can try raising the timeouts. Also check whether the regions of this table have good data locality. In the current run, the timeout happened on region '1e8adfc0af2cf791361126e24822e63c'. Check the locality of that region and of the others, and try compacting any region whose locality is less than 1.
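One way to test larger timeouts is to pass them to the CopyTable job itself, in the same way the question already passes -Dhbase.zookeeper.quorum. The failed call logged rpcTimeout=60000 rather than the 120000 listed in the question, so the job may not be picking those settings up. The sketch below is only an illustration: the 300000 ms values are assumptions rather than tuned recommendations, the copytable function and the DRY_RUN variable are hypothetical helpers, and the quorum, peer address, timestamps and table name are copied as posted.

```shell
#!/bin/sh
# Sketch only. Host names, peer address, timestamps and table name are copied
# verbatim from the question; the timeout values below are assumptions.
RPC_TIMEOUT_MS=300000       # assumed; the failed Scan call logged rpcTimeout=60000
SCANNER_TIMEOUT_MS=300000   # assumed

copytable() {
  # $1 = start time (epoch ms), $2 = end time (epoch ms), $3 = table name.
  # With DRY_RUN set, the command is printed instead of executed.
  ${DRY_RUN:+echo} hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    -Dhbase.zookeeper.quorum=A01,B01,C02 \
    -Dhbase.rpc.timeout="$RPC_TIMEOUT_MS" \
    -Dhbase.client.scanner.timeout.period="$SCANNER_TIMEOUT_MS" \
    --peer.adr=dev01dev02,dev03:2181:/hbase-unsecure \
    --starttime="$1" --endtime="$2" "$3"
}

# Print the 5-minute command from the question; unset DRY_RUN to launch the job.
DRY_RUN=1
copytable 1688313600000 1688313900000 BIG_EMP
```

If the task log still reports rpcTimeout=60000 after this, the overrides are probably not reaching the map tasks, which would be worth investigating before raising the values further.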
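The locality check can be sketched from the command line as follows. This assumes that the HBase shell's status 'detailed' output includes a dataLocality=<value> field for each region and that it appears on the line directly after the region name; the exact layout may differ between versions, so treat the filtering as illustrative. The low_locality function is a hypothetical helper, and the table name is the one used in the CopyTable command in the question.

```shell
#!/bin/sh
# Sketch only. The region and table names come from the thread; the output
# format of `status 'detailed'` is an assumption and may vary by version.
REGION=1e8adfc0af2cf791361126e24822e63c
TABLE=BIG_EMP

# Read status text on stdin and print every dataLocality value below 1.
low_locality() {
  grep -o 'dataLocality=[0-9.]*' | awk -F= '$2 < 1 { print $2 }'
}

# Run only where the HBase client is installed.
if command -v hbase >/dev/null 2>&1; then
  # Assumes the metrics line directly follows the line naming the region.
  echo "status 'detailed'" | hbase shell -n | grep -A 1 "$REGION" | low_locality
  # If a value was printed, a major compaction can be requested with:
  #   echo "major_compact '$TABLE'" | hbase shell -n
fi
```

A major compaction rewrites the store files of every region it touches, so on a table of this size it is probably best scheduled outside peak hours.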
Created 07-31-2023 08:45 AM
@lukepoo, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,