Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

hbase replication retry

hbase replication retry

Contributor

I have 2 clusters ( 1 master and 1 slave) on CDH 5.4 hbase 1.0

replication is working 95% of the time 

but I do get the following WARN which I consider an error

 

 

Can't replicate because of an error on the remote cluster: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException): org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 11 actions: NotServingRegionException: 11 times, 
	at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:227)
	at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:207)
	at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1563)
	at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1003)
	at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:1017)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:236)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:160)
	at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1584)
	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20880)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
	at java.lang.Thread.run(Thread.java:745)

I consider this an error because my slave is missing data that I have in the master.   Is there a settting to keep trying to send ?

 

4 REPLIES 4

Re: hbase replication retry

Master Guru
Replication will try its best to reach the slave cluster. If it is unable to immediately replicate a recent entry, it will try to replay the entry from the kept-aside WAL later.

The reason the write may have failed from the master depends on how busy your Slave is or if it has been configured properly in terms of handlers (to accept the incoming load from Master), barring any network-level issues.

Re: hbase replication retry

Hi, 

 

I have 2 clusters running in master <-> master on CDH 5.8.3 version. One of the region server logs I'm seeing the oldWALs are increasing and getting the follwing message in region server log.

 

WARN org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint: Can't replicate because of an error on the remote cluster:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException): org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: t2: 1 time,
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:247)
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:227)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.getErrors(AsyncProcess.java:1663)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:982)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:996)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.batch(ReplicationSink.java:256)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSink.replicateEntries(ReplicationSink.java:163)
        at org.apache.hadoop.hbase.replication.regionserver.Replication.replicateLogEntries(Replication.java:198)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1814)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22253)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
        at java.lang.Thread.run(Thread.java:745)

        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1268)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.replicateWALEntry(AdminProtos.java:23209)
        at org.apache.hadoop.hbase.protobuf.ReplicationProtbufUtil.replicateWALEntry(ReplicationProtbufUtil.java:65)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:337)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint$Replicator.call(HBaseInterClusterReplicationEndpoint.java:322)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

 

 

What could be the reason of this and how to fix this. Please advise.

Re: hbase replication retry

New Contributor

Thanks Harsh.  Are you saying the master will continue to attempt replicaion of the failed rows even after the warning is logged.

Highlighted

Re: hbase replication retry

Master Guru
Yes.

The HBase replication state is always persisted, so it maintains an offset index of the last successfully replicated point of data for each target.

When your replication breaks down, for various reasons, you can fix those reasons and observe that replication will continue from where it left off.

One consequence of this behaviour is that the storage keeps growing while the replication is stalled (as observed by the earlier poster above).

In this thread's context (seems like an older thread that's been bumped), the NotServingRegionException may be due to state issues in the cluster (regions remaining unassigned or in FAILED_(OPEN|CLOSE) states, so an investigation via HBCK and RegionServer logs would be required to resolve it before replication can continue performing its writes.