
Ensure HBase replication is up-to-date at specific time

I would like to make sure that all data in my main table has been replicated to my secondary cluster at a specific time every day.


At the moment my strategy is to run "status 'replication'" from the hbase shell and check, among other things, that "SizeOfLogQueue" = 0. If it is not, I would like to have an option to prioritize replication catch-up over other activities.
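
For illustration, the check described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the shell output contains "SizeOfLogQueue=<number>" fields on the SOURCE lines, that `hbase` is on the PATH, and that the shell can be run non-interactively with `-n`. The function names are made up for this example, and a queue size of 0 is only one signal, not proof that every edit has been shipped.

```python
import re
import subprocess
import sys

# Assumption: each replication source is reported with a field
# of the form "SizeOfLogQueue=<integer>" in the shell output.
QUEUE_RE = re.compile(r"SizeOfLogQueue=(\d+)")


def log_queue_sizes(status_output):
    """Return every SizeOfLogQueue value found in the given text."""
    return [int(n) for n in QUEUE_RE.findall(status_output)]


def replication_caught_up(status_output):
    """True only if at least one queue was reported and all are empty."""
    sizes = log_queue_sizes(status_output)
    # No values at all means nothing could be verified, so report False.
    return bool(sizes) and all(n == 0 for n in sizes)


def fetch_status():
    """Run "status 'replication'" through the hbase shell (hypothetical setup)."""
    result = subprocess.run(
        ["hbase", "shell", "-n"],  # assumption: non-interactive mode is available
        input="status 'replication'\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Exit code 0 when all queues are empty, 1 otherwise (handy for cron).
    sys.exit(0 if replication_caught_up(fetch_status()) else 1)
```

Run from a scheduler at the chosen time, the exit code could then drive an alert or a retry.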


The ReplicationSyncUp tool seems like a viable option for this.

My question is: can the ReplicationSyncUp tool be used safely while HBase is running? Also, are there any reasons, such as excessive strain on the cluster, why I should not use the tool for this purpose?


The Jira issue for the tool mentions that it could be used while HBase is up, but does not conclude whether it was actually implemented that way.


Answers, as well as any speculation, would be appreciated.


Re: Ensure HBase replication is up-to-date at specific time

Replication already has a dedicated and independent handler queue/processing threads, and runs optimistically at all times. Did you face a situation where replication was preempted for other activity after an unstable period?

As to ReplicationSyncUp, its function is to work out the pending replication state from ZooKeeper, read the corresponding HBase WALs on HDFS, and ship that data to the destination peer.

It's not a good idea to try it on a live master cluster, because the ZooKeeper state and the WAL contents are changing dynamically, and you would end up with two replication efforts running at the same time.