What's the best approach to bring HBase DR in sync with Master

Super Guru

I have a question about HBase backup. Say I am running HBase replication in a master-slave setup, where the slave is the DR site. Now imagine that a network failure occurs, or the slave cluster dies, while the master keeps running just fine. We bring the slave back up after 3 hours (or however many hours). What's the best way in this case to make sure the slave gets the data written during those three hours?

I was thinking about using CopyTable with startTime and endTime, but wanted to confirm that this is the right/best approach. Also, how much load will CopyTable create on the master cluster? I read in the documentation that I should be able to run CopyTable from my target cluster (DR in this case). Is my understanding correct, and how does this alleviate any load? Is it because the MapReduce job runs on the remote cluster and reads the data from a remote machine?

1 ACCEPTED SOLUTION

Re: What's the best approach to bring HBase DR in sync with Master

New Contributor

Three hours (or whatever number of hours) of slave cluster downtime is not really an exceptional case for replication. According to the documents linked below, HBase and ZooKeeper will collect a backlog of edits and, once the slave cluster is up again, replicate the older edits the same way as newer ones. So in this normal case, the best way is to let replication do its job.

https://hbase.apache.org/0.94/replication.html#Normal_processing

https://hbase.apache.org/0.94/replication.html#Non-responding_slave_clusters

Abnormal cases can occur if table data gets corrupted and replication breaks. Then you may have to copy the data. I don't know how much load that puts on the master.
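
For the fallback case where a copy is needed, the CopyTable idea from the question (restricting the copy to a time window) could look roughly like the sketch below. It only builds the command line for the `org.apache.hadoop.hbase.mapreduce.CopyTable` job with its `--starttime`, `--endtime` and `--peer.adr` options; it does not run anything. The table name, the ZooKeeper quorum string and the outage window are made-up placeholders, and the helper names are mine, not part of HBase.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds (HBase cell timestamps)."""
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time surprises")
    # Integer division of timedeltas avoids floating-point rounding.
    return (ts - EPOCH) // timedelta(milliseconds=1)


def copytable_command(table: str, start: datetime, end: datetime, peer_zk: str) -> list:
    """Build (but do not run) a CopyTable command limited to [start, end).

    peer_zk is the destination cluster in the form
    quorum:client_port:znode_parent, e.g. "zk1,zk2,zk3:2181:/hbase".
    """
    if end <= start:
        raise ValueError("end must be after start")
    return [
        "hbase",
        "org.apache.hadoop.hbase.mapreduce.CopyTable",
        f"--starttime={to_epoch_millis(start)}",
        f"--endtime={to_epoch_millis(end)}",
        f"--peer.adr={peer_zk}",
        table,
    ]


if __name__ == "__main__":
    # Hypothetical three-hour outage window and DR quorum, for illustration only.
    outage_start = datetime(2016, 3, 1, 9, 0, tzinfo=timezone.utc)
    outage_end = outage_start + timedelta(hours=3)
    cmd = copytable_command("my_table", outage_start, outage_end,
                            "dr-zk1,dr-zk2,dr-zk3:2181:/hbase")
    print(" ".join(cmd))
```

In practice it is probably safer to pad the window a little on both sides, since the exact moment the slave stopped receiving edits is rarely known to the millisecond. Whether this is needed at all depends on the point above: if replication itself is healthy, the backlog should drain without a manual copy.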
