Full Disaster Recovery with Multiple On-Premise Data Centers

Super Guru

I am investigating a good disaster recovery solution for banking with multiple petabytes of data. This would be data in HDFS (Parquet, Avro), Kafka, Hive and HBase.

It needs to cover not just the data, but also keeping BI tools in sync and keeping Spark jobs functioning.

I have looked at WANDisco, but that only covers HBase and HDFS. Is there something that keeps applications and BI items in sync?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Full Disaster Recovery with Multiple On-Premise Data Centers

Super Guru

Cloudera also has tools for Hive and other data replication as part of CDP.

5 REPLIES

Re: Full Disaster Recovery with Multiple On-Premise Data Centers

Super Guru

Cool, thanks. We have tried Kafka mirroring and it has had a lot of issues. I am thinking NiFi can solve a lot of these problems. I think it's a matter of budget: how many NiFi nodes, plus any extra nodes, would we need to process the data being migrated over?

A few people were thinking of dual ingest, but that is usually hard to keep in sync. With NiFi, that should not be a problem.

I wonder if someone has a DR example in NiFi worked up already?

Re: Full Disaster Recovery with Multiple On-Premise Data Centers

New Contributor

Hello @TimothySpann, three years after your post I'm in a similar situation :). Could you let me know how you solved it? At my company (a telco) we're planning to use 2 hot clusters with dual ingest because our RTO is demanding, and we're looking for mechanisms to monitor both clusters and keep them in sync.
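
To make the "monitor and keep in sync" part concrete, here is a minimal sketch of one possible drift check for two dual-ingest clusters. It assumes you can already collect per-topic record counts for fixed time windows from each cluster (how you collect them is up to you); the names `Drift`, `find_drift`, the `cdr` topic and the 1% tolerance are all hypothetical. Raw Kafka offsets are deliberately not compared, since two independently ingesting clusters will not share offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    """One (topic, window) pair whose counts disagree between clusters."""
    topic: str
    window: str
    primary_count: int
    secondary_count: int

    @property
    def relative_gap(self) -> float:
        # Gap relative to the larger count; 0.0 when both sides are empty.
        larger = max(self.primary_count, self.secondary_count)
        if larger == 0:
            return 0.0
        return abs(self.primary_count - self.secondary_count) / larger


def find_drift(primary, secondary, tolerance=0.01):
    """Compare {(topic, window): count} maps from both clusters.

    Returns the windows whose relative gap exceeds `tolerance`.
    A key missing on one side is treated as a count of 0.
    """
    drifts = []
    for key in sorted(set(primary) | set(secondary)):
        topic, window = key
        drift = Drift(topic, window, primary.get(key, 0), secondary.get(key, 0))
        if drift.relative_gap > tolerance:
            drifts.append(drift)
    return drifts


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    primary = {("cdr", "10:00"): 1000, ("cdr", "10:05"): 1000}
    secondary = {("cdr", "10:00"): 1000, ("cdr", "10:05"): 900}
    for d in find_drift(primary, secondary):
        print(d.topic, d.window, d.primary_count, d.secondary_count)
```

A check like this only tells you that the clusters diverged in a given window, not which records are missing, so it would normally trigger a replay or reconciliation job rather than fix anything by itself.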

Re: Full Disaster Recovery with Multiple On-Premise Data Centers

Super Guru

Cloudera Streams Replication Manager with MirrorMaker 2 solves this easily.

Apache NiFi could also do this in a dual-ingest fashion, but SRM is a no-brainer: faster, automatic, Active-Active replication with full monitoring.

https://blog.cloudera.com/announcing-the-general-availability-of-cloudera-streams-management/
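
One practical detail of Active-Active replication is topic naming. MirrorMaker 2's default replication policy names a replicated topic as the source cluster alias, a dot, and the original topic name, so consumers on each side have to read both the local topic and the replicated one. The sketch below only illustrates that naming rule in plain Python, assuming SRM is left on the default policy; the aliases `primary` and `secondary`, the topic `payments`, and the helper names are hypothetical.

```python
SEPARATOR = "."  # separator used by MirrorMaker 2's default replication policy


def remote_topic(source_alias: str, topic: str) -> str:
    """Name a topic gets on the target cluster after replication."""
    return f"{source_alias}{SEPARATOR}{topic}"


def topics_to_consume(topic: str, local_alias: str, all_aliases: list) -> list:
    """Local topic plus its replicas from every other cluster.

    A consumer on `local_alias` that wants the full stream has to
    subscribe to all of these names.
    """
    replicas = [remote_topic(a, topic) for a in all_aliases if a != local_alias]
    return [topic] + replicas


if __name__ == "__main__":
    clusters = ["primary", "secondary"]
    print(topics_to_consume("payments", "secondary", clusters))
```

Because replicated topics carry the source alias, a replicated topic is not mirrored back to the cluster it came from, which is what keeps a two-way setup from looping.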

Re: Full Disaster Recovery with Multiple On-Premise Data Centers

New Contributor

Great, thanks a lot for your answers @TimothySpann. SRM seems great and works out of the box! In my case, the proposed architecture is based on 2 hot clusters, each with its own Kafka brokers and each consuming independently from the sources. If the primary Kafka cluster breaks, the secondary Kafka cluster has to keep ingesting data from the sources, avoiding (or at least minimizing) downtime and data loss. As far as I can see, even with SRM, if the primary Kafka cluster breaks the secondary Kafka cluster still has to take over ingestion without losing data.
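
For the failover case described above, one common pattern is to have downstream consumers resume on the secondary from a point slightly before the last event they processed on the primary, and to drop anything they have already seen. The sketch below is only an illustration of that idea, not something SRM provides: it assumes every source event carries a unique ID and an event timestamp, and the function names and the 5-minute safety margin are hypothetical.

```python
from datetime import datetime, timedelta


def failover_start_time(last_processed, safety_margin=timedelta(minutes=5)):
    """Rewind a little before the last processed event to avoid gaps."""
    return last_processed - safety_margin


def replay_without_duplicates(records, start_time, seen_ids):
    """Filter records read from the secondary cluster after a failover.

    `records` is an iterable of (event_id, event_time, payload) tuples.
    Events older than `start_time` or already in `seen_ids` are skipped;
    `seen_ids` is updated in place with every event that is kept.
    """
    fresh = []
    for event_id, event_time, payload in records:
        if event_time < start_time:
            continue  # before the rewind point, already handled on the primary
        if event_id in seen_ids:
            continue  # inside the overlap window, but already processed
        seen_ids.add(event_id)
        fresh.append((event_id, event_time, payload))
    return fresh


if __name__ == "__main__":
    t0 = datetime(2023, 1, 1, 10, 0)
    start = failover_start_time(t0 + timedelta(minutes=10))
    records = [
        ("a", t0, "old"),
        ("b", t0 + timedelta(minutes=7), "seen"),
        ("c", t0 + timedelta(minutes=12), "new"),
    ]
    print(replay_without_duplicates(records, start, {"b"}))
```

The trade-off is the usual at-least-once one: a larger safety margin lowers the chance of missing data but means more events to deduplicate, so the set of seen IDs has to cover at least that window.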
