07-18-2018 01:57 AM - last edited on 07-18-2018 06:19 AM by cjervis
Is there a good place to discuss Disaster Recovery, or do people just go to the relevant forum for each component and ask there?
I am currently trying out BDR and have it working for HDFS (but not Hive).
What seems to be missing is "best-practice" advice. For instance, do people set up a BDR replication for the whole of the "/user" directory? If they have users on the target/backup/DR cluster, are they stored in a different directory, e.g. "/non-prod/user"?
If the two clusters work in an active/active fashion, then how do you move jobs/coordinators between Oozie instances?
Do people do DR of Kudu? So far the only options I can see are:
a) dual ingest, so that we update the primary and DR versions of Kudu at the same time
b) periodically dump all the data stored in Kudu into Parquet and then load the Parquet files into the DR Kudu.
etc etc etc
07-18-2018 11:20 AM
Cloudera Manager provides facilities for replicating data and Hive metadata from one cluster to another. Discussing those topics in this Cloudera Manager message board is perfect.
Best practice in this space is hard to define, since many different business needs can lead to widely varying configurations.
It really depends on what you seek to achieve in replication.
You can set up replication for the /user directory if that is what you want to do.
By default, the Replication Schedule is configured with "Keep Deleted Files" as the deletion policy. This means that if a file exists on the target cluster but does not exist on the source, the file will not be touched. As a result, you can replicate the data you want from the source and still have other files on the target.
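To make the effect of that policy concrete, here is a minimal sketch in Python. It is only a toy model of one replication run, not how BDR is implemented: the `replicate` function, the `keep_deleted` flag and the example paths are all made up for illustration, and each cluster is represented as a plain dict of path to file contents.

```python
def replicate(source, target, keep_deleted=True):
    """Toy model of a single replication run (hypothetical helper).

    source, target: dicts mapping an HDFS-style path to file contents.
    keep_deleted:   True models "Keep Deleted Files"; False models a
                    policy that removes target files missing on the source.
    Returns the new state of the target; the inputs are not modified.
    """
    result = dict(target)
    # Every file on the source is copied to (or overwritten on) the target.
    result.update(source)
    if not keep_deleted:
        # A deleting policy would also drop files that exist only on the target.
        for path in list(result):
            if path not in source:
                del result[path]
    return result


source = {"/user/alice/a.csv": "v2"}
target = {"/user/alice/a.csv": "v1", "/user/dr_only/notes.txt": "local"}

kept = replicate(source, target)                        # default policy
pruned = replicate(source, target, keep_deleted=False)  # deleting policy
```

With the default, `kept` still contains `/user/dr_only/notes.txt`, which is why target-only users and files can coexist with replicated data. With the deleting variant, `pruned` contains only what the source has.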
As you mentioned, you can also store your backed-up data in a different subtree by configuring Destination Path.
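As a rough illustration of what a separate subtree means for individual files, the sketch below maps a source path to its location under a different destination root. It assumes the relative layout under the source root is preserved on the target; that is an assumption for the sake of the example, so check the resulting paths on a test schedule. The helper name `map_to_destination` is hypothetical and the roots are the ones from the question.

```python
import posixpath


def map_to_destination(source_root, destination_root, path):
    """Return where `path` would land under `destination_root`.

    Hypothetical helper: assumes the layout below `source_root`
    is kept unchanged below `destination_root`.
    """
    rel = posixpath.relpath(path, source_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{path} is not under {source_root}")
    # normpath collapses the trailing "." when path == source_root.
    return posixpath.normpath(posixpath.join(destination_root, rel))


mapped = map_to_destination("/user", "/non-prod/user", "/user/alice/a.csv")
```

Here `mapped` is `/non-prod/user/alice/a.csv`, so replicated home directories stay apart from the `/user` tree that belongs to users of the DR cluster itself.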
The clusters involved in replication have no relationship to one another at the CDH level, and only a loose one in CM through the Peer reference. There is no "Active/Active" database concept here, as the clusters are not kept in sync that way.
Data is replicated from source to target. Job coordinators will operate on each cluster without knowledge of one another.
If you are asking how to replicate Oozie and Kudu information, that is not covered by BDR, so I would recommend bringing those questions to the appropriate boards: