Quick question about a BDR scenario with a production cluster A and a DR cluster B.
In normal operation, data is replicated A ==> B (HDFS and Hive replication). Now cluster A has to be shut down for maintenance, but external applications still need a cluster to talk to, so DR cluster B is promoted to "prod" while cluster A is under maintenance.
Once maintenance is finished, cluster A comes back online and needs to be promoted back to "prod".
All the data that came into cluster B while it was "prod" (and cluster A was down) will be replayed to cluster A. So the question is:
Is it now possible to simply restart the replication jobs that were in place originally (replicating A ==> B) and have replication continue to run? On cluster B, the first batch of data to be replicated is already there, because it is the data that was ingested while cluster A was under maintenance. In other words, do the replication jobs simply overwrite (or duplicate) data that already exists on the target?
For HDFS, since it is based on _distcp_, there is an option to set _overwrite_ explicitly, so this shouldn't be a problem... What about Hive?
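For the HDFS half of the question, it may help to reason with a toy model of the three copy behaviours documented for distcp: the default (files that already exist at the destination are skipped), `-update` (copy only if the file is missing or differs), and `-overwrite` (always replace). This is only a sketch under stated assumptions: `replicate` and `CopyMode` are hypothetical names, files are modelled as a dict keyed by path, plain content equality stands in for distcp's size/blocksize/checksum comparison, and it does not show which options BDR actually passes to distcp or anything about how Hive replication behaves. Because copies are keyed by path, re-running A ==> B can skip or replace a file but cannot create a second copy of it.

```python
from enum import Enum


class CopyMode(Enum):
    """Hypothetical labels for the three documented distcp behaviours."""
    SKIP_EXISTING = "default"  # existing destination files are left untouched
    UPDATE = "-update"         # copy only if missing or different
    OVERWRITE = "-overwrite"   # always replace the destination file


def replicate(source: dict[str, bytes], target: dict[str, bytes],
              mode: CopyMode) -> dict[str, str]:
    """Toy model: copy every source path into target, in place.

    Returns the action taken per path. Content equality is an assumed
    stand-in for distcp's size/blocksize/checksum comparison.
    """
    actions: dict[str, str] = {}
    for path, data in source.items():
        if path not in target:
            target[path] = data
            actions[path] = "copied"
        elif mode is CopyMode.OVERWRITE:
            target[path] = data
            actions[path] = "overwritten"
        elif mode is CopyMode.UPDATE and target[path] != data:
            target[path] = data
            actions[path] = "updated"
        else:
            actions[path] = "skipped"
    return actions


if __name__ == "__main__":
    # Cluster A after maintenance: batch1 was replayed from B, batch2 is new.
    cluster_a = {"/data/batch1": b"ingested-on-B", "/data/batch2": b"new-on-A"}
    # Cluster B already holds batch1 because it was ingested there.
    cluster_b = {"/data/batch1": b"ingested-on-B"}
    print(replicate(cluster_a, cluster_b, CopyMode.SKIP_EXISTING))
    # {'/data/batch1': 'skipped', '/data/batch2': 'copied'}
    print(len(cluster_b))  # 2: no duplicate entry for batch1
```

In this model, all three modes leave cluster B with exactly one copy of each path; they only differ in whether an already-present file is rewritten.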