Accepted Solution

What are the best practices for replicating HDFS and Hive data between clusters?


What are the best tools available for data replication, and what are the best practices for using them?

Thanks in advance.


Re: What are the best practices for replicating HDFS and Hive data between clusters?

Cloudera offers Backup and Disaster Recovery (BDR) features as part of its enterprise offering that can do HDFS replication to other clusters, Hive metadata and data replication to other clusters, and also HBase snapshot backups to S3.

This is documented in detail at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html

Outside of this, you can use DistCp for HDFS replication, but for Hive replication you will need to manually propagate the DDL-associated metadata (table and partition definitions) to the target cluster yourself.
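
As a rough illustration of the DistCp route, here is a minimal Python sketch that assembles a `hadoop distcp` command line and the Hive statement you could use to capture a table's DDL. The NameNode addresses, paths, database name, and table name are hypothetical placeholders, and the helper functions are my own illustration, not part of any Hadoop or Cloudera tooling. The flags used (`-update`, `-delete`, `-p`) are standard DistCp options, but please check them against the DistCp documentation for your Hadoop version before relying on this.

```python
import shlex
from typing import List


def build_distcp_command(src: str, dst: str, update: bool = True,
                         delete: bool = False, preserve: bool = True) -> List[str]:
    """Assemble (but do not run) a `hadoop distcp` invocation.

    src and dst are full HDFS URIs; the ones used below are placeholders.
    """
    # DistCp only accepts -delete together with -update or -overwrite.
    if delete and not update:
        raise ValueError("-delete requires -update in this sketch")
    cmd = ["hadoop", "distcp"]
    if update:
        cmd.append("-update")   # copy only files that are missing or differ on the target
    if delete:
        cmd.append("-delete")   # remove target files that no longer exist at the source
    if preserve:
        cmd.append("-p")        # preserve file attributes such as ownership and permissions
    cmd += [src, dst]
    return cmd


def show_create_table_statement(database: str, table: str) -> str:
    """HiveQL that prints a table's DDL so it can be replayed on the target cluster."""
    return f"SHOW CREATE TABLE {database}.{table}"


if __name__ == "__main__":
    # Hypothetical NameNode hosts and warehouse path, for illustration only.
    cmd = build_distcp_command(
        "hdfs://source-nn:8020/user/hive/warehouse/sales.db",
        "hdfs://target-nn:8020/user/hive/warehouse/sales.db",
    )
    print(shlex.join(cmd))
    print(show_create_table_statement("sales", "orders"))
```

The sketch only prints the commands rather than executing them, so you can review them first. After the data is copied, the captured `CREATE TABLE` statement would still need to be run on the target cluster, and partitions registered there, since DistCp moves files only and knows nothing about the Hive metastore.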