What are the best practices for replicating HDFS and Hive data between clusters?

Rising Star

What are the best tools available for data replication, and what are the best practices for using them?

 

Thanks in advance.

1 ACCEPTED SOLUTION

Mentor
Cloudera offers Backup and Disaster Recovery (BDR) features as part of its enterprise offering. BDR can replicate HDFS data to other clusters, replicate Hive metadata and data to other clusters, and back up HBase snapshots to S3.

This is documented in detail at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html

Outside of BDR, you can use DistCp for HDFS replication, but for Hive replication you will need to propagate the DDL-associated metadata manually.
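
To illustrate the DistCp route, here is a minimal Python sketch that assembles a `hadoop distcp` command line and lists the kind of Hive statements you might run by hand. The helper names, NameNode addresses, paths, and the database/table names are all hypothetical and only for illustration. The flags used (`-update`, `-delete`, `-p`, `-m`) are standard DistCp options, and `SHOW CREATE TABLE` / `MSCK REPAIR TABLE` are standard Hive statements, but this is only one possible way to carry metadata across, not a complete replication procedure.

```python
import shlex
import subprocess


def build_distcp_command(src, dst, max_maps=None, delete_missing=False):
    """Return the argv list for an incremental DistCp copy from src to dst."""
    # -update copies only files that are missing or differ on the target;
    # -p preserves file attributes such as ownership and permissions.
    cmd = ["hadoop", "distcp", "-update", "-p"]
    if delete_missing:
        # -delete removes target files that no longer exist on the source.
        # DistCp only accepts it together with -update or -overwrite.
        cmd.append("-delete")
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks used for the copy.
        cmd += ["-m", str(max_maps)]
    cmd += [src, dst]
    return cmd


def hive_metadata_statements(database, table):
    """Return Hive statements for carrying one table's metadata across by hand."""
    qualified = f"{database}.{table}"
    return {
        # Run on the source cluster to obtain the table's DDL.
        "source": f"SHOW CREATE TABLE {qualified};",
        # Run on the target cluster, after recreating the table and copying
        # its files, so the metastore picks up the partition directories.
        "target": f"MSCK REPAIR TABLE {qualified};",
    }


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical cluster addresses and warehouse paths.
    command = build_distcp_command(
        "hdfs://source-nn:8020/user/hive/warehouse/sales.db",
        "hdfs://target-nn:8020/user/hive/warehouse/sales.db",
        max_maps=20,
    )
    run(command)  # dry run: only prints the command
    for side, statement in hive_metadata_statements("sales", "orders").items():
        print(f"{side}: {statement}")
```

The DDL returned by `SHOW CREATE TABLE` may embed a `LOCATION` that points at the source cluster, so check it before running the statement on the target.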
