What are the best practices for replicating HDFS and Hive data between clusters?

Contributor

What are the best tools available for data replication, and what are the best practices for using them?

Thanks in advance.

Accepted Solutions

Re: What are the best practices for replicating HDFS and Hive data between clusters?

Master Guru
Cloudera offers Backup and Disaster Recovery (BDR) features as part of its enterprise offering. BDR can replicate HDFS data to other clusters, replicate Hive metadata and data to other clusters, and back up HBase snapshots to S3.

This is documented in detail at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html

Outside of BDR, you can use DistCp for HDFS replication, but for Hive replication you will need to manually propagate the DDL-associated metadata yourself.
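
If you go the DistCp route, here is a rough sketch of how the two manual steps could be scripted. It only builds the commands; it does not run anything. The helper names (`build_distcp_command`, `show_create_table_statement`), the NameNode addresses, and the database/table names are all made up for illustration. The `-update`, `-delete`, and `-m` options are standard DistCp options, and `SHOW CREATE TABLE` is standard HiveQL, but please check them against the versions on your clusters.

```python
import shlex
from typing import List, Optional


def build_distcp_command(
    source: str,
    target: str,
    delete_extra: bool = False,
    max_maps: Optional[int] = None,
) -> List[str]:
    """Build an argument list for an incremental `hadoop distcp` copy."""
    # -update copies only files that are missing or differ on the target.
    cmd = ["hadoop", "distcp", "-update"]
    if delete_extra:
        # -delete removes target files that no longer exist on the source
        # (DistCp only accepts it together with -update or -overwrite).
        cmd.append("-delete")
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks used for the copy.
        cmd += ["-m", str(max_maps)]
    cmd += [source, target]
    return cmd


def show_create_table_statement(database: str, table: str) -> str:
    """HiveQL that prints a table's DDL so it can be replayed on the target."""
    return f"SHOW CREATE TABLE {database}.{table}"


if __name__ == "__main__":
    # Hypothetical NameNode addresses and warehouse path (assumptions).
    cmd = build_distcp_command(
        "hdfs://source-nn:8020/user/hive/warehouse/sales.db",
        "hdfs://target-nn:8020/user/hive/warehouse/sales.db",
        delete_extra=True,
        max_maps=20,
    )
    print(shlex.join(cmd))
    print(show_create_table_statement("sales", "orders"))
```

The idea is to copy the table files with DistCp, capture each table's DDL on the source with `SHOW CREATE TABLE`, and replay that DDL on the target. For partitioned tables you will probably also need to register the copied partitions on the target, for example with `MSCK REPAIR TABLE`.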