What are the best practices for replicating HDFS and Hive data between clusters?
Labels: HDFS
Rising Star
Created on ‎10-01-2016 08:45 PM - edited ‎10-01-2016 08:46 PM
What are the best tools available for data replication, and what are the best practices for using them?
Thanks in advance.
1 ACCEPTED SOLUTION
Mentor
Created ‎10-05-2016 12:49 AM
Cloudera offers Backup and Disaster Recovery (BDR) features as part of its enterprise offering that can do HDFS replication to other clusters, Hive metadata and data replication to other clusters, and also HBase snapshot backups to S3.
This is documented in detail at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html
Outside of BDR, you can use DistCp for HDFS replication, but for Hive replication you will need to propagate the DDL-associated metadata manually.
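For the DistCp route, a minimal sketch of what the manual approach might look like is below. It only builds the command lines: one `hadoop distcp` invocation to copy the data, and one `beeline` call to dump a table's DDL so it can be replayed on the target cluster. The NameNode addresses, the HiveServer2 JDBC URL, the paths, and the helper function names are all placeholders made up for illustration, and the flags chosen (`-update`, `-p`, `-delete`, `-m`) are one reasonable combination rather than a recommendation from the answer above.

```python
import shlex


def build_distcp_command(src_uri, dst_uri, delete_missing=False, max_maps=None):
    """Build a `hadoop distcp` command that incrementally syncs src to dst.

    -update copies only files that are missing or differ on the target.
    -p      preserves file attributes such as ownership and permissions.
    -delete removes target files that no longer exist on the source
            (only valid together with -update or -overwrite).
    -m      caps the number of map tasks used for the copy.
    """
    cmd = ["hadoop", "distcp", "-update", "-p"]
    if delete_missing:
        cmd.append("-delete")
    if max_maps is not None:
        cmd += ["-m", str(max_maps)]
    cmd += [src_uri, dst_uri]
    return cmd


def build_show_create_table_command(jdbc_url, database, table):
    """Build a beeline command that prints the DDL of one Hive table."""
    query = f"SHOW CREATE TABLE {database}.{table}"
    return [
        "beeline", "-u", jdbc_url,
        "--silent=true", "--showHeader=false", "--outputformat=tsv2",
        "-e", query,
    ]


if __name__ == "__main__":
    # Hypothetical cluster addresses and table names, for illustration only.
    copy_cmd = build_distcp_command(
        "hdfs://source-nn:8020/user/hive/warehouse/sales.db/orders",
        "hdfs://target-nn:8020/user/hive/warehouse/sales.db/orders",
        delete_missing=True,
        max_maps=20,
    )
    ddl_cmd = build_show_create_table_command(
        "jdbc:hive2://source-hs2:10000/default", "sales", "orders"
    )
    # Print the commands instead of running them, so they can be reviewed
    # first; subprocess.run(cmd, check=True) would execute them.
    print(shlex.join(copy_cmd))
    print(shlex.join(ddl_cmd))
```

The DDL captured from the source would then need to be run against the target cluster's Hive, and for partitioned tables the partitions would still have to be registered there (for example with `MSCK REPAIR TABLE`) after the files arrive.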
