Created on 01-14-2019 05:37 AM - edited 09-16-2022 07:03 AM
Dear Experts, we have a use case where we have to migrate data (petabytes) from one cluster to another.
Which strategy and tools should I consider to migrate ~10 PB of data from one cluster to another?
Many Thanks in advance for your help!
Created 01-16-2019 02:31 PM
Hi @xBigDatax,
What data are we talking about here? HDFS, Hive, HBase, Impala, Search etc? I assume you are not using Cloudera Manager?
Thanks,
Li
Li Wang, Technical Solution Manager
Created 01-16-2019 11:49 PM
Yes, I am talking about HDFS, Hive, HBase, Impala, Search, etc., and Cloudera Manager too, from one Cloudera cluster to another Cloudera cluster.
The data size is in petabytes. Hope this clarifies.
Thanks
Created 01-18-2019 09:02 AM
Hi @xBigDatax,
You can take a look at the BDR feature (which should handle HDFS, Hive, and Impala), since you are using Cloudera Manager:
https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_bdr_about.html
For Search, you can take a look at this blog:
https://blog.cloudera.com/blog/2017/05/how-to-backup-and-disaster-recovery-for-apache-solr-part-i/
and the public documentation:
https://www.cloudera.com/documentation/enterprise/6/latest/topics/search_backup_restore.html
For HBase, you can take a look at this:
https://www.cloudera.com/documentation/enterprise/6/latest/topics/cdh_bdr_hbase_replication.html
You can also explore the distcp tool, which is a general utility for copying large data sets between distributed filesystems within and across clusters:
https://www.cloudera.com/documentation/enterprise/6/6.1/topics/cdh_admin_distcp_cdh.html
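As a rough illustration of the distcp route, here is a minimal Python sketch that only assembles a `hadoop distcp` command line so it can be reviewed before anyone runs it. The helper name `build_distcp_command`, the NameNode hosts `source-nn` and `dr-nn`, the `/data` path, and the map and bandwidth numbers are all placeholders I made up for the example, not values from your clusters. The flags used (`-update`, `-p`, `-m`, `-bandwidth`) are standard distcp options, but please check them against the distcp documentation for your CDH version.

```python
import shlex


def build_distcp_command(src, dst, num_maps=None, bandwidth_mb=None,
                         update=True, preserve="rbugp"):
    """Return the argv list for a `hadoop distcp` run (nothing is executed)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target,
        # which makes it possible to re-run the job after an interruption.
        cmd.append("-update")
    if preserve:
        # -p followed by letters preserves the chosen attributes
        # (r=replication, b=block size, u=user, g=group, p=permission).
        cmd.append("-p" + preserve)
    if num_maps is not None:
        # -m caps the number of simultaneous copy tasks.
        cmd += ["-m", str(num_maps)]
    if bandwidth_mb is not None:
        # -bandwidth limits each map task, in MB per second.
        cmd += ["-bandwidth", str(bandwidth_mb)]
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical source and target URIs; replace with your own.
    argv = build_distcp_command(
        "hdfs://source-nn:8020/data",
        "hdfs://dr-nn:8020/data",
        num_maps=50,
        bandwidth_mb=100,
    )
    # Print a shell-safe command line for review instead of running it.
    print(" ".join(shlex.quote(a) for a in argv))
```

For a volume in the petabyte range it would probably make sense to generate one such command per top-level directory and run them in batches, rather than a single job over the whole namespace, but that is a judgment call for your environment.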
Thanks and hope this helps,
Li
Li Wang, Technical Solution Manager
Created 06-22-2019 08:05 PM
Please suggest an approach for the initial load of TBs and/or PBs of data from a Production cluster to a DR cluster. Do you suggest migrating/replicating the data using Cloudera BDR?
Thanks