01-14-2019 05:37 AM
Dear Experts, we have a use case where we have to migrate petabytes of data from one cluster to another.
What strategy and tools should I consider for migrating ~10 PB of data from one cluster to another?
Many Thanks in advance for your help!
01-16-2019 02:31 PM
What data are we talking about here? HDFS, Hive, HBase, Impala, Search etc? I assume you are not using Cloudera Manager?
01-16-2019 11:49 PM
Yes, I am talking about HDFS, Hive, HBase, Impala, Search, etc., and Cloudera Manager as well, migrating from one Cloudera cluster to another Cloudera cluster.
The data size is in petabytes. I hope this clarifies.
01-18-2019 09:02 AM
Since you are using Cloudera Manager, you can take a look at the BDR (Backup and Disaster Recovery) feature, which should handle HDFS, Hive, and Impala:
For search, you can take a look at this blog:
and public doc:
For HBase, you can take a look at this:
You can also explore the distcp tool, a general utility for copying large data sets between distributed filesystems, both within and across clusters.
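As a rough illustration of the distcp option, here is a minimal Python sketch that assembles a `hadoop distcp` command line. The NameNode hosts, the path, and the map and bandwidth values are placeholders I made up for the example, not recommendations; the flags used (`-update`, `-pugp`, `-m`, `-bandwidth`) are standard distcp options, but please check them against the documentation for your Hadoop version before running anything.

```python
import shlex


def build_distcp_command(src, dst, max_maps=50, bandwidth_mb=100):
    """Return the argv list for an incremental DistCp run (illustrative only)."""
    return [
        "hadoop", "distcp",
        "-update",                        # skip files that already match at the target
        "-pugp",                          # preserve user, group and permissions
        "-m", str(max_maps),              # upper bound on simultaneous map tasks
        "-bandwidth", str(bandwidth_mb),  # per-map bandwidth limit in MB/s
        src,
        dst,
    ]


# Hypothetical source and target paths; replace with your own clusters.
SRC = "hdfs://source-nn:8020/data/warehouse"
DST = "hdfs://target-nn:8020/data/warehouse"

cmd = build_distcp_command(SRC, DST)

# Print the command for review instead of executing it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

For petabyte volumes it is usually easier to run one such command per top-level directory rather than a single job for everything, and to re-run with `-update` afterwards to pick up files that changed during the first pass.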
Thanks and hope this helps,
06-22-2019 08:05 PM
Please advise on the initial load of TBs and/or PBs of data from a production cluster to a DR cluster. Do you suggest migrating/replicating the data using Cloudera BDR?