How do I merge two Hadoop clusters into one?

New Contributor

I have two Hadoop clusters, ClusterA and ClusterB, both running HDP 2.5.3, and I want to merge them into one. I looked at using distcp to copy ClusterA to ClusterB, but ClusterB does not have enough spare space to complete the copy. Is there an alternative, such as reassigning ClusterA's data nodes to ClusterB without losing the data? After the reassignment, ClusterA's original filesystem should be visible in ClusterB.

Any pointers will be appreciated. Thanks in advance!

Re: How do I merge two Hadoop clusters into one?

Explorer

Hi @Saurabh Gupta, you can try reducing the replication factor. The steps below may help:

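1, Lower ClusterB's default HDFS replication factor. This setting applies only to files written after the change; existing files keep their current replication.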

vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
###
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
###

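2, Run distcp to copy ClusterA's data into ClusterB. Use -pbugp rather than a bare -p: a bare -p also preserves each file's source replication factor, which would override the dfs.replication=2 setting.

hadoop distcp -Ddfs.replication=2 -update -pbugp hdfs://ClusterA:8020/ hdfs://ClusterB:8020/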

3, If ClusterB already holds data, run the command below to lower the replication of the existing files and release space.

hdfs dfs -setrep -w 2 hdfs://ClusterB:8020/

For data currently stored at the default replication factor of 3, this cuts the raw HDFS space used by about one third.
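Before starting the copy, it may be worth checking that the combined data actually fits on ClusterB at the lower replication factor. The snippet below is a rough sketch of that arithmetic, not a definitive check: plan_fits is a hypothetical helper name, the sizes are made-up numbers, and it assumes every file on both clusters ends up at the same replication factor. It also ignores non-DFS reserved space and the headroom a healthy cluster needs.

```shell
# Sketch only: plan_fits is a hypothetical helper, not an HDFS command.
# Sizes are in bytes; "logical" means the size before replication.
plan_fits() {
  a_logical=$1     # logical size of ClusterA's data
  b_logical=$2     # logical size of data already on ClusterB
  b_capacity=$3    # total raw HDFS capacity of ClusterB
  replication=$4   # replication factor assumed for both data sets
  needed=$(( (a_logical + b_logical) * replication ))
  [ "$needed" -le "$b_capacity" ]
}

# Made-up numbers: 40 TiB on ClusterA, 20 TiB on ClusterB, 150 TiB raw capacity.
TIB=$((1024 * 1024 * 1024 * 1024))
for rep in 3 2; do
  if plan_fits $((40 * TIB)) $((20 * TIB)) $((150 * TIB)) "$rep"; then
    echo "replication $rep: fits"
  else
    echo "replication $rep: does not fit"
  fi
done
```

With these example numbers the plan does not fit at replication 3 but does at replication 2. For real inputs, the logical sizes can be read from the first column of hdfs dfs -du -s / on each cluster, and the capacity from hdfs dfs -df / or hdfs dfsadmin -report on ClusterB; check the column layout on your version before scripting against it.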
