How do I merge two Hadoop clusters into one?

I have two Hadoop clusters, ClusterA and ClusterB, both running HDP 2.5.3, and I want to merge them. I looked at using distcp to copy ClusterA to ClusterB, but ClusterB does not have enough spare space to complete the copy. Is there an alternative, such as reassigning the ClusterA data nodes to ClusterB without losing the data? After the reassignment, the original ClusterA filesystem should be visible in ClusterB.

Any pointers will be appreciated. Thanks in advance!



Hi @Saurabh Gupta, you can try reducing the replication factor. The steps below may help:

1. Modify ClusterB's HDFS replication factor (the dfs.replication property).

vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
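
For reference, the property being edited would look roughly like the fragment below. This is a minimal sketch: the value 2 is taken from the commands in this answer, and it only sets the default for newly written files; files already in HDFS keep their current replication until it is changed with setrep. On an Ambari-managed HDP cluster the same property is normally changed through Ambari rather than by editing the file directly, since manual edits may be overwritten.

```xml
<!-- hdfs-site.xml on ClusterB: default replication for newly written files -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```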

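2. Run distcp to copy ClusterA's data into ClusterB. A bare -p also preserves the source replication factor, which would cancel out -Ddfs.replication=2, so list the attributes to preserve explicitly and leave out r:

hadoop distcp -Ddfs.replication=2 -update -pbugp hdfs://ClusterA:8020/ hdfs://ClusterB:8020/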

3. If ClusterB already holds data, run the command below (ideally before the copy) to lower the replication of the existing files and release space.

hdfs dfs -setrep -w 2 hdfs://ClusterB:8020/

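This reduces the raw disk space used by the existing HDFS data; going from a replication factor of 3 to 2, for example, frees roughly a third of it.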
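
To judge whether the copy will fit, the arithmetic can be sketched in a few lines of shell. Everything here is an illustrative assumption: the helper name raw_usage_gb, the 100 GB and 200 GB figures, and the starting replication factor of 3 (the HDFS default) are not from the original post. Substitute the logical sizes reported by hdfs dfs -du -s on each cluster.

```shell
#!/usr/bin/env bash
# Estimate raw HDFS disk usage: logical size x replication factor.
# Hypothetical helper; ignores non-DFS overhead and reserved space.
raw_usage_gb() {
  local logical_gb=$1 replication=$2
  echo $(( logical_gb * replication ))
}

# Assumed figures for illustration only.
existing_gb=100   # logical data already on ClusterB
incoming_gb=200   # logical data to be copied from ClusterA

# Raw usage today, assuming ClusterB currently uses replication 3.
before=$(raw_usage_gb "$existing_gb" 3)

# Raw usage once existing data is set to replication 2 and the copy lands at 2.
after=$(( $(raw_usage_gb "$existing_gb" 2) + $(raw_usage_gb "$incoming_gb" 2) ))

echo "ClusterB raw usage now (replication 3): ${before} GB"
echo "Raw usage after setrep and copy (replication 2): ${after} GB"
```

With these sample numbers the script reports 300 GB now and 600 GB after the merge; compare the second figure with ClusterB's configured capacity from hdfs dfs -df before starting distcp.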
