I have two Hadoop clusters, ClusterA and ClusterB, both running the same version, HDP 2.5.3, and I want to merge them. I looked at using distcp to copy ClusterA into ClusterB, but unfortunately ClusterB does not have enough spare space to finish the copy. I am wondering whether there is an alternative, such as reassigning the ClusterA data nodes to ClusterB without losing the data. After the reassignment, the original ClusterA filesystem should be visible in ClusterB.
Any pointers will be appreciated. Thanks in advance!
Hi @Saurabh Gupta, you can try reducing the replication factor. The steps below may help:
1, Modify ClusterB's hdfs replication factor.
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml ### <property> <name>dfs.replication</name> <value>2</value> </property> ###
2, Run distcp to replicate ClusterA's data into ClusterB
hadoop distcp -Ddfs.replication=2 -update -p hdfs://ClusterA:8020/ hdfs://ClusterB:8020/
Note that -p with no arguments also preserves the source replication factor, so the copied files keep ClusterA's replication rather than 2. To avoid that, list the attributes to preserve explicitly and leave out r, for example -pbugp.
3, If ClusterB already holds data, run the command below to lower the replication of the existing files and free up space.
hdfs dfs -setrep -w 2 hdfs://ClusterB:8020/
If the existing files currently have 3 replicas (the HDFS default), dropping to 2 reduces the raw disk space they occupy by about one third.
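Before starting the copy it may be worth checking the arithmetic. Below is a minimal sketch in shell; the function names are my own (hypothetical, not part of Hadoop), all sizes are in bytes, and it ignores non-DFS usage and any reserved space, so treat the result as a rough estimate only.

```shell
#!/bin/sh
# Hypothetical helpers for a rough capacity estimate (not Hadoop commands).

# Raw disk space consumed by $1 logical bytes stored with $2 replicas.
raw_bytes_needed() {
    echo $(( $1 * $2 ))
}

# Raw bytes released when $1 logical bytes drop from $2 to $3 replicas.
bytes_freed_by_setrep() {
    echo $(( $1 * ($2 - $3) ))
}

# Succeeds (exit status 0) if $1 free raw bytes can hold
# $2 logical bytes stored with $3 replicas.
fits_at_replication() {
    [ "$1" -ge "$(raw_bytes_needed "$2" "$3")" ]
}
```

For example, 100 TB of logical data in ClusterA needs about 200 TB of raw space in ClusterB at replication 2. The logical size of ClusterA can be read from the first column of "hdfs dfs -du -s /", and the free space on ClusterB from "hdfs dfs -df /"; if fits_at_replication fails even after adding the space released by step 3, the copy will not complete.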