
How to copy (distcp) between two HA HDFS clusters






The scenario is the following: I have cluster1 with HDFS HA enabled, and I want to copy data to cluster2, which has HA enabled as well.


It seems I need to know the active NameNode to do that. 


The recommendation I've seen is to update hdfs-site.xml on the cluster so that it also carries the other cluster's HA nameservice settings.



But that seems to pollute the cluster's configuration, and it is potentially hard to maintain, since we would need to update those settings whenever the NameNode topology of either cluster changes.
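
To make the maintenance concern concrete, here is a minimal Python sketch of the kind of entries that recommendation usually involves. It assumes the standard HDFS client-side HA property names; the helper names, nameservice IDs, and host names are hypothetical, and a real deployment may need further properties beyond this minimal set.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Failover proxy provider class that ships with HDFS.
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def remote_nameservice_properties(
    local_ns: str, remote_ns: str, remote_namenodes: Dict[str, str]
) -> Dict[str, str]:
    """Client-side properties that make `remote_ns` resolvable from the local cluster.

    `remote_namenodes` maps a NameNode ID (e.g. "nn1") to its RPC address ("host:port").
    """
    props = {
        "dfs.nameservices": f"{local_ns},{remote_ns}",
        f"dfs.ha.namenodes.{remote_ns}": ",".join(remote_namenodes),
        f"dfs.client.failover.proxy.provider.{remote_ns}": FAILOVER_PROVIDER,
    }
    # One RPC address per remote NameNode: these are the entries that go
    # stale when the remote cluster's NameNode topology changes.
    for nn_id, rpc_address in remote_namenodes.items():
        props[f"dfs.namenode.rpc-address.{remote_ns}.{nn_id}"] = rpc_address
    return props


def to_hadoop_xml(props: Dict[str, str]) -> str:
    """Render properties in the <configuration>/<property> layout of hdfs-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical nameservice IDs and host names, for illustration only.
    print(to_hadoop_xml(remote_nameservice_properties(
        "cluster1",
        "cluster2",
        {"nn1": "c2-nn1.example.com:8020", "nn2": "c2-nn2.example.com:8020"},
    )))
```

Every property key embeds the remote nameservice ID or a NameNode ID, which is why a topology change on one cluster forces a configuration change on the other.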


Is there not some kind of autodiscovery mechanism? Many HA applications, for example, let you specify all nodes,

e.g. hdfs://node1,node2:/path/to/x

or hdfs-zookeeper://zookeeper-address:/path/to/x






Re: How to copy (distcp) between two HA HDFS clusters


Re: How to copy (distcp) between two HA HDFS clusters


That's the sort of thing I want to avoid.

There are about 9 options that need to be copied and maintained on the remote node.

If I have an application that should be deployable on an arbitrary Hadoop cluster and work with another arbitrary Hadoop cluster, I'd rather not have to pass 9 options as arguments just to get HDFS access between the clusters working.

And if I want the application to work on an arbitrary non-Hadoop node, then I'd need to pass 9*2 = 18 properties.
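
As a rough illustration of what passing everything as arguments looks like, the Python sketch below builds a `hadoop distcp` command line that carries both clusters' HA settings as generic `-D` options. It assumes the standard client-side HA property names; the function name, nameservice IDs, and host names are hypothetical, and it covers only a minimal subset of properties, so the count it produces is not necessarily the figure quoted above.

```python
from typing import Dict, List

# Failover proxy provider class that ships with HDFS.
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def distcp_args(clusters: Dict[str, Dict[str, str]], src: str, dst: str) -> List[str]:
    """Build a distcp argv for a node that has no HA configuration of its own.

    `clusters` maps a nameservice ID to {NameNode ID: "host:port" RPC address}.
    """
    flags = ["-Ddfs.nameservices=" + ",".join(clusters)]
    for ns, namenodes in clusters.items():
        flags.append(f"-Ddfs.ha.namenodes.{ns}=" + ",".join(namenodes))
        flags.append(f"-Ddfs.client.failover.proxy.provider.{ns}={FAILOVER_PROVIDER}")
        for nn_id, rpc_address in namenodes.items():
            flags.append(f"-Ddfs.namenode.rpc-address.{ns}.{nn_id}={rpc_address}")
    # Generic -D options go before the source and target paths.
    return ["hadoop", "distcp", *flags, src, dst]


if __name__ == "__main__":
    # Hypothetical nameservice IDs and host names, for illustration only.
    clusters = {
        "cluster1": {"nn1": "c1-nn1.example.com:8020", "nn2": "c1-nn2.example.com:8020"},
        "cluster2": {"nn1": "c2-nn1.example.com:8020", "nn2": "c2-nn2.example.com:8020"},
    }
    args = distcp_args(clusters, "hdfs://cluster1/path/to/x", "hdfs://cluster2/path/to/x")
    print(" \\\n  ".join(args))
```

With the settings in place, the paths address each cluster by its nameservice ID rather than by a specific NameNode host, so the client does not need to know which NameNode is currently active.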




