The scenario is the following: I have cluster1 with HDFS HA enabled,
and I want to copy data to cluster2, which has HA enabled as well.
It seems I need to know the active NameNode to do that.
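For context, this is roughly what "knowing the active NameNode" looks like in practice: check each NameNode's HA state, then point distcp at the active one's RPC address directly. A sketch, assuming the NameNode IDs `nn1`/`nn2` and the hostname are hypothetical and the commands run on the remote cluster:

```shell
# Ask each configured NameNode for its HA state
# (nn1/nn2 are the IDs from dfs.ha.namenodes.<nameservice>)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Then run distcp against whichever NameNode reported "active"
hadoop distcp /src hdfs://nn1.cluster2.example.com:8020/dst
```

If a failover happens, the active NameNode changes and this command has to be updated by hand, which is why it needs to be automated somehow.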
The recommendation I've seen is to update hdfs-site.xml on the cluster with the remote cluster's nameservice details.
But that pollutes the cluster's configuration, and it's potentially hard to maintain: if the NameNode topology of either cluster changes at some point, every copy of those properties has to be updated.
Is there not some kind of autodiscovery mechanism? Clients of a lot of HA applications, for example, can specify all candidate nodes and discover the active one themselves.
That's the sort of thing I want to avoid: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cdh_admin_distcp_data_cluster_migrat...
There are like 9 options that need to be copied and maintained on the remote node.
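For reference, the properties in question look roughly like this. A sketch only, assuming the remote nameservice is called `cluster2` with NameNode IDs `nn1`/`nn2`; the hostnames are hypothetical:

```xml
<!-- hdfs-site.xml on the local cluster, describing the remote HA nameservice -->
<property>
  <name>dfs.nameservices</name>
  <value>cluster1,cluster2</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster2</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster2.nn1</name>
  <value>nn1.cluster2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster2.nn2</name>
  <value>nn2.cluster2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With those in place the client can address the remote cluster by nameservice (e.g. `hadoop distcp /src hdfs://cluster2/dst`) and resolve the active NameNode itself, but every node running the job needs this configuration, which is exactly the maintenance burden I'm describing.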
If I have an application that should be deployable on a random Hadoop cluster and work with another random Hadoop cluster, I'd rather not have to pass 9 options as arguments just to get HDFS working between the clusters.
And if I want the application to work on a random non-Hadoop node, I'd need to pass 9 * 2 = 18 properties (one set per cluster).