Created on 10-20-2022 12:13 PM - edited 10-20-2022 12:14 PM
I have cluster 1 (the source cluster in this example), which does not have HA enabled, and cluster 2 (the destination cluster in this example), which does have HA enabled. Both clusters run the same version of HDP.
When I copy data from cluster 1 to cluster 2 via distcp using the active NameNode instead of the HA nameservice, it works, but it ultimately causes problems when the active NameNode switches. I have tried using the HA nameservice, but without any configuration it fails (as one would expect). When I modify cluster 1's hdfs-site.xml to add cluster 2's nameservice information (following the guide laid out here, but for only one HA cluster: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/scaling-namespaces/topics/hdfs-distcp-between...), the HDFS services on cluster 1 fail to restart, as the NameNodes appear to be attempting to join cluster 2.
Is this sort of configuration even possible, or do we need HA on both clusters? It seemed to me that just providing the information about the external nameservice shouldn't have had as significant an impact as it did, but I think I'm just missing something.
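For context, the two forms of the command look roughly like this (hostnames, port, and paths are placeholders, not my actual values):
# Works, but breaks whenever the active NameNode on cluster 2 fails over
hadoop distcp hdfs://<nn_cluster1>:<port>/<srcpath> hdfs://<active_nn_cluster2>:<port>/<dstpath>
# What I would like to use instead, so failover on cluster 2 is transparent
hadoop distcp hdfs://<nn_cluster1>:<port>/<srcpath> hdfs://<nameservice_cluster2>/<dstpath>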
Created 10-20-2022 12:55 PM
Hi @djtapl01,
You could try performing the distcp in the reverse direction, i.e. running it from cluster 2. Cluster 1 is a non-HA cluster and will always have the same active NameNode, so it is safe to address it by host and port. When you run the command on cluster 2, HDFS will resolve cluster 2's internal nameservice on its own.
You could run distcp on cluster 2 with the syntax below:
cluster2_node # hadoop distcp hdfs://<ANN_cluster1>:<port>/<filepath> hdfs://<nameservice_cluster2>/<filepath>
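For example, if cluster 1's NameNode were nn1.cluster1.example.com listening on the default RPC port 8020 and cluster 2's nameservice were named cluster2ns (both names made up purely for illustration), a pull run from a cluster 2 node would look like:
hadoop distcp hdfs://nn1.cluster1.example.com:8020/data/source hdfs://cluster2ns/data/target
Because the command runs on cluster 2, the nameservice cluster2ns is resolved from cluster 2's own hdfs-site.xml, so no nameservice configuration needs to be added on cluster 1.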
Thank you.