How to copy data from a non-HA cluster to an HA cluster?

I have cluster 1 (the source cluster in this example), which does not have HA enabled, and cluster 2 (the destination cluster in this example), which does have HA enabled. Both clusters run the same version of HDP. When I copy data from cluster 1 to cluster 2 with distcp, addressing cluster 2's active NameNode directly instead of its HA nameservice, the copy works, but it ultimately causes problems when the active NameNode switches. I have tried using the HA nameservice, but without any extra configuration it fails (as one would expect).

When I modify hdfs-site.xml to add cluster 2's nameservice information (following the guide here, but for only one HA cluster: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/scaling-namespaces/topics/hdfs-distcp-between...), the HDFS services on cluster 1 fail to restart, because the NameNodes appear to be trying to join cluster 2. Is this sort of configuration even possible, or do we need HA on both clusters? It seemed to me that just providing information about an external nameservice shouldn't have had such a significant impact, but I think I'm just missing something.
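For reference, the cluster 2 nameservice details described above can also be supplied to a single distcp job as generic `-D` options instead of being added to cluster 1's service-wide hdfs-site.xml, which leaves cluster 1's NameNode and DataNodes untouched. This is a hedged dry-run sketch, not something proposed in this thread: the property names are standard HDFS client HA keys, while the nameservice ID `cluster2ns`, the NameNode IDs `nn1`/`nn2`, the hostnames and port 8020 are hypothetical placeholders. Check the keys against your HDP version's documentation before running the printed command.

```shell
#!/bin/sh
# Dry-run sketch: prints a distcp command to run FROM cluster 1 that reaches
# cluster 2 through its HA nameservice. All IDs and hosts are hypothetical.
REMOTE_NS="cluster2ns"                       # hypothetical cluster 2 nameservice ID
REMOTE_NN1="nn1.cluster2.example.com:8020"   # hypothetical cluster 2 NameNode 1 RPC address
REMOTE_NN2="nn2.cluster2.example.com:8020"   # hypothetical cluster 2 NameNode 2 RPC address
SOURCE_NN="nn.cluster1.example.com:8020"     # hypothetical cluster 1 NameNode RPC address

build_distcp_from_cluster1() {
  # $1 = absolute source path on cluster 1, $2 = absolute target path on cluster 2
  src_path="$1"
  dst_path="$2"
  printf '%s' "hadoop distcp"
  # Generic -D options go first; they apply only to this job's client config,
  # so cluster 1's daemons never see the remote nameservice definition.
  printf ' %s' \
    "-Ddfs.nameservices=${REMOTE_NS}" \
    "-Ddfs.ha.namenodes.${REMOTE_NS}=nn1,nn2" \
    "-Ddfs.namenode.rpc-address.${REMOTE_NS}.nn1=${REMOTE_NN1}" \
    "-Ddfs.namenode.rpc-address.${REMOTE_NS}.nn2=${REMOTE_NN2}" \
    "-Ddfs.client.failover.proxy.provider.${REMOTE_NS}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" \
    "hdfs://${SOURCE_NN}${src_path}" \
    "hdfs://${REMOTE_NS}${dst_path}"
  printf '\n'
}
```

For example, `build_distcp_from_cluster1 /data/src /data/dst` prints the full command for review; the source is still addressed by host and port because cluster 1 has a single NameNode.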

Accepted solution

Hi @djtapl01,

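You could try performing the distcp in reverse, that is, running it from cluster 2 instead of cluster 1. Cluster 1 is a non-HA cluster, so it always has the same active NameNode and can safely be addressed directly. When you run the command on cluster 2, HDFS resolves cluster 2's own (internal) nameservice from its local configuration.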

You could try running distcp on cluster 2 with the following syntax, where <ANN_cluster1> is the host of cluster 1's active NameNode and <port> is its RPC port:
cluster2_node # hadoop distcp hdfs://<ANN_cluster1>:<port>/<filepath> hdfs://<nameservice_cluster2>/<filepath>
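The command above can be wrapped in a small shell function so the placeholders are set once and the command can be reviewed before it runs. This is a minimal sketch, assuming a POSIX shell on a cluster 2 node with the hadoop client on PATH; the host `nn.cluster1.example.com`, port 8020 and nameservice ID `cluster2ns` are hypothetical stand-ins for <ANN_cluster1>, <port> and <nameservice_cluster2>, and the `DRY_RUN` switch is an illustrative addition, not a distcp feature.

```shell
#!/bin/sh
# Sketch of the "reverse distcp": run on a cluster 2 node.
# Hypothetical defaults; override via environment variables.
SRC_NN_HOST="${SRC_NN_HOST:-nn.cluster1.example.com}"   # cluster 1 NameNode host (<ANN_cluster1>)
SRC_NN_PORT="${SRC_NN_PORT:-8020}"                      # assumed RPC port; check fs.defaultFS on cluster 1
DST_NAMESERVICE="${DST_NAMESERVICE:-cluster2ns}"        # cluster 2 HA nameservice ID

reverse_distcp() {
  # $1 = absolute source path on cluster 1, $2 = absolute target path on cluster 2
  src="hdfs://${SRC_NN_HOST}:${SRC_NN_PORT}$1"
  dst="hdfs://${DST_NAMESERVICE}$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command so it can be checked first.
    echo "hadoop distcp ${src} ${dst}"
  else
    # The target uses the nameservice, so cluster 2's client failover
    # configuration routes requests to whichever NameNode is active.
    hadoop distcp "${src}" "${dst}"
  fi
}
```

To fill in the placeholders, `hdfs getconf -confKey dfs.nameservices` on cluster 2 prints its nameservice ID, and `hdfs getconf -confKey fs.defaultFS` on cluster 1 shows its NameNode host and port. Once the printed command looks right, `DRY_RUN=0 reverse_distcp /data/in /data/out` performs the copy.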

Thank you.
