How to copy data from a non-HA cluster to an HA cluster?
Created on ‎10-20-2022 12:13 PM - edited ‎10-20-2022 12:14 PM
I have two clusters running the same version of HDP. Cluster 1 (the source cluster in this example) does not have HA enabled; cluster 2 (the destination cluster) does. When I copy data from cluster 1 to cluster 2 via distcp and address cluster 2 by its active NameNode instead of its HA nameservice, the copy works, but it causes problems whenever the active NameNode switches. I have tried using the HA nameservice, but without any additional configuration it fails (as one would expect). When I modify hdfs-site.xml to add cluster 2's nameservice information (following the guide laid out here, but for only one HA cluster: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/scaling-namespaces/topics/hdfs-distcp-between...), the HDFS services on cluster 1 fail to restart, as the NameNodes appear to be attempting to join cluster 2. Is this sort of configuration even possible, or do we need HA on both clusters? I expected that simply providing the external nameservice's details would not have such a significant impact, so I think I'm just missing something.
Created ‎10-20-2022 12:55 PM
Hi @djtapl01,
You could try running distcp in the reverse direction, from cluster 2. Cluster 1 is a non-HA cluster, so its active NameNode is always the same host. When you run the command on cluster 2, HDFS resolves cluster 2's own nameservice from the local configuration.
Try running distcp on a cluster 2 node with the following syntax:
cluster2_node # hadoop distcp hdfs://<ANN_cluster1>:<port>/<filepath> hdfs://<nameservice_cluster2>/<filepath>
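As a rough illustration of how the placeholders in that syntax might be filled in, here is a small dry-run sketch. The helper name build_distcp_cmd, the host nn1.cluster1.example.com, the nameservice cluster2ns, port 8020 and both paths are made-up values for the example, not taken from your environment. It only prints the command so you can check it before running it on a cluster 2 node.

```shell
#!/usr/bin/env bash
# Sketch only: assembles the distcp command from the syntax above and prints it.
# All argument values used below are hypothetical placeholders.

# build_distcp_cmd <cluster1_nn_host> <cluster1_nn_port> <src_path> <cluster2_nameservice> <dst_path>
build_distcp_cmd() {
  local src_host="$1" src_port="$2" src_path="$3" dst_ns="$4" dst_path="$5"
  # Source: cluster 1 addressed by its single NameNode host and RPC port.
  # Destination: cluster 2 addressed by its HA nameservice (no host or port).
  printf 'hadoop distcp hdfs://%s:%s%s hdfs://%s%s\n' \
    "$src_host" "$src_port" "$src_path" "$dst_ns" "$dst_path"
}

# Dry run: print the command instead of executing it.
build_distcp_cmd "nn1.cluster1.example.com" 8020 "/data/source" "cluster2ns" "/data/target"
```

Once the printed command looks right, you can paste it into a shell on a cluster 2 node, where the client configuration already knows how to resolve the destination nameservice.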
Thank you.
