01-03-2018 02:04 PM
The simple answer is to open the ports bidirectionally on all hosts. For instance:

On each node in cluster A, allow connectivity to port 1004 (or 50010 without Kerberos) and port 50020 on each DataNode in cluster B, as well as port 8020 on the NameNodes in cluster B.

On each node in cluster B, allow connectivity to port 1004 (or 50010 without Kerberos) and port 50020 on each DataNode in cluster A, as well as port 8020 on the NameNodes in cluster A.

However, you are right that where distcp is executed determines which side is the source and which is the destination of these connections. Executing distcp on cluster A runs a MapReduce job on cluster A. Each DataNode in cluster A may run a task that connects to the NameNode(s) in cluster B for block locations and then to the DataNodes in cluster B to transfer the data. I'm not sure whether the node distcp is launched from also needs this access, so I generally run distcp on one of the DataNodes.
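If you want to verify the firewall rules before launching distcp, a quick pre-flight check from a node in one cluster can try a plain TCP connection to each port listed above on the other cluster. This is a minimal sketch, not a Hadoop tool: the function names are hypothetical, the port numbers are the ones from this post (defaults can differ between Hadoop versions and configurations, so confirm `dfs.datanode.address` and `dfs.datanode.ipc.address` on your clusters), and a successful connect only shows the port is reachable, not that Kerberos authentication will succeed.

```python
import socket

# Ports as described in the post (assumed defaults; check your cluster config):
# 8020 = NameNode RPC, 50020 = DataNode IPC,
# 1004 = DataNode data transfer with Kerberos, 50010 = without Kerberos.
NAMENODE_RPC = 8020
DATANODE_IPC = 50020


def required_endpoints(namenodes, datanodes, kerberos=True):
    """Return the (host, port) pairs a node must reach in the remote cluster."""
    data_port = 1004 if kerberos else 50010
    endpoints = [(nn, NAMENODE_RPC) for nn in namenodes]
    for dn in datanodes:
        endpoints.append((dn, data_port))
        endpoints.append((dn, DATANODE_IPC))
    return endpoints


def check_endpoints(endpoints, timeout=3.0):
    """Attempt a TCP connect to each endpoint; map (host, port) -> True/False."""
    results = {}
    for host, port in endpoints:
        try:
            # Only tests TCP reachability; no HDFS or Kerberos handshake is done.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts and DNS resolution failures.
            results[(host, port)] = False
    return results
```

For example, on a node in cluster A you could pass cluster B's NameNode and DataNode hostnames (your own, not anything defined here) to `required_endpoints`, feed the result to `check_endpoints`, and look for any entry that maps to `False`. Then repeat from cluster B against cluster A.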