Created on 11-28-2017 09:00 PM - edited 09-16-2022 01:41 AM
This article explains how to run distcp between two clusters.
Each cluster is Kerberized with its own KDC server, and cross-realm trust is set up between the two MIT KDC servers.
Follow this blog post to set up the Kerberos cross-realm trust:
https://community.hortonworks.com/articles/18686/kerberos-cross-realm-trust-for-distcp.html
Once the cross-realm trust is in place, continue with the steps below.
Add the following property to mapred-site.xml and restart all affected components.
<property>
  <name>mapreduce.job.send-token-conf</name>
  <value>yarn.http.policy|^yarn.timeline-service.webapp.*$|^yarn.timelineservice.client.*$|hadoop.security.key.provider.path|hadoop.rpc.protection|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|dfs.namenode.kerberos.principal|dfs.namenode.kerberos.principal.pattern</value>
</property>
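The value of mapreduce.job.send-token-conf is a list of regular expressions separated by "|". As a rough sanity check, the sketch below tests sample configuration keys against a shortened copy of such an expression, written with the hyphenated rpc-address key that the distcp command uses. It relies on two assumptions: that grep -E matching approximates how Hadoop evaluates the value (Hadoop uses Java regular expressions), and that the sample keys are representative. The function name would_send is hypothetical.

```shell
#!/bin/sh
# Shortened copy of a send-token-conf value (assumption: extend it with the
# remaining alternatives from your own mapred-site.xml before relying on it).
PATTERN='dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|dfs.namenode.kerberos.principal'

# Hypothetical helper: prints "sent" if the key matches the pattern,
# "skipped" otherwise.
would_send() {
  if printf '%s\n' "$1" | grep -Eq -- "$PATTERN"; then
    echo "sent"
  else
    echo "skipped"
  fi
}

would_send dfs.namenode.rpc-address.cluster2.nn1
would_send dfs.replication
```

A key reported as "skipped" would not be selected by this shortened pattern, which can help spot a misspelled alternative before restarting services.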
Assume the two clusters use the nameservices cluster1 and cluster2.
Run hadoop distcp from cluster1 as shown below:
$ hadoop distcp \
    -Ddfs.client.failover.proxy.provider.cluster2=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
    -Ddfs.namenode.rpc-address.cluster2.nn2=<<nnrpcaddress>> \
    -Ddfs.namenode.rpc-address.cluster2.nn1=<<nnrpcaddress>> \
    -Ddfs.ha.namenodes.cluster2=nn1,nn2 \
    -Ddfs.nameservices=cluster1,cluster2 \
    hdfs://cluster1/tmp/test hdfs://cluster2/tmp/test
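Because the remote nameservice name appears in several of the -D options, it is easy to mistype one of them. The sketch below assembles the same command from a few variables and only prints it, so it can be reviewed before being run by hand. The variable names, the build_distcp_cmd helper, and the example.com hostnames and port 8020 are all hypothetical placeholders, not values from this article.

```shell
#!/bin/sh
# Hypothetical values: replace with the real nameservices and the remote
# cluster's NameNode RPC addresses.
LOCAL_NS="cluster1"
REMOTE_NS="cluster2"
NN1_ADDR="nn1.cluster2.example.com:8020"
NN2_ADDR="nn2.cluster2.example.com:8020"

# Hypothetical helper: prints the distcp command line for a source path ($1)
# and a destination path ($2). It does not execute anything.
build_distcp_cmd() {
  echo "hadoop distcp" \
    "-Ddfs.nameservices=${LOCAL_NS},${REMOTE_NS}" \
    "-Ddfs.ha.namenodes.${REMOTE_NS}=nn1,nn2" \
    "-Ddfs.namenode.rpc-address.${REMOTE_NS}.nn1=${NN1_ADDR}" \
    "-Ddfs.namenode.rpc-address.${REMOTE_NS}.nn2=${NN2_ADDR}" \
    "-Ddfs.client.failover.proxy.provider.${REMOTE_NS}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" \
    "$1" "$2"
}

build_distcp_cmd "hdfs://${LOCAL_NS}/tmp/test" "hdfs://${REMOTE_NS}/tmp/test"
```

Deriving every option from REMOTE_NS keeps the nameservice name consistent across the HA and failover settings.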