
This article explains how to run distcp between two Kerberized clusters.

Each cluster is Kerberized with a different KDC server, and cross-realm trust is set up between the two MIT KDC servers.

Follow this article to set up the Kerberos cross-realm trust:

https://community.hortonworks.com/articles/18686/kerberos-cross-realm-trust-for-distcp.html

Once that setup is complete, continue with the steps below.

Add the following property to mapred-site.xml and restart all affected components.

<property>
  <name>mapreduce.job.send-token-conf</name>
  <value>yarn.http.policy|^yarn.timeline-service.webapp.*$|^yarn.timeline-service.client.*$|hadoop.security.key.provider.path|hadoop.rpc.protection|dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$|dfs.namenode.kerberos.principal|dfs.namenode.kerberos.principal.pattern</value>
</property>
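To see which configuration keys such a pattern would pick up, the sketch below tests a few key names against a shortened copy of the value. It keeps only the HDFS-related alternatives, with the RPC address key spelled `dfs.namenode.rpc-address` as in the distcp options. It assumes a key is selected only when the whole key name matches the pattern; the function name `key_is_sent` and the sample key list are illustrative and not part of Hadoop.

```shell
#!/bin/sh
# Shortened copy of the send-token-conf value (HDFS-related alternatives only).
pattern='dfs.nameservices|^dfs.namenode.rpc-address.*$|^dfs.ha.namenodes.*$|^dfs.client.failover.proxy.provider.*$'

# Return 0 when the whole key name matches the pattern, non-zero otherwise.
# Assumption: whole-key matching, emulated here with grep -x.
key_is_sent() {
  printf '%s\n' "$1" | grep -Eqx -- "$pattern"
}

# Sample keys: the four used by the distcp options plus one unrelated key.
for key in \
  dfs.nameservices \
  dfs.ha.namenodes.cluster2 \
  dfs.namenode.rpc-address.cluster2.nn1 \
  dfs.client.failover.proxy.provider.cluster2 \
  dfs.replication
do
  if key_is_sent "$key"; then
    echo "sent:     $key"
  else
    echo "not sent: $key"
  fi
done
```

A key that is passed with -D on the distcp command line but does not match any alternative would not be selected by this pattern, so a check like this can catch a misspelled alternative before the components are restarted.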

Assume the two clusters are named cluster1 and cluster2.

Now run hadoop distcp from cluster1 as follows:

$ hadoop distcp \
    -Ddfs.client.failover.proxy.provider.cluster2=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
    -Ddfs.namenode.rpc-address.cluster2.nn2=<<nnrpcaddress>> \
    -Ddfs.namenode.rpc-address.cluster2.nn1=<<nnrpcaddress>> \
    -Ddfs.ha.namenodes.cluster2=nn1,nn2 \
    -Ddfs.nameservices=cluster1,cluster2 \
    hdfs://cluster1/tmp/test hdfs://cluster2/tmp/test
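Because the remote nameservice is described entirely through -D overrides, it can help to keep them in one place and try a simple listing of cluster2 before starting the copy. The following is a minimal sketch under stated assumptions: the host names and port in `NN1_ADDR` and `NN2_ADDR` are hypothetical placeholders, and the helper names `remote_opts` and `run` are made up for illustration. By default it only prints the commands; set RUN=1 to execute them on a node with a valid Kerberos ticket.

```shell
#!/bin/sh
# Hypothetical NameNode RPC endpoints for cluster2; replace with real host:port values.
NN1_ADDR="nn1.cluster2.example.com:8020"
NN2_ADDR="nn2.cluster2.example.com:8020"

# Client-side overrides describing the remote HA nameservice, one per line,
# so the listing check and the copy use identical settings.
remote_opts() {
  printf '%s\n' \
    "-Ddfs.nameservices=cluster1,cluster2" \
    "-Ddfs.ha.namenodes.cluster2=nn1,nn2" \
    "-Ddfs.namenode.rpc-address.cluster2.nn1=${NN1_ADDR}" \
    "-Ddfs.namenode.rpc-address.cluster2.nn2=${NN2_ADDR}" \
    "-Ddfs.client.failover.proxy.provider.cluster2=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}

# Print the command unless RUN=1 is set, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Word splitting of $(remote_opts) is intended: each line becomes one argument.
# Step 1: confirm that cluster2 can be reached and listed with these overrides.
run hdfs dfs $(remote_opts) -ls hdfs://cluster2/tmp
# Step 2: run the copy with the same overrides.
run hadoop distcp $(remote_opts) hdfs://cluster1/tmp/test hdfs://cluster2/tmp/test
```

If the listing fails, the problem is more likely in the cross-realm trust or the nameservice overrides than in distcp itself, which narrows down where to look.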