To run distcp between two HDFS HA clusters (for example, A and B) using the nameservice ID, or to set up Falcon clusters with NameNode HA, the following settings are needed.

Assume the nameservices for clusters A and B are HAA and HAB, respectively.

Set the following properties in hdfs-site.xml:

Add the nameservices of both clusters to dfs.nameservices. This needs to be done on both clusters.

dfs.nameservices=HAA,HAB
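In the hdfs-site.xml of either cluster, this would look roughly like the following (standard Hadoop configuration XML; the snippet is illustrative):

<property>
  <!-- Logical nameservice IDs this cluster knows about: its own plus the remote one -->
  <name>dfs.nameservices</name>
  <value>HAA,HAB</value>
</property>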

Add property dfs.internal.nameservices, set to the cluster's own nameservice.

In cluster A: dfs.internal.nameservices = HAA

In cluster B: dfs.internal.nameservices = HAB
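As an illustrative sketch, cluster A's hdfs-site.xml entry would look roughly like this (cluster B would carry HAB instead):

<property>
  <!-- The nameservice this cluster itself serves, as opposed to remote ones it only talks to -->
  <name>dfs.internal.nameservices</name>
  <value>HAA</value>
</property>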

Add property dfs.ha.namenodes.<nameservice> for both nameservices.

dfs.ha.namenodes.HAB=nn1,nn2

dfs.ha.namenodes.HAA=nn1,nn2
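In hdfs-site.xml form, this is one property per nameservice, roughly:

<property>
  <!-- NameNode IDs for cluster A's nameservice -->
  <name>dfs.ha.namenodes.HAA</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- NameNode IDs for cluster B's nameservice -->
  <name>dfs.ha.namenodes.HAB</name>
  <value>nn1,nn2</value>
</property>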

Add property dfs.namenode.rpc-address.<nameservice>.<nn>, pointing at the NameNode hosts of the respective cluster.

dfs.namenode.rpc-address.HAB.nn1 = <cluster_B_NN1_fqdn>:8020

dfs.namenode.rpc-address.HAB.nn2 = <cluster_B_NN2_fqdn>:8020

dfs.namenode.rpc-address.HAA.nn1 = <cluster_A_NN1_fqdn>:8020

dfs.namenode.rpc-address.HAA.nn2 = <cluster_A_NN2_fqdn>:8020
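For example, the two entries for the remote nameservice HAB would appear in cluster A's hdfs-site.xml roughly as follows (nnb1.example.com and nnb2.example.com are placeholder hostnames):

<property>
  <!-- RPC address of cluster B's first NameNode -->
  <name>dfs.namenode.rpc-address.HAB.nn1</name>
  <value>nnb1.example.com:8020</value>
</property>
<property>
  <!-- RPC address of cluster B's second NameNode -->
  <name>dfs.namenode.rpc-address.HAB.nn2</name>
  <value>nnb2.example.com:8020</value>
</property>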

Add property dfs.client.failover.proxy.provider.<nameservice>.

In cluster A: dfs.client.failover.proxy.provider.HAB = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

In cluster B: dfs.client.failover.proxy.provider.HAA = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
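In hdfs-site.xml form, cluster A's entry for the remote nameservice would look roughly like this (ConfiguredFailoverProxyProvider is the standard client-side failover class shipped with HDFS):

<property>
  <!-- Tells clients on cluster A how to fail over between cluster B's NameNodes -->
  <name>dfs.client.failover.proxy.provider.HAB</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>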

Restart the HDFS service on both clusters.

Once complete, you will be able to run the distcp command using the nameservice IDs, similar to:

hadoop distcp hdfs://HAA/tmp/file1 hdfs://HAB/tmp/
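A quick way to confirm that the remote nameservice resolves before copying larger data sets is to list a path on the other cluster, for example:

hdfs dfs -ls hdfs://HAB/tmp/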
