Explorer
Posts: 10
Registered: 07-04-2016

distcp with same nameservice name

Hi ,

 

Is distcp possible between two Kerberized, HA-enabled clusters that have the same nameservice name?

 

Please share the prerequisites if anybody has done this successfully.

 

I tried setting the parameters at runtime but received: 'java.io.IOException: Failed to run job : Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:remote-hdfs'

 

Regards,

Leo

Cloudera Employee
Posts: 14
Registered: 10-07-2015

Re: distcp with same nameservice name

Assuming all the Kerberos trust relationships are set up correctly, it is possible to copy between two secured HA clusters with the same namespace. The following documentation describes the Kerberos configuration required:

 

https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_admin_distcp_data_cluster_migrat...

 

The easiest way to do the copy is to identify the active NameNode of the target cluster and run a command like:

 

hadoop distcp /path/to/source/file hdfs://namenode_host:8020/destination_path

 

I.e., instead of using the nameservice name, use the actual hostname of the active NameNode.
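
 

If you are not sure which NameNode is currently active, one way to check from a node on the target cluster (assuming its NameNode IDs are nn1 and nn2; substitute your own) is:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

Each command prints either "active" or "standby" for that NameNode.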

Explorer
Posts: 10
Registered: 07-04-2016

Re: distcp with same nameservice name

Hi ,

 

Distcp works with the active NameNode hostname instead of the nameservice; this was already tested.

 

I would like to know how to use the same nameservice name for both source and destination. They are in the same realm. We would like to reference the nameservice in code instead of hard-coding the hostname, so that the job still works if one NN is down or a failover happens.

 

Regards,

Leo.

Cloudera Employee
Posts: 14
Registered: 10-07-2015

Re: distcp with same nameservice name

Hi Leo,

 

OK, it's good that the Kerberos setup is all working, as that can be the hard part!

 

If you want to run the copy command on cluster1, you will need to modify cluster1's hdfs-site.xml to hold the information about the other cluster. To do this, you can define a new nameservice in the hdfs-site.xml on cluster1 that points at the NameNode hosts of the other cluster.

 

Say the nameservice on cluster1 is nameservice1. Let's create another nameservice called backupcluster. In the hdfs-site.xml on cluster1, add the following properties:

 

dfs.ha.namenodes.backupcluster=nn1,nn2
dfs.namenode.rpc-address.backupcluster.nn1=cluster2_nn_host1:8020
dfs.namenode.rpc-address.backupcluster.nn2=cluster2_nn_host2:8020
dfs.client.failover.proxy.provider.backupcluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider

 

Assuming you are using CM, you can edit /etc/hadoop/conf/hdfs-site.xml on your gateway node to test this out, but afterwards you should add the settings to the CM cluster-wide safety valve for hdfs-site.xml.
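
 

For reference, here is a sketch of how those four properties could look as hdfs-site.xml entries (the NameNode IDs and hostnames are the placeholders from above; depending on your Hadoop version you may also need to append backupcluster to the dfs.nameservices list):

<!-- NameNode IDs for the new logical nameservice "backupcluster" -->
<property>
  <name>dfs.ha.namenodes.backupcluster</name>
  <value>nn1,nn2</value>
</property>
<!-- RPC addresses of the remote cluster's two NameNodes -->
<property>
  <name>dfs.namenode.rpc-address.backupcluster.nn1</name>
  <value>cluster2_nn_host1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.backupcluster.nn2</name>
  <value>cluster2_nn_host2:8020</value>
</property>
<!-- Proxy provider so the client can find whichever NameNode is active -->
<property>
  <name>dfs.client.failover.proxy.provider.backupcluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>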

 

Once the above settings are in place, you should be able to run the following from cluster1:

hadoop fs -ls hdfs://backupcluster/some/path

 

If that works, you can try distcp from cluster1:

 

hadoop distcp hdfs://nameservice1/path/on/cluster1 hdfs://backupcluster/target/path
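
 

In practice you may also want to add standard distcp options such as -update (copy only files that are missing or changed at the target) and -m (cap the number of map tasks). For example, with an illustrative mapper count of 20:

hadoop distcp -update -m 20 hdfs://nameservice1/path/on/cluster1 hdfs://backupcluster/target/path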

 

 
