Problem when Distcp between two HA Cluster.

Solved

New Contributor

Hi all,

I have two Hadoop clusters, cluster_1 and cluster_2, with separate ZooKeeper ensembles. I want to DistCp HDFS files from cluster_1 to cluster_2.

Here is the information for cluster_1 and cluster_2:

### cluster_1
1, active namenode: g001.server.edu.tk standby namenode: g002.server.edu.tk
2, zookeeper hosts: g003.server.edu.tk g004.server.edu.tk g005.server.edu.tk
### cluster_2
1, active namenode: d001.server.edu.tk standby namenode: d002.server.edu.tk
2, zookeeper hosts: d003.server.edu.tk d004.server.edu.tk d005.server.edu.tk

1, In order to DistCp data from cluster_1 to cluster_2, I copied the whole Hadoop configuration from $HADOOP_HOME/etc/hadoop to /configurations/hadoop on cluster_1 and added the following properties to hdfs-site.xml on g001.server.edu.tk:

<property>
    <name>dfs.nameservices</name>  
    <value>cluster_1,cluster_2</value>
</property>

<property>   
    <name>dfs.client.failover.proxy.provider.cluster_2</name> 
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property> 
<property>
    <name>dfs.ha.namenodes.cluster_2</name>
    <value>d001.server.edu.tk,d002.server.edu.tk</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:8020</value>
</property> 

<property> 
    <name>dfs.namenode.servicerpc-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:54321</value> 
</property>

<property>
    <name>dfs.namenode.http-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:50070</value>
</property>

<property> 
    <name>dfs.namenode.https-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_2.d002.server.edu.tk</name> 
    <value>d002.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_2.d002.server.edu.tk</name> 
    <value>d002.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_2.d002.server.edu.tk</name>
    <value>d002.server.edu.tk:50070</value>
</property>

<property> 
    <name>dfs.namenode.https-address.cluster_2.d002.server.edu.tk</name>  
    <value>d002.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.cluster_1</name>     
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 
</property> 

<property>
     <name>dfs.ha.namenodes.cluster_1</name> 
    <value>g001.server.edu.tk,g002.server.edu.tk</value>
</property>
<property>   
    <name>dfs.namenode.rpc-address.cluster_1.g001.server.edu.tk</name> 
    <value>g001.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_1.g001.server.edu.tk</name>   
    <value>g001.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_1.g001.server.edu.tk</name> 
    <value>g001.server.edu.tk:50070</value>
</property>
<property> 
    <name>dfs.namenode.https-address.cluster_1.g001.server.edu.tk</name> 
    <value>g001.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_1.g002.server.edu.tk</name>
    <value>g002.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_1.g002.server.edu.tk</name> 
    <value>g002.server.edu.tk:54321</value>
</property>

<property> 
    <name>dfs.namenode.http-address.cluster_1.g002.server.edu.tk</name> 
     <value>g002.server.edu.tk:50070</value>
</property>

<property>
    <name>dfs.namenode.https-address.cluster_1.g002.server.edu.tk</name> 
    <value>g002.server.edu.tk:50470</value>
</property>
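Before running DistCp with this merged configuration, it may be worth a quick sanity check that the client can resolve both HA nameservices. A minimal sketch, assuming the merged config directory above and that /tmp exists on both clusters:

```shell
# Verify that both HA nameservices resolve with the merged config;
# each listing should succeed without an explicit host:port.
hdfs --config /configurations/hadoop dfs -ls hdfs://cluster_1/tmp/
hdfs --config /configurations/hadoop dfs -ls hdfs://cluster_2/tmp/
```

If the second listing fails, the cluster_2 properties above are not being picked up and DistCp will not work either.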

2, Then I ran the DistCp command on g001.server.edu.tk:

hdfs --config /configurations/hadoop distcp -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

3, But I got errors like:

18/10/08 07:55:00 ERROR tools.DistCp: Exception encountered
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxx to YARN: Failed to renew token Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
   at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:183)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxxx to YARN: Failed to RENEW token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
   at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:291)
   ... 12 more

4, Hadoop version is 2.7.3

Accepted Solution

Re: Problem when Distcp between two HA Cluster.

Use this, and add the server name and port if you are doing DistCp directly against the active NN on the remote cluster:

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2
7 Replies


Re: Problem when Distcp between two HA Cluster.

New Contributor

Hi @Predrag Minovic, thank you for your reply. I tried the method you suggested but got the same error. Below are the commands I ran:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk:8020,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

Re: Problem when Distcp between two HA Cluster.

Sorry for the hard-to-understand message; try this:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/

Note that you don't need the port when using the NN service name. Also, I suggest first copying a small file or directory in /tmp, like /tmp/mydir1: just create that dir and put a few files inside. Also remove '-update -p' during initial tests. Once it starts working you can try all that.
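Putting the advice above together, a minimal first test might look like this (a sketch, assuming the merged config in /configurations/hadoop and a throwaway test file):

```shell
# Stage a small test directory on the source cluster
hdfs dfs -mkdir -p /tmp/mydir1
echo "distcp test" | hdfs dfs -put - /tmp/mydir1/test.txt

# Minimal DistCp: no port on the nameservices, no -update -p yet
hdfs --config /configurations/hadoop distcp \
  -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 \
  hdfs://cluster_1/tmp/mydir1 hdfs://cluster_2/tmp/mydir1
```

Once this small copy succeeds, add -update -p back and point it at the real data.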

Re: Problem when Distcp between two HA Cluster.

New Contributor

Hi @Predrag Minovic, the Hadoop version is 2.7.3. It seems the property 'mapreduce.job.hdfs-servers.token-renewal.exclude' is not available in Hadoop versions below 2.8.0. I've tried the method you provided, but got the same error.
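Since mapreduce.job.hdfs-servers.token-renewal.exclude only arrived in Hadoop 2.8.0, one possible workaround on 2.7.3, hinted at in the accepted answer, is to point DistCp directly at the remote cluster's active NameNode by host and port, so YARN renews the token against a concrete address rather than the unresolvable nameservice. A hedged sketch: it assumes d001.server.edu.tk is currently the active NameNode of cluster_2, and it gives up HA failover on the target for the duration of the copy.

```shell
# First confirm which remote NameNode is active, e.g. from a cluster_2 node:
#   hdfs haadmin -getServiceState <namenode-id>

# Then DistCp to the active NameNode directly instead of the cluster_2 nameservice
hdfs --config /configurations/hadoop distcp \
  hdfs://cluster_1/tmp/mydir1 \
  hdfs://d001.server.edu.tk:8020/tmp/mydir1
```

If a failover happens mid-copy, the job will fail and must be rerun against the new active NameNode.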

Re: Problem when Distcp between two HA Cluster.

Hi @Shen Sean

It looks like you may be hitting YARN-3021 - https://issues.apache.org/jira/browse/YARN-3021

Try the same DistCp operation after adding the following parameter to the distcp command:

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=<destinationNN1>,<destinationNN2>

Re: Problem when Distcp between two HA Cluster.

New Contributor

Hi @Jonathan Sneep, thank you for your reply. I tried the method you suggested but got the same error. Below is the command I ran:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

Re: Problem when Distcp between two HA Cluster.

New Contributor

The Hadoop version is 2.7.3.
