Problem with DistCp between two HA clusters

Contributor

Hi all,

I have two Hadoop clusters, cluster_1 and cluster_2, each with its own ZooKeeper ensemble. Now I want to DistCp HDFS files from cluster_1 to cluster_2.

The two clusters' details are as follows:

### cluster_1
1. active namenode: g001.server.edu.tk; standby namenode: g002.server.edu.tk
2. zookeeper hosts: g003.server.edu.tk, g004.server.edu.tk, g005.server.edu.tk

### cluster_2
1. active namenode: d001.server.edu.tk; standby namenode: d002.server.edu.tk
2. zookeeper hosts: d003.server.edu.tk, d004.server.edu.tk, d005.server.edu.tk

1. To distcp data from cluster_1 to cluster_2, I copied the whole Hadoop configuration directory from $HADOOP_HOME/etc/hadoop to /configurations/hadoop on cluster_1 and added the following properties to hdfs-site.xml on g001.server.edu.tk:

<property>
    <name>dfs.nameservices</name>
    <value>cluster_1,cluster_2</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.cluster_2</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
    <name>dfs.ha.namenodes.cluster_2</name>
    <value>d001.server.edu.tk,d002.server.edu.tk</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_2.d001.server.edu.tk</name>
    <value>d001.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_2.d001.server.edu.tk</name>
    <value>d001.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_2.d001.server.edu.tk</name>
    <value>d001.server.edu.tk:50070</value>
</property>

<property>
    <name>dfs.namenode.https-address.cluster_2.d001.server.edu.tk</name>
    <value>d001.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_2.d002.server.edu.tk</name>
    <value>d002.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_2.d002.server.edu.tk</name>
    <value>d002.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_2.d002.server.edu.tk</name>
    <value>d002.server.edu.tk:50070</value>
</property>

<property>
    <name>dfs.namenode.https-address.cluster_2.d002.server.edu.tk</name>
    <value>d002.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.cluster_1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
    <name>dfs.ha.namenodes.cluster_1</name>
    <value>g001.server.edu.tk,g002.server.edu.tk</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_1.g001.server.edu.tk</name>
    <value>g001.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_1.g001.server.edu.tk</name>
    <value>g001.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_1.g001.server.edu.tk</name>
    <value>g001.server.edu.tk:50070</value>
</property>

<property>
    <name>dfs.namenode.https-address.cluster_1.g001.server.edu.tk</name>
    <value>g001.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_1.g002.server.edu.tk</name>
    <value>g002.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_1.g002.server.edu.tk</name>
    <value>g002.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_1.g002.server.edu.tk</name>
    <value>g002.server.edu.tk:50070</value>
</property>

<property>
    <name>dfs.namenode.https-address.cluster_1.g002.server.edu.tk</name>
    <value>g002.server.edu.tk:50470</value>
</property>
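
A quick way to sanity-check that this merged client config resolves both nameservices before running DistCp (a minimal sketch, using only the config directory and nameservice IDs above):

# Print the NameNode IDs the client sees for the remote nameservice
hdfs --config /configurations/hadoop getconf -confKey dfs.ha.namenodes.cluster_2

# List the remote cluster's root to confirm the HA nameservice resolves
hdfs --config /configurations/hadoop dfs -ls hdfs://cluster_2/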

2. Then I ran the DistCp command on g001.server.edu.tk:

hdfs --config /configurations/hadoop distcp -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

3. But I got errors like:

18/10/08 07:55:00 ERROR tools.DistCp: Exception encountered
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxx to YARN: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
   at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:183)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxx to YARN: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
   at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:291)
   ... 12 more

4. The Hadoop version is 2.7.3.

1 ACCEPTED SOLUTION

Master Guru

Use this (or the server name and port, if you are running distcp directly against the active NN on the remote cluster):

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2
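
A minimal sketch of the full invocation with that flag, assuming the merged client config and the nameservice IDs from the question:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/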


7 REPLIES

Master Guru

Use this (or the server name and port, if you are running distcp directly against the active NN on the remote cluster):

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2

Contributor

Hi @Predrag Minovic, thank you for your reply. I tried the method you suggested but got the same error. Below are the commands I ran:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk:8020,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

Master Guru

Sorry for the hard-to-understand message; try this:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/

Note that you don't need the port when using the NN service name. Also, I suggest first copying a small file or directory in /tmp, like /tmp/mydir1: just create that dir and put a few files inside. Also remove '-update -p' during the initial tests; once it starts working you can try all that. A sketch of such a test run follows.
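
A sketch of that test, assuming the same merged config directory as in the question (the local file name is a placeholder):

# Create a small test directory on the source cluster and put a file in it
hdfs --config /configurations/hadoop dfs -mkdir -p /tmp/mydir1
hdfs --config /configurations/hadoop dfs -put somefile.txt /tmp/mydir1/

# Minimal distcp run, without -update -p
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 hdfs://cluster_1/tmp/mydir1 hdfs://cluster_2/tmp/mydir1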

Contributor

Hi @Predrag Minovic, the Hadoop version is 2.7.3. It seems the property 'mapreduce.job.hdfs-servers.token-renewal.exclude' is not available in Hadoop versions below 2.8.0. I've tried the method you provided, but got the same error.
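
On 2.7.x, where that property does not exist, one workaround in the spirit of the first reply is to address the remote cluster's active NameNode directly by host and port instead of by its nameservice ID; the delegation token's service then becomes a plain address the ResourceManager can renew against without knowing the cluster_2 nameservice. A sketch, assuming d001.server.edu.tk is the currently active NameNode:

hdfs --config /configurations/hadoop distcp hdfs://cluster_1/tmp/ hdfs://d001.server.edu.tk:8020/tmp/

The trade-off is losing HA failover on the destination side: if the active NameNode changes, the command has to be pointed at the new one.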


Hi @Shen Sean

It looks like you may be hitting YARN-3021 - https://issues.apache.org/jira/browse/YARN-3021

Try the same distcp operation after adding the following parameter to the distcp command:

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=<destinationNN1>,<destinationNN2>

Contributor

Hi @Jonathan Sneep, thank you for your reply. I tried the method you suggested but got the same error. Below is the command I ran:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

Contributor

The Hadoop version is 2.7.3.