Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Problem when running DistCp between two HA clusters

New Member

Hi all,

I have two Hadoop clusters, named cluster_1 and cluster_2, each with its own ZooKeeper ensemble. Now I want to DistCp HDFS files from cluster_1 to cluster_2.

The two clusters' details are as follows.

### cluster_1
1, active namenode: g001.server.edu.tk standby namenode: g002.server.edu.tk
2, zookeeper hosts: g003.server.edu.tk g004.server.edu.tk g005.server.edu.tk
### cluster_2
1, active namenode: d001.server.edu.tk standby namenode: d002.server.edu.tk
2, zookeeper hosts: d003.server.edu.tk d004.server.edu.tk d005.server.edu.tk

1, In order to distcp data from cluster_1 to cluster_2, I copied the whole Hadoop configuration directory from $HADOOP_HOME/etc/hadoop to /configurations/hadoop on cluster_1, and added the following properties to hdfs-site.xml on g001.server.edu.tk:

<property>
    <name>dfs.nameservices</name>  
    <value>cluster_1,cluster_2</value>
</property>

<property>   
    <name>dfs.client.failover.proxy.provider.cluster_2</name> 
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property> 
<property>
    <name>dfs.ha.namenodes.cluster_2</name>
    <value>d001.server.edu.tk,d002.server.edu.tk</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:8020</value>
</property> 

<property> 
    <name>dfs.namenode.servicerpc-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:54321</value> 
</property>

<property>
    <name>dfs.namenode.http-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:50070</value>
</property>

<property> 
    <name>dfs.namenode.https-address.cluster_2.d001.server.edu.tk</name> 
    <value>d001.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_2.d002.server.edu.tk</name> 
    <value>d002.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_2.d002.server.edu.tk</name> 
    <value>d002.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_2.d002.server.edu.tk</name>
    <value>d002.server.edu.tk:50070</value>
</property>

<property> 
    <name>dfs.namenode.https-address.cluster_2.d002.server.edu.tk</name>  
    <value>d002.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.cluster_1</name>     
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 
</property> 

<property>
    <name>dfs.ha.namenodes.cluster_1</name>
    <value>g001.server.edu.tk,g002.server.edu.tk</value>
</property>
<property>   
    <name>dfs.namenode.rpc-address.cluster_1.g001.server.edu.tk</name> 
    <value>g001.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_1.g001.server.edu.tk</name>   
    <value>g001.server.edu.tk:54321</value>
</property>

<property>
    <name>dfs.namenode.http-address.cluster_1.g001.server.edu.tk</name> 
    <value>g001.server.edu.tk:50070</value>
</property>
<property> 
    <name>dfs.namenode.https-address.cluster_1.g001.server.edu.tk</name> 
    <value>g001.server.edu.tk:50470</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.cluster_1.g002.server.edu.tk</name>
    <value>g002.server.edu.tk:8020</value>
</property>

<property>
    <name>dfs.namenode.servicerpc-address.cluster_1.g002.server.edu.tk</name> 
    <value>g002.server.edu.tk:54321</value>
</property>

<property> 
    <name>dfs.namenode.http-address.cluster_1.g002.server.edu.tk</name> 
     <value>g002.server.edu.tk:50070</value>
</property>

<property>
    <name>dfs.namenode.https-address.cluster_1.g002.server.edu.tk</name> 
    <value>g002.server.edu.tk:50470</value>
</property>
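The per-NameNode keys above all follow the pattern `dfs.namenode.<kind>-address.<nameservice>.<nn-id>`. As a sanity check on the hand-written XML, here is a small Python sketch (the hostnames and ports are the ones from this thread; `ha_client_properties` is a hypothetical helper, not part of Hadoop) that generates the cluster_2 entries:

```python
# Sketch: generate the per-NameNode address properties for a remote HA
# nameservice, following the dfs.namenode.<kind>-address.<ns>.<nn-id>
# naming pattern used in the hdfs-site.xml above.

def ha_client_properties(nameservice, namenodes):
    """Return a dict of hdfs-site.xml properties for one nameservice."""
    props = {
        "dfs.ha.namenodes." + nameservice: ",".join(namenodes),
        "dfs.client.failover.proxy.provider." + nameservice:
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    # Ports as used in this thread: 8020 (rpc), 54321 (servicerpc),
    # 50070 (http), 50470 (https).
    ports = {"rpc": 8020, "servicerpc": 54321, "http": 50070, "https": 50470}
    for nn in namenodes:
        for kind, port in ports.items():
            key = "dfs.namenode.%s-address.%s.%s" % (kind, nameservice, nn)
            props[key] = "%s:%d" % (nn, port)
    return props

props = ha_client_properties(
    "cluster_2", ["d001.server.edu.tk", "d002.server.edu.tk"])
print(props["dfs.namenode.rpc-address.cluster_2.d001.server.edu.tk"])
# -> d001.server.edu.tk:8020
```

Comparing the generated keys against the XML is a quick way to spot a mistyped nameservice or NameNode id, which would silently break client-side failover.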

2, Then I ran the DistCp command on g001.server.edu.tk:

hdfs --config /configurations/hadoop distcp -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

3, But got errors like:

18/10/08 07:55:00 ERROR tools.DistCp: Exception encountered
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxx to YARN: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
   at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:183)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxxx to YARN: Failed to RENEW token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
   at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
   at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
   at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:291)
   ... 12 more

4, Hadoop version is 2.7.3.

1 ACCEPTED SOLUTION

Master Guru

Use this (or the server name and port, if you are doing distcp directly against the active NN on the remote cluster):

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2


7 REPLIES

Master Guru

Use this (or the server name and port, if you are doing distcp directly against the active NN on the remote cluster):

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2
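For context, the property takes a comma-separated list of HDFS services whose delegation tokens the ResourceManager should not attempt to renew at job submission. A rough conceptual sketch of that filtering (an illustration of the idea only, not Hadoop's actual implementation):

```python
# Conceptual sketch (assumption, not Hadoop source): the value of
# mapreduce.job.hdfs-servers.token-renewal.exclude is a comma-separated
# list of services to skip when renewing HDFS delegation tokens.

def services_to_renew(token_services, exclude_conf):
    """Return the token services that would still be renewed."""
    excluded = {s.strip() for s in exclude_conf.split(",") if s.strip()}
    return [s for s in token_services if s not in excluded]

print(services_to_renew(["cluster_1", "cluster_2"], "cluster_2"))
# -> ['cluster_1']
```

This is why the setting helps here: the submitting cluster cannot renew a token issued by the remote nameservice, so excluding `cluster_2` avoids the renewal attempt that fails.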

New Member

Hi @Predrag Minovic, thank you for your reply. I tried the method you suggested but got the same error. Below are the commands I ran:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk:8020,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

Master Guru

Sorry for the hard-to-understand message; try this:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/

Note that you don't need the port when using the NameNode service name. Also, I suggest first copying a small file or directory in /tmp, like /tmp/mydir1: just create that dir and put a few files inside. And remove '-update -p' during initial tests; once it starts working you can try all that.

New Member

Hi @Predrag Minovic, the Hadoop version is 2.7.3. It seems the property 'mapreduce.job.hdfs-servers.token-renewal.exclude' is not available for Hadoop versions below 2.8.0. I've tried the method you provided, but got the same error.
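The version cutoff mentioned here can be checked mechanically. A small sketch (the 2.8.0 threshold is the one stated in this thread; `parse_version` is a hypothetical helper) comparing version strings:

```python
# Sketch: check whether a Hadoop version is at least 2.8.0, the first
# release (per this thread) that honors
# mapreduce.job.hdfs-servers.token-renewal.exclude.

def parse_version(v):
    """'2.7.3' -> (2, 7, 3); ignores any -SNAPSHOT/vendor suffix."""
    core = v.split("-")[0]
    return tuple(int(part) for part in core.split("."))

def supports_token_renewal_exclude(version):
    return parse_version(version) >= (2, 8, 0)

print(supports_token_renewal_exclude("2.7.3"))  # False
print(supports_token_renewal_exclude("2.8.0"))  # True
```

Tuple comparison is used instead of string comparison because, for example, "2.10.0" < "2.8.0" lexicographically but (2, 10, 0) > (2, 8, 0) numerically.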


Hi @Shen Sean

It looks like you may be hitting YARN-3021 - https://issues.apache.org/jira/browse/YARN-3021

Try the same distcp operation after adding the following parameter to the distcp command:

-Dmapreduce.job.hdfs-servers.token-renewal.exclude=<destinationNN1>,<destinationNN2>

New Member

Hi @Jonathan Sneep, thank you for your reply. I tried the method you suggested but got the same error. Below is the command I ran:

hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/

New Member

Hadoop version is 2.7.3.