Created 10-09-2018 03:52 AM
Hi, all.
I have two Hadoop clusters, cluster_1 and cluster_2, with separate ZooKeeper ensembles. Now I want to DistCp HDFS files from cluster_1 to cluster_2.
The information for cluster_1 and cluster_2 is as follows.
### cluster_1
1, active namenode: g001.server.edu.tk
   standby namenode: g002.server.edu.tk
2, zookeeper hosts: g003.server.edu.tk, g004.server.edu.tk, g005.server.edu.tk

### cluster_2
1, active namenode: d001.server.edu.tk
   standby namenode: d002.server.edu.tk
2, zookeeper hosts: d003.server.edu.tk, d004.server.edu.tk, d005.server.edu.tk
1, In order to distcp data from cluster_1 to cluster_2, I copied the whole Hadoop configuration directory from $HADOOP_HOME/etc/hadoop to /configurations/hadoop on cluster_1 and added the following properties to hdfs-site.xml on g001.server.edu.tk:
<property>
  <name>dfs.nameservices</name>
  <value>cluster_1,cluster_2</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster_2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster_2</name>
  <value>d001.server.edu.tk,d002.server.edu.tk</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster_2.d001.server.edu.tk</name>
  <value>d001.server.edu.tk:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.cluster_2.d001.server.edu.tk</name>
  <value>d001.server.edu.tk:54321</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster_2.d001.server.edu.tk</name>
  <value>d001.server.edu.tk:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.cluster_2.d001.server.edu.tk</name>
  <value>d001.server.edu.tk:50470</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster_2.d002.server.edu.tk</name>
  <value>d002.server.edu.tk:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.cluster_2.d002.server.edu.tk</name>
  <value>d002.server.edu.tk:54321</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster_2.d002.server.edu.tk</name>
  <value>d002.server.edu.tk:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.cluster_2.d002.server.edu.tk</name>
  <value>d002.server.edu.tk:50470</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster_1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster_1</name>
  <value>g001.server.edu.tk,g002.server.edu.tk</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster_1.g001.server.edu.tk</name>
  <value>g001.server.edu.tk:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.cluster_1.g001.server.edu.tk</name>
  <value>g001.server.edu.tk:54321</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster_1.g001.server.edu.tk</name>
  <value>g001.server.edu.tk:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.cluster_1.g001.server.edu.tk</name>
  <value>g001.server.edu.tk:50470</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster_1.g002.server.edu.tk</name>
  <value>g002.server.edu.tk:8020</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-address.cluster_1.g002.server.edu.tk</name>
  <value>g002.server.edu.tk:54321</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster_1.g002.server.edu.tk</name>
  <value>g002.server.edu.tk:50070</value>
</property>
<property>
  <name>dfs.namenode.https-address.cluster_1.g002.server.edu.tk</name>
  <value>g002.server.edu.tk:50470</value>
</property>
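Before attempting the distcp itself, it may help to verify that the merged client configuration actually resolves both HA nameservices. A minimal sanity check, assuming the `/configurations/hadoop` directory and nameservice names from this thread (these commands need a live cluster, so results will vary):

```shell
# Both listings should succeed and fail over to the active NN automatically.
# An "UnknownHostException: cluster_2" here means the nameservice properties
# above were not picked up by the client config.
hdfs --config /configurations/hadoop dfs -ls hdfs://cluster_1/
hdfs --config /configurations/hadoop dfs -ls hdfs://cluster_2/
```

If the second listing fails, fixing the client config is a prerequisite; no distcp flag will help until both nameservices resolve.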
2, Then I ran the DistCp command on g001.server.edu.tk:
hdfs --config /configurations/hadoop distcp -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/
3, But got errors like:
18/10/08 07:55:00 ERROR tools.DistCp: Exception encountered
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxx to YARN: Failed to renew token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:306)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:183)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_xxxx to YARN: Failed to RENEW token: Kind: HDFS_DELEGATION_TOKEN, Service: hdfs:cluster_2, Ident: (HDFS_DELEGATION_TOKEN token 50168 for hdfs)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:271)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:290)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:291)
    ... 12 more
4, Hadoop version is 2.7.3
Created 10-09-2018 03:52 AM
Use this, with the server name and port instead if you are doing distcp directly to the active NN on the remote cluster:
-Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2
Created 10-09-2018 03:52 AM
Hi, @Predrag Minovic. Thank you for your reply. I tried the method you suggested but got the same error. Below are the commands I ran:
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk:8020,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2:8020 -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/
Created 10-09-2018 09:06 AM
Sorry for the hard-to-understand message; try this:
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 -update -p hdfs://cluster_1/tmp/ hdfs://cluster_2/tmp/
Note that you don't need a port when using the NN service name. Also, I suggest first copying a small file or directory in /tmp, like /tmp/mydir1: just create that dir and put a few files inside. Also remove '-update -p' during initial tests. Once it starts working you can try all that.
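The small-scale test suggested above might look like the following sketch. The file name `somefile.txt` and the directory `/tmp/mydir1` are illustrative, and the nameservice names come from this thread's configuration (these commands require a live cluster):

```shell
# Create a small test directory on the source cluster and seed it with a file.
hdfs --config /configurations/hadoop dfs -mkdir -p /tmp/mydir1
hdfs --config /configurations/hadoop dfs -put somefile.txt /tmp/mydir1/

# First run a plain distcp without -update/-p; note that no port is given
# when the HA nameservice name is used as the authority.
hdfs --config /configurations/hadoop distcp \
  -Dmapreduce.job.hdfs-servers.token-renewal.exclude=cluster_2 \
  hdfs://cluster_1/tmp/mydir1 hdfs://cluster_2/tmp/mydir1
```

Once this minimal copy succeeds, `-update -p` can be added back and the copy widened to the real data set.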
Created 10-10-2018 02:07 AM
Hi, @Predrag Minovic. The Hadoop version is 2.7.3. It seems the property 'mapreduce.job.hdfs-servers.token-renewal.exclude' is not available for Hadoop versions below 2.8.0. I've tried the method you provided, but got the same error.
Created 10-09-2018 03:52 AM
Hi @Shen Sean
It looks like you may be hitting YARN-3021 - https://issues.apache.org/jira/browse/YARN-3021
Try the same distcp operation after adding the following parameter to the distcp command:
-Dmapreduce.job.hdfs-servers.token-renewal.exclude=<destinationNN1>,<destinationNN2>
Created 10-09-2018 03:52 AM
Hi, @Jonathan Sneep. Thank you for your reply. I tried the method you suggested but got the same error. Below is the command I ran:
hdfs --config /configurations/hadoop distcp -Dmapreduce.job.hdfs-servers.token-renewal.exclude=d001.server.edu.tk,d002.server.edu.tk -update -p hdfs://cluster_1:8020/tmp/ hdfs://cluster_2:8020/tmp/