
Using DistCp from a non-Kerberized cluster to a Kerberized cluster

I need to move some data from a non-Kerberized cluster to a Kerberized cluster, and I am running into problems related to Kerberos.

This is the output I get from the distcp command:

# hadoop distcp hdfs://****hnn01:8020/tmp/elements hdfs://****hmn31:8020/tmp
18/02/23 10:51:21 ERROR tools.DistCp: Invalid arguments:
java.lang.IllegalArgumentException: java.net.UnknownHostException: ****hmn31
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Caused by: java.net.UnknownHostException: ****hmn31
        ... 16 more
Invalid arguments: java.net.UnknownHostException: ****hmn31

There is also an HDP version difference: I want to move the data from HDP 2.3 to HDP 2.5.

Any suggestions would be greatly appreciated.

Re: Using DistCp from a non-Kerberized cluster to a Kerberized cluster

@Yassine

Did you get a chance to check the URL below? It describes the following steps:

1. Added the following property to the core-site.xml on the client machine (secure cluster):

<property> 
<name>ipc.client.fallback-to-simple-auth-allowed</name> 
<value>true</value>
</property>

2. Added the following rule to the hadoop.security.auth_to_local property in the core-site.xml on the source cluster (non-secure):

RULE:[1:$1@$0](.*@<replace with REALM name>)s/@.*// 
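
For reference, here is a rough sketch of how the rule from step 2 might look once it is placed inside a property in core-site.xml. EXAMPLE.COM is only a placeholder for your own realm name, and the trailing DEFAULT line is my assumption (it is the usual fallback rule); neither comes from the linked answer, so adjust them to your environment.

```xml
<!-- Sketch only: EXAMPLE.COM is a placeholder realm, not a real value. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//
    DEFAULT
  </value>
</property>
<!-- The rule maps a one-component principal such as user@EXAMPLE.COM
     to the short name "user" by stripping everything from "@" onward.
     DEFAULT is an assumed fallback for names the rule does not match. -->
```

With a rule like this, a principal such as user@EXAMPLE.COM coming from the secure side should be seen as plain "user" on the non-secure cluster. A change to core-site.xml normally needs a restart of the affected services before it takes effect.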

After adding the above properties you should be able to perform distcp successfully.

Link: https://community.hortonworks.com/questions/294/running-distcp-between-two-cluster-one-kerberized.ht...

Hope this helps you.