
HDFS distcp between clusters error


Hi,

 

I have two isolated clusters, each with a gateway node (only the gateway has a public IP).
I want to copy data from one cluster to the other.
I configured access with HAProxy and iptables.
The clusters can see each other, but when I run distcp I get an error:

 

shamrock@hadoop-cluster-src-gw:~$ hadoop distcp hdfs://hadoop-cluster-src-gw.some.domain/user/shamrock/test.pl hdfs://hadoop-cluster-dst-gw.some.domain/user/shamrock/
16/06/14 07:29:00 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoop-cluster-src-gw.some.domain/user/shamrock/test.pl], targetPath=hdfs://hadoop-cluster-dst-gw.some.domain/user/shamrock, targetPathExists=true, preserveRawXattrs=false, filtersFile='null'}
16/06/14 07:29:00 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/06/14 07:29:00 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/06/14 07:29:00 ERROR tools.DistCp: Exception encountered
java.io.IOException: Mkdirs failed to create file:/user/shamrock1794299338/.staging/_distcp-1567163966 (exists=false, cwd=file:/home/shamrock)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:442)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:917)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1072)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:271)
at org.apache.hadoop.tools.SimpleCopyListing.getWriter(SimpleCopyListing.java:407)
at org.apache.hadoop.tools.SimpleCopyListing.doBuildListing(SimpleCopyListing.java:174)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:365)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:171)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)

 

What is wrong with my configuration that makes distcp try to create this bogus local directory instead of using the right one?
Why is Hadoop trying to create a directory with such a weird name?
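For what it's worth, the staging path in the error starts with file:/ rather than hdfs://, which makes me suspect the client on the gateway resolves its default filesystem as the local FS (running hdfs getconf -confKey fs.defaultFS should show this). If that is the case, the gateway's core-site.xml would need fs.defaultFS pointing at a NameNode; a sketch of what I mean (hostname and port here are placeholders, not my actual config):

```xml
<configuration>
  <!-- Without this property, Hadoop clients fall back to file:///,
       which would explain the file:/user/... staging path above. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-cluster-src-gw.some.domain:8020</value>
  </property>
</configuration>
```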

 

Best regards,

 

Shamrock