Distcp ErrorMessage "Duplicate files in input path"

I am running distcp between two non-secure clusters and have the basic copy working. The problem appears when I run the command below:

hadoop distcp -overwrite -pb <Snapshot_Dir>/<Snapshot_name>/* <Target_DIR>

For now I am just experimenting with the options, and this command fails with the "Duplicate files in input path" error (full output below):

18/10/10 11:12:42 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=true, append=false, useDiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://server/DIR/.snapshot/s20181010-085735.478/*], targetPath=hdfs://server/DIR, targetPathExists=true, filtersFile='null', verboseLog=false}
18/10/10 11:12:42 INFO client.AHSProxy: Connecting to Application History server at server
18/10/10 11:12:43 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 4478; dirCnt = 0
18/10/10 11:12:43 INFO tools.SimpleCopyListing: Build file listing completed.
18/10/10 11:12:43 ERROR tools.DistCp: Duplicate files in input path:
org.apache.hadoop.tools.CopyListing$DuplicateFileException: File hdfs://source_server/DIR.snapshot/s20181010-085735.478/2018-07-23/000000_0 and hdfs://source_server/DIR/.snapshot/s20181010-085735.478/dt=2018-07-16/000000_0 would cause duplicates. Aborting
        at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:165)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:398)
        at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:190)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
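A likely reading of the exception, assuming the behaviour described in the DistCp documentation: with -update or -overwrite, DistCp copies the contents of each source directory into the target rather than the directory itself. The glob .../* expands into one source directory per partition (dt=2018-07-16, dt=2018-07-23, ...), so each partition's 000000_0 maps to the same target path, <Target_DIR>/000000_0, and DistCp aborts before copying anything. The bash sketch below only simulates that path mapping; it does not call Hadoop, and the function name find_distcp_collisions is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: simulates (does not run) DistCp's path mapping under
# -update/-overwrite, where each source directory's contents are placed
# directly under the target directory.
# stdin:  one "<source_root><TAB><file_path>" line per file to copy.
# stdout: every target-relative path produced by more than one file.
find_distcp_collisions() {
  local root file
  while IFS=$'\t' read -r root file; do
    # Target-relative path = file path with the source root prefix removed.
    printf '%s\n' "${file#"$root"/}"
  done | sort | uniq -d
}

# Glob over the snapshot's children: one source root per partition.
printf '%s\t%s\n' \
  /DIR/.snapshot/s20181010-085735.478/dt=2018-07-16 /DIR/.snapshot/s20181010-085735.478/dt=2018-07-16/000000_0 \
  /DIR/.snapshot/s20181010-085735.478/dt=2018-07-23 /DIR/.snapshot/s20181010-085735.478/dt=2018-07-23/000000_0 \
  | find_distcp_collisions   # prints: 000000_0

# Snapshot directory itself as the single source root.
printf '%s\t%s\n' \
  /DIR/.snapshot/s20181010-085735.478 /DIR/.snapshot/s20181010-085735.478/dt=2018-07-16/000000_0 \
  /DIR/.snapshot/s20181010-085735.478 /DIR/.snapshot/s20181010-085735.478/dt=2018-07-23/000000_0 \
  | find_distcp_collisions   # prints nothing
```

In the second call the partition names stay in the relative paths, so no two files land on the same target path under this simulated mapping.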

I've also tried -update and got the same result. Has anyone managed to fix this?

Regards!