Distcp command is not working across clusters

New Contributor

I'm trying to copy files from one cluster (prod) to another cluster (dev). The files are bucketed and partitioned ORC files from Hive tables. I'm getting the error below.

The source cluster does not have NameNode HA; the destination cluster is NameNode HA enabled.

Error: java.io.IOException: File copy failed: hdfs://cluster:8020/user/backup/machineID=XEUS/delta_21551841_21551940/bucket_00003 --> hdfs://CLUSTTDEV:8020/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:299)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:266)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://cluster:8020/user/backup/machineID=XEUS/delta_21551841_21551940/bucket_00003 --> hdfs://CLUSTTDEV:8020/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:296)
... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:290)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:250)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:183)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:995)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:638)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:77)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:285)
... 16 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.

18/05/10 02:25:30 INFO mapreduce.Job: map 30% reduce 0%
18/05/10 02:25:32 INFO mapreduce.Job: map 32% reduce 0%
18/05/10 02:25:39 INFO mapreduce.Job: map 33% reduce 0%


These are the commands I have tried:

hadoop distcp -pbugc hdfs://cluster:8020/user/backup/ hdfs://CLUSTTDEV:8020/user/backupdev/

hadoop distcp -skipcrccheck -update hdfs://cluster:8020/user/backup/ hdfs://CLUSTTDEV:8020/user/backupdev/
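Since the destination cluster is NameNode HA enabled, it may be more robust to address it by its HA nameservice ID rather than a single NameNode host:port, so the job survives a failover. A hedged sketch, assuming the destination nameservice is named CLUSTTDEV and its HA settings (dfs.nameservices, dfs.ha.namenodes.*, and the failover proxy provider) are present in the hdfs-site.xml of the cluster where distcp runs:

```
# Address the HA destination by nameservice ID (no port), not one NameNode.
# Assumes "CLUSTTDEV" is a configured nameservice on the submitting cluster;
# -update re-copies only files that differ, so a failed run can be resumed.
hadoop distcp -update \
  hdfs://cluster:8020/user/backup/ \
  hdfs://CLUSTTDEV/user/backupdev/
```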

Re: Distcp command is not working across clusters

Expert Contributor

The error shows there are missing blocks:

Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:995)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:638)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:77)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:285)
... 16 more

Check the NameNode UI to see whether you have missing blocks.
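The same check can be done from the command line with hdfs fsck. A hedged sketch, using the paths from the error above (note the failing file= path is under /user/backupdev, so run this as a user with read access on the cluster that reports the missing block):

```
# Report health of the copied tree, with per-file block details and locations.
hdfs fsck /user/backupdev/machineID=XEUS -files -blocks -locations

# List only the files in the namespace that have corrupt or missing blocks.
hdfs fsck / -list-corruptfileblocks
```

If fsck reports the file as CORRUPT or with missing blocks, the copy of that file is unusable; deleting it on the destination and re-running distcp with -update will re-copy just the affected files.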