Support Questions

Find answers, ask questions, and share your expertise

Distcp command is not working across clusters

avatar
Explorer

I'm trying to copy the files from one cluster(prod) to another cluster(dev). The files which i'm trying to copy is bucketed and partitioned files from Hive tables in orc format. I'm getting error below.

Source machine is Not Namenode HA and Destination machine is NN HA enabled.

Error: java.io.IOException: File copy failed: hdfs://cluster:8020/user/backup/machineID=XEUS/delta_21551841_21551940/bucket_00003 --> hdfs://CLUSTTDEV:8020/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:299)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:266)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://cluster:8020/user/backup/machineID=XEUS/delta_21551841_21551940/bucket_00003 --> hdfs://CLUSTTDEV:8020/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:296)
... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:290)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:250)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:183)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:995)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:638)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:77)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:285)
... 16 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143.

18/05/10 02:25:30 INFO mapreduce.Job: map 30% reduce 0%
18/05/10 02:25:32 INFO mapreduce.Job: map 32% reduce 0%
18/05/10 02:25:39 INFO mapreduce.Job: map 33% reduce 0%


hadoop distcp -pbugc hdfs://cluster:8020/user/backup/ hdfs://CLUSTTDEV:8020/user/backupdev/

hadoop distcp -skipcrccheck -update hdfs://cluster:8020/user/backup/ hdfs://CLUSTTDEV:8020/user/backupdev/
1 REPLY 1

avatar
Expert Contributor

Error shows there are missing blocks

Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-267577882-40.133.26.59-1515787116650:blk_1076168453_2430591 file=/user/backupdev/machineID=XEUS/delta_21551841_21551940/bucket_00003
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:995)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:638)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:77)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:285)
... 16 more

Check Namenode UI to see whether you have missing blocks.