
Handling DistCp of large files between two clusters

Explorer

Hi all,

I'm planning to migrate from CDH4 to CDH5, and I'm using DistCp to copy the historical data between the two clusters. My problem is that each file in the CDH4 HDFS exceeds 150 GB and the nodes only have 1G network cards. DistCp fails with the following error:

Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: Got EOF but currentPos = 77828481024 < filelength = 119488762721
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:289)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:257)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:184)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:124)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:100)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more

 

I'm almost sure the issue is the network card, but replacing the network cards on 120 nodes isn't an easy task.

As far as I know, DistCp copies one file per mapper. Is there a way to copy one block per mapper? Or is there a way to split the files and re-merge them after the copy (I'd also like to preserve the file names after the merge)?

This issue is blocking my migration to CDH5; I hope you can help.
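
For reference, the copy is started with a command along these lines (the hostnames, ports, and paths below are placeholders for the real ones):

hadoop distcp hftp://cdh4-nn.example.com:50070/data/historical hdfs://cdh5-nn.example.com:8020/data/historical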

9 REPLIES

Explorer

Hi,

Can anyone help out?

This step is blocking my migration from CDH4.

Mentor
Are you using hftp:// or webhdfs://? I'd recommend trying the latter.

For this specific exception in REST-based copies, it's usually not a fault of the network but a buggy state in the older Jetty used on the source cluster. Typically, a rolling restart of the DataNodes resolves such a bad Jetty state, where it hangs up on a client midway through a response, causing the sudden EOF in the DistCp copy client when it was expecting the rest of the data.
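
As a minimal sketch of the change (hostnames, ports, and paths are placeholders, and this assumes WebHDFS is enabled on the source NameNode via dfs.webhdfs.enabled), only the source URI scheme in the DistCp command needs to change:

hadoop distcp webhdfs://cdh4-nn.example.com:50070/data/historical hdfs://cdh5-nn.example.com:8020/data/historical

Both hftp:// and webhdfs:// go through the NameNode's HTTP port, so nothing else in the command needs to change.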

New Contributor
Using webhdfs did the trick.

Explorer
I have used the command below to copy 36 TB to Azure blob storage using a snapshot:
HADOOP_CLIENT_OPTS="-Xmx40G" hadoop distcp -update -delete $SNAPSHOT_PATH wasbs://buclusterbackup@blobplatformdataxe265ecb.blob.core.windows.net/sep_backup/application_data
I'm getting Azure exception errors and a Java IO error.
I re-ran with -skipcrccheck and still get the same error.

Mentor
Block-level copies (with file merges) are not supported as a DistCp feature yet.

However, you can use the -update option to do progressive copies, resuming from the last failure.
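
For example (again with placeholder URIs and paths), simply re-running the same job with -update skips files that already match at the destination and only re-copies the ones that failed or changed:

hadoop distcp -update webhdfs://cdh4-nn.example.com:50070/data/historical hdfs://cdh5-nn.example.com:8020/data/historical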


Hi Harsh,

Please correct me if I am wrong: we recently tested this in our environment, and during the copy I observed that data is copied to a temporary directory and later merged into the destination path by default.

Thanks,
Kishore

Mentor
The copy is done to a temporary file, which is then moved to the actual destination upon completion. There's no "merge", only a move. This procedure ensures that partial file copies aren't left over if the job fails or gets killed.


Got it.

Explorer
Thanks, Harsh.

Indeed, I used -update before, but it didn't solve the issue, and since I have 120 cluster nodes, restarting the DataNodes wasn't a feasible solution.

What solved the issue was using webhdfs instead of hftp.