
Distcp job fails with EOF Exception

Hi,

While running a DistCp job over hsftp to transfer files from an OpenStack cluster to an AWS cluster, the job sometimes fails with the exceptions below. If we rerun it a few times, it eventually succeeds.

The directory contains multiple files ranging from a few MB up to 1 GB, and the total directory size is ~20 GB.

Exception 1:

at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:281)
   ... 10 more
Caused
 by: 
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException:
 javax.net.ssl.SSLException: SSL peer shut down incorrectly
   at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:288)
   at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:256)
   at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:183)
   at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
   at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
   at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
   ... 11 more
Caused by: javax.net.ssl.SSLException: SSL peer shut down incorrectly
   at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:596)
   at sun.security.ssl.InputRecord.read(InputRecord.java:532)
   at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
   at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
   at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
   at sun.net.www.MeteredStream.read(MeteredStream.java:134)
   at java.io.FilterInputStream.read(FilterInputStream.java:133)
   at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:3335)
   at org.apache.commons.io.input.BoundedInputStream.read(BoundedInputStream.java:121)
   at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:229)
   at java.io.DataInputStream.read(DataInputStream.java:100)
   at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:80)
   at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:283)
   ... 16 more

Exception 2:

Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: Got EOF but currentPos = 336175104 < filelength = 836475643
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:288)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:256)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:183)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
Caused by: java.io.IOException: Got EOF but currentPos = 336175104 < filelength = 836475643
    at org.apache.hadoop.hdfs.web.ByteRangeInputStream.update(ByteRangeInputStream.java:214)
    at org.apache.hadoop.hdfs.web.ByteRangeInputStream.read(ByteRangeInputStream.java:229)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:80)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:283)
    ... 16 more


@subacini balakrishnan, DistCp works by distributing the work of copying files across all of the nodes in a cluster. Have you noticed if these failures occur on specific nodes? If so, then it might indicate a misconfiguration or a network connectivity problem on those particular nodes.
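If it helps to narrow that down, one way to see which hosts the failing map attempts ran on is to pull the aggregated job logs and grep for the exceptions (the application ID below is a placeholder for your actual DistCp job's ID):

```shell
# Fetch aggregated logs for the DistCp MapReduce job and show the
# container/host context around each failure (application ID is a placeholder)
yarn logs -applicationId application_1234567890123_0042 \
  | grep -B 5 "SSLException\|Got EOF"
```

If the same source or destination hosts keep showing up, that points at a node-level network or configuration issue rather than a general DistCp problem.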

@subacini balakrishnan

Are your source and target Hadoop versions the same?

Can you paste your distcp command, please?

Yes, the versions are the same.

Here is the command:

hadoop distcp hsftp://host1:50470/user/xxxx/1.txt hdfs://host2_nameservice/user/yyyy
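Not a fix for the root cause, but when an hsftp source intermittently drops SSL connections under load, throttling the copy can sometimes help. A sketch using standard DistCp options (`-m` to cap concurrent maps, `-bandwidth` to limit MB/s per map, `-update` to skip files already copied on a rerun); hosts and paths are taken from the command above:

```shell
# Throttled retry: fewer concurrent maps and ~10 MB/s per map,
# which may reduce load-related connection resets on the source;
# -update lets a rerun skip files that already copied successfully
hadoop distcp -m 5 -bandwidth 10 -update \
  hsftp://host1:50470/user/xxxx/1.txt \
  hdfs://host2_nameservice/user/yyyy
```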


@subacini balakrishnan did this work for you?