Distcp failing from HDFS to S3 with length mismatch


Hello everyone,

When running distcp from hdfs:// to s3a://, after a while I get an error like this:

Caused by: Mismatch in length of source:hdfs://clustername/hbase/WALs/,16020,1491913605286/ and target:s3a://bucket-backup/hbase/.distcp.tmp.attempt_local1903592397_0001_m_000000_0

It then quickly fails with:

17/04/11 15:50:54 INFO mapreduce.Job: Job job_local1903592397_0001 failed with state FAILED due to: NA
17/04/11 15:50:54 INFO mapreduce.Job: Counters: 28
        File System Counters
                FILE: Number of bytes read=723868
                FILE: Number of bytes written=764685
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2169097700
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=469
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
                S3A: Number of bytes read=0
                S3A: Number of bytes written=2169097700
                S3A: Number of read operations=471
                S3A: Number of large read operations=0
                S3A: Number of write operations=97
        Map-Reduce Framework
                Map input records=40
                Map output records=0
                Input split bytes=156
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=1376
                Total committed heap usage (bytes)=521142272
        File Input Format Counters
                Bytes Read=13228
        File Output Format Counters
                Bytes Written=8

Any ideas? We have HBase running on top of this HDFS cluster, and it is actively performing writes. Is that a problem for distcp?
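One way to check whether files still being written are involved would be to ask fsck for files that are open for write under the WAL directory. This is only a sketch: the /hbase/WALs path is taken from the error message above, and the RUN switch and the open_files_cmd helper are names made up for illustration. By default it only prints the command; with RUN=1 it runs it against the cluster.

```shell
#!/usr/bin/env bash
# WAL_DIR is an assumption based on the path in the error message.
WAL_DIR="${WAL_DIR:-/hbase/WALs}"

# Build the fsck command that lists files currently open for write.
open_files_cmd() {
  echo "hdfs fsck $1 -files -openforwrite"
}

cmd="$(open_files_cmd "$WAL_DIR")"
if [ "${RUN:-0}" = "1" ]; then
  # Run against the cluster and keep only lines flagged as open.
  $cmd | grep OPENFORWRITE
else
  # Dry run: just show what would be executed.
  echo "$cmd"
fi
```

If WAL files show up as open, their length can change between the moment distcp lists them and the moment it finishes copying them, which would be consistent with the length mismatch above.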



From that link I see that open files could be an issue. Does this mean I can't back up with distcp, since I'm running HBase on top and it can never be stopped? I can't run CopyTable to the local filesystem because the data is far too large for that. Are there any other sensible alternatives for backing up to S3?
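One alternative that might be worth trying is an HBase snapshot exported with ExportSnapshot, which copies the snapshot's files rather than live WALs. The following is a rough sketch, not something confirmed in this thread: the table name, snapshot name, destination prefix and mapper count are hypothetical placeholders, and it assumes the s3a connector and credentials are already configured. It prints the commands by default and only runs them with RUN=1.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders; only the bucket name comes from the error message.
TABLE="${TABLE:-mytable}"
SNAPSHOT="${SNAPSHOT:-mytable-snap}"
DEST="${DEST:-s3a://bucket-backup/hbase-snapshots}"

# HBase shell statement that takes a snapshot of a table.
snapshot_stmt() {
  echo "snapshot '$1', '$2'"
}

# ExportSnapshot invocation that copies a snapshot to another filesystem.
export_cmd() {
  echo "hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $1 -copy-to $2 -mappers ${3:-4}"
}

if [ "${RUN:-0}" = "1" ]; then
  # Take the snapshot non-interactively, then export it.
  snapshot_stmt "$TABLE" "$SNAPSHOT" | hbase shell -n
  cmd="$(export_cmd "$SNAPSHOT" "$DEST")"
  $cmd
else
  # Dry run: show the shell statement and the export command.
  snapshot_stmt "$TABLE" "$SNAPSHOT"
  export_cmd "$SNAPSHOT" "$DEST"
fi
```

Taking a snapshot does not require stopping HBase, so this may fit the "can never be stopped" constraint, but it would need to be tested on a non-critical table first.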

Maybe @stevel can help here.

Hi @Vasco Pinho, did you manage to resolve this distcp issue? I am also getting errors with distcp.