When running distcp from hdfs:// to s3a://, after a while I get an error like:
Caused by: java.io.IOException: Mismatch in length of source:hdfs://clustername/hbase/WALs/hdp-data1.example.com,16020,1491913605286/hdp-data1.example.com%2C16020%2C1491913605286..meta.1491922613008.meta and target:s3a://bucket-backup/hbase/.distcp.tmp.attempt_local1903592397_0001_m_000000_0
It then quickly fails with:
17/04/11 15:50:54 INFO mapreduce.Job: Job job_local1903592397_0001 failed with state FAILED due to: NA
17/04/11 15:50:54 INFO mapreduce.Job: Counters: 28
    File System Counters
        FILE: Number of bytes read=723868
        FILE: Number of bytes written=764685
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2169097700
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=469
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
        S3A: Number of bytes read=0
        S3A: Number of bytes written=2169097700
        S3A: Number of read operations=471
        S3A: Number of large read operations=0
        S3A: Number of write operations=97
    Map-Reduce Framework
        Map input records=40
        Map output records=0
        Input split bytes=156
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=1376
        Total committed heap usage (bytes)=521142272
    File Input Format Counters
        Bytes Read=13228
    File Output Format Counters
        Bytes Written=8
    org.apache.hadoop.tools.mapred.CopyMapper$Counter
        BYTESCOPIED=2169020252
        BYTESEXPECTED=2169020252
    at org.apache.hadoop.tools.DistCp.waitForJobCompletion(DistCp.java:205)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:156)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Any ideas? We have HBase running on top of this HDFS cluster, and it is actively performing writes. Is that a problem for distcp?
From that link I see that having open files could be an issue. Does this mean I can't back up with distcp, since I'm running HBase on top and it can never be stopped? I can't run a CopyTable to the local filesystem since the data is just too large for that. Are there any other sensible alternatives for backing up to S3?
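To make the open-file suspicion concrete, here is a minimal sketch of how I imagine a length check like the one in the error could fail when the source is still being appended to. This is only an illustration on the local filesystem, not DistCp's actual code: `copy_with_length_check`, `demo`, and the `grow_by` parameter are hypothetical names I made up, and the assumption is that the expected length is recorded before the copy while a writer (here standing in for HBase writing a WAL) keeps appending.

```python
import os
import tempfile


def copy_with_length_check(src, dst, expected_len, grow_by=b""):
    """Copy src to dst, then compare dst's size with a previously recorded length.

    grow_by simulates a writer appending to src after the length was recorded
    but before the copy reads the file (hypothetical stand-in for an open WAL).
    """
    if grow_by:
        with open(src, "ab") as f:
            f.write(grow_by)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())
    actual = os.path.getsize(dst)
    if actual != expected_len:
        raise IOError(
            f"Mismatch in length of source:{src} and target:{dst} "
            f"(expected {expected_len}, got {actual})"
        )
    return actual


def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "wal")
        with open(src, "wb") as f:
            f.write(b"x" * 100)
        # Length recorded up front, as if taken when the copy list was built.
        listed_len = os.path.getsize(src)

        # Case 1: the file is closed and unchanged, so the check passes.
        copied = copy_with_length_check(src, os.path.join(d, "copy1"), listed_len)

        # Case 2: the file grows by 10 bytes before it is read, so the check fails.
        try:
            copy_with_length_check(
                src, os.path.join(d, "copy2"), listed_len, grow_by=b"y" * 10
            )
            failed = False
        except IOError:
            failed = True
        return copied, failed


if __name__ == "__main__":
    print(demo())
```

If this is roughly what happens, any file that HBase still has open for writing (such as the `.meta` WAL in the error above) could trip the check, while closed files would copy fine.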