We have a reliable Flume stream from a JMS source through a FileChannel to an HDFS sink. The FileChannel buffers events on disk before they are written to HDFS. One of its data files (log-<number>) is no longer valid because of a hard reset of the machine. When Flume starts, it tries to work through the data that is still in the FileChannel and has not yet been delivered to the sink, and the logs report that the file is corrupt.
The content of the data file is binary, but the headers, keys (UUIDs), and values (XML) are visible in plain text. The bottom of the file looks incomplete: there are no closing tags. A Unix mv command fails with an I/O error, and after copying the file, the copy is slightly smaller in bytes than the original. Flume also writes a log-<number>.meta file next to it. I'm not sure how to bring the two files back in sync so that Flume can process them.
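Since mv fails with an I/O error, the file probably sits on bad sectors, which would also explain why a plain copy comes out smaller. One thing I considered is salvaging whatever is still readable with a block-wise copy that skips unreadable blocks and pads them with NULs, so the surviving data keeps its original offsets. A minimal sketch (the paths are placeholders, not my real channel directory; a stand-in file is created so the sketch runs end to end):

```python
import os

SRC = "/tmp/log-demo"            # placeholder for the corrupt log-<number> file
DST = "/tmp/log-demo.salvaged"
BS = 4096                        # copy block size

# Create a stand-in source file so the sketch is runnable as-is.
if not os.path.exists(SRC):
    with open(SRC, "wb") as f:
        f.write(b"\x00hdr\x00<event>payload</event>")

with open(SRC, "rb") as src, open(DST, "wb") as dst:
    size = os.fstat(src.fileno()).st_size
    pos = 0
    while pos < size:
        src.seek(pos)
        try:
            block = src.read(BS)
        except OSError:
            # Unreadable sector: pad with NULs so later data
            # stays at its original offset in the copy.
            block = b"\x00" * min(BS, size - pos)
        dst.write(block)
        pos += BS
```

This is essentially what `dd conv=noerror,sync` does; on a healthy file the copy is byte-identical to the source.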
I want to restore most of the data in this ~1 GB data file. What is the best approach?
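Because the XML payloads are visible in plain text inside the binary container, one fallback I can think of (if the channel file itself cannot be repaired) is scanning the raw bytes for complete payload documents and re-ingesting those; an incomplete trailing record simply would not match. A sketch, where `<event>` stands in for the actual root element of my payloads:

```python
import re

# Fake channel-file bytes: binary headers interleaved with XML payloads,
# ending in a truncated record, mimicking the corrupt file's layout.
raw = b"\x00hdr\x00<event>a</event>\x00hdr\x00<event>b</event>\x00hdr\x00<event>trunc"

# Non-greedy match pulls out each complete <event>...</event> document;
# the cut-off final record has no closing tag and is skipped.
events = re.findall(rb"<event\b.*?</event>", raw, re.DOTALL)
```

This obviously loses the Flume headers and UUID keys as structured metadata, but it may recover the bulk of the payload data.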