How to recover most data from corrupt file from Flume file-channel dataDir?

Contributor

We have a reliable Flume stream from a JMS source through a FileChannel to an HDFS sink. The FileChannel buffers data on disk before it is written to HDFS. One of these data files (log-<number>) is invalid due to a hard reset of the machine. When Flume starts, it tries to replay the data that is still in the FileChannel and has not yet been delivered to the sink. The logs report that the file is corrupt.

The content of the data file is binary, but the headers, keys (UUIDs), and values (XML) are visible in plain text. The end of the file looks incomplete: the closing tags are missing. A Unix mv command fails with an I/O error. After copying the file, the copy is slightly smaller than the original. Flume also writes a log-<number>.meta file next to the data file. I'm not sure how to bring the two files back in sync so that Flume can process them.

I want to recover as much of the data in this ~1 GB data file as possible. What is the best approach?
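
The failed move and the shorter copy described above suggest that some blocks of the file cannot be read from disk. As a first step before any Flume-level repair, one option is to salvage every readable block into a new file. The following is only a minimal sketch in Python, not a Flume tool: the function name `salvage_copy`, the 4096-byte block size, and the example paths are assumptions made for illustration. It zero-fills blocks that cannot be read so that byte offsets in the copy match the original. Whether Flume can replay the resulting file is not established here.

```python
import os


def salvage_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Hypothetical helper for illustration. Returns (blocks_ok, blocks_bad).
    A block that raises OSError or comes back short is treated as
    unreadable and padded with zero bytes to preserve offsets.
    """
    size = os.path.getsize(src_path)
    blocks_ok = 0
    blocks_bad = 0
    # buffering=0 gives raw, unbuffered reads so one bad block
    # does not poison a larger read-ahead buffer.
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                chunk = src.read(want) or b""
            except OSError:
                chunk = b""
            if len(chunk) == want:
                blocks_ok += 1
            else:
                # Pad the missing part so later data keeps its offset.
                chunk = chunk + b"\x00" * (want - len(chunk))
                blocks_bad += 1
            dst.write(chunk)
            offset += want
    return blocks_ok, blocks_bad


# Example call (paths are placeholders, not taken from the post):
# ok, bad = salvage_copy("/path/to/dataDir/log-N", "/path/to/backup/log-N.salvaged")
# print(f"readable blocks: {ok}, unreadable blocks: {bad}")
```

Working on the salvaged copy rather than the original leaves the damaged file untouched, and the count of unreadable blocks gives a rough idea of how much data was lost.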

Re: How to recover most data from corrupt file from Flume file-channel dataDir?

Mentor

Did you get past this problem? If so, what was the solution?
