Flume file channel - JBOD or RAID?

What is the recommended disk configuration for the Flume file channel?

I read that if even one of the disks backing a Flume file channel fails, the channel may be unable to recover any of its data, including events stored on disks that have not failed.

For this reason, would RAID-ed disks be preferable?

Re: Flume file channel - JBOD or RAID?

@awhitter

This is explained nicely in this blog post: https://blogs.apache.org/flume/entry/apache_flume_filechannel

It's important to note that FileChannel does not do any replication of data itself. As such, it is only as reliable as the underlying disks. Users who use FileChannel because of its durability should take this into account when purchasing and configuring hardware. The underlying disks should be RAID, SAN, or similar.

RAID 10 is a good choice from a performance standpoint.
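As a minimal sketch of what this means for the agent configuration, the Python snippet below renders the FileChannel section of a Flume properties file with `checkpointDir` and `dataDirs` on one redundant volume. `type`, `checkpointDir`, and `dataDirs` are documented FileChannel properties; the agent name `agent1`, the channel name `c1`, and the `/data/raid10/...` paths are hypothetical and assume a RAID 10 array is mounted there.

```python
# Sketch: render a Flume FileChannel definition whose directories live on a
# redundant volume. Agent/channel names and /data/raid10 paths are hypothetical.


def file_channel_properties(agent, channel, checkpoint_dir, data_dirs):
    """Return Flume agent properties lines for a FileChannel."""
    prefix = f"{agent}.channels.{channel}"
    lines = [
        f"{prefix}.type = file",
        # Putting the checkpoint on the same redundant volume is an
        # assumption of this sketch; the data directories hold the events.
        f"{prefix}.checkpointDir = {checkpoint_dir}",
        # Comma-separated list. FileChannel spreads its log files across
        # these directories for throughput but keeps no replicas, so each
        # directory must itself sit on redundant storage.
        f"{prefix}.dataDirs = {','.join(data_dirs)}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(file_channel_properties(
        "agent1", "c1",
        "/data/raid10/flume/checkpoint",
        ["/data/raid10/flume/data"],
    ))
```

Listing several `dataDirs` on separate JBOD disks would still improve throughput, but as the blog quote points out, it adds no protection against a single disk failure.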

Re: Flume file channel - JBOD or RAID?

@awhitter, has this been resolved? Can you post your solution or accept the best answer?

Re: Flume file channel - JBOD or RAID?

We learned this the hard way: one of the disks holding the file channel's data directory crashed, which resulted in data loss.

  • Make sure your storage is redundant!
  • Tune your batch sizes.
  • Monitor your disks; SMART can warn you that a disk is about to fail.
  • Or, once the HDP stack includes Flume 1.6, use a KafkaChannel (not the KafkaSink).
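One concrete constraint when tuning batch sizes: a sink takes up to its batch size of events inside a single channel transaction, so a batch size larger than the channel's `transactionCapacity` makes those takes fail. The Python sketch below is a hypothetical helper (not part of Flume) that parses a simple properties file and flags such sinks. It only compares values that are set explicitly, and the batch-size key is passed in because sinks name it differently (the HDFS sink, for example, uses `hdfs.batchSize`).

```python
# Sketch: flag sinks whose batch size exceeds the transactionCapacity of the
# channel they drain. Hypothetical helper; defaults are not considered.


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def oversized_sink_batches(props, agent, batch_key="batchSize"):
    """Return (sink, batch, channel, transactionCapacity) for each problem."""
    problems = []
    sink_prefix = f"{agent}.sinks."
    for key, channel in props.items():
        # Match '<agent>.sinks.<sink>.channel = <channel>' entries.
        if not (key.startswith(sink_prefix) and key.endswith(".channel")):
            continue
        sink = key[len(sink_prefix):-len(".channel")]
        batch = props.get(f"{sink_prefix}{sink}.{batch_key}")
        capacity = props.get(f"{agent}.channels.{channel}.transactionCapacity")
        if batch is not None and capacity is not None and int(batch) > int(capacity):
            problems.append((sink, int(batch), channel, int(capacity)))
    return problems


if __name__ == "__main__":
    config = """
    agent1.channels.c1.type = file
    agent1.channels.c1.transactionCapacity = 1000
    agent1.sinks.k1.type = hdfs
    agent1.sinks.k1.channel = c1
    agent1.sinks.k1.hdfs.batchSize = 5000
    """
    print(oversized_sink_batches(parse_properties(config), "agent1", "hdfs.batchSize"))
```

For the sample configuration this reports sink `k1`, whose batch of 5000 cannot fit in a 1000-event transaction on `c1`.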

With a KafkaChannel, an event counts as accepted from the source as soon as it has been written to, and accepted by, the Kafka topic.
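A minimal sketch of the KafkaChannel alternative, again rendering the channel section of an agent configuration in Python. With this channel the events live in a Kafka topic, so redundancy comes from Kafka rather than from the agent's local disks, provided the topic is created with a replication factor greater than 1. The property names follow the Flume 1.6 user guide (later Flume releases renamed some of them, e.g. to `kafka.bootstrap.servers` and `kafka.topic`); the agent, channel, host, and topic names are hypothetical.

```python
# Sketch: render a Flume 1.6 KafkaChannel definition. Property names follow
# the Flume 1.6 user guide; agent, channel, hosts, and topic are hypothetical.

KAFKA_CHANNEL_CLASS = "org.apache.flume.channel.kafka.KafkaChannel"


def kafka_channel_properties(agent, channel, brokers, zookeeper, topic):
    """Return Flume agent properties lines for a KafkaChannel."""
    prefix = f"{agent}.channels.{channel}"
    return "\n".join([
        f"{prefix}.type = {KAFKA_CHANNEL_CLASS}",
        # Comma-separated hostname:port list of Kafka brokers.
        f"{prefix}.brokerList = {','.join(brokers)}",
        # ZooKeeper connect string of the Kafka cluster.
        f"{prefix}.zookeeperConnect = {zookeeper}",
        # Topic that holds the channel's events.
        f"{prefix}.topic = {topic}",
    ])


if __name__ == "__main__":
    print(kafka_channel_properties(
        "agent1", "c1",
        ["kafka1:9092", "kafka2:9092", "kafka3:9092"],
        "zk1:2181,zk2:2181,zk3:2181",
        "flume-channel",
    ))
```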
