What is the recommended disk configuration for the file channel in Flume?
I read that if even one of the disks backing a file channel that Flume writes to fails, the channel may not be able to recover any of the data, including events on disks that have not failed.
So for this reason, would RAID-ed disks be preferred?
This is nicely explained in this blog post: https://blogs.apache.org/flume/entry/apache_flume_filechannel
It's important to note that FileChannel does not do any replication of data itself. As such, it is only as reliable as the underlying disks. Users who use FileChannel because of its durability should take this into account when purchasing and configuring hardware. The underlying disks should be RAID, SAN, or similar.
RAID 10 is a good choice from a performance standpoint.
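As a rough sketch of what that could look like in the agent configuration, assuming a hypothetical agent named a1, a channel named c1, and placeholder mount points that sit on RAID-backed volumes (none of these names come from the blog post):

```properties
# Hypothetical agent "a1" with a single file channel "c1".
a1.channels = c1
a1.channels.c1.type = file

# Assumed mount points; each should live on a RAID, SAN, or similar volume,
# because FileChannel does not replicate data itself.
a1.channels.c1.checkpointDir = /mnt/raid10-a/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/raid10-b/flume/data

# Optional: keep a backup copy of the checkpoint in a separate directory.
a1.channels.c1.useDualCheckpoints = true
a1.channels.c1.backupCheckpointDir = /mnt/raid10-c/flume/checkpoint-backup
```

Listing several plain, non-redundant disks in dataDirs spreads the write load, but it does not protect against the failure of any one of them, which is why the redundancy has to come from the storage layer.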
We learned it the hard way. One of the disks holding the file channel's data directory crashed, which resulted in data loss.
The message is acknowledged to the source only once it has been put on (accepted by) the Kafka topic.