01-22-2018 06:34 PM - last edited on 01-23-2018 07:34 AM by cjervis
Recently I have run into some questions about how to transfer data safely via Flume.
My scenario: a system outputs JSON data via Kafka, and I need to store it in HDFS, using some values extracted from the JSON as the storage path on HDFS.
My current plan is to use a Kafka source - memory channel - HDFS sink structure in Flume, since JSON interceptors can only be attached to a source, and the memory channel performs well. However, I am worried about data loss in the memory channel. A Kafka channel may be a good choice, but my source is already a Kafka cluster, so using a Kafka channel as well seems redundant.
So, can I configure the Kafka source to commit offsets only after the data has been written to HDFS by the sink, so that potential data loss in the memory channel is avoided and delivery is guaranteed? If not, are there any other suggestions for my situation?
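
For reference, a minimal sketch of the pipeline described above might look like the following agent configuration. It assumes Flume 1.7 or later (for the `kafka.bootstrap.servers` and `kafka.topics` property names). The agent name, broker addresses, topic, consumer group, NameNode address, and the JSON field `eventType` are all placeholders, not values from my real setup. The built-in `regex_extractor` interceptor is used here only as a stand-in for a JSON interceptor: it copies one field into an event header, which the HDFS sink then uses in its path.

```properties
# Hypothetical agent "a1": Kafka source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Kafka source (brokers, topic and group id are placeholders)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = broker1:9092,broker2:9092
a1.sources.r1.kafka.topics = json-events
a1.sources.r1.kafka.consumer.group.id = flume-hdfs
a1.sources.r1.channels = c1

# Stand-in for a JSON interceptor: copy the assumed "eventType" field
# of the JSON body into an event header of the same name
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = "eventType" *: *"([^"]+)"
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.name = eventType

# Memory channel: fast, but buffered events are lost if the agent dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# HDFS sink: the header extracted above becomes part of the path
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%{eventType}
a1.sinks.k1.hdfs.fileType = DataStream
```

The regex deliberately avoids backslashes, since they would need to be doubled in a properties file.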