Created 07-22-2021 02:22 AM
Is use of Flume in 2021 still the way to go?
I tried searching for some insights on this topic and did not find much.
The latest release was in 2019, there is little or no activity on GitHub, and there are lots of unmerged pull requests, too.
I'm planning on coding a custom source for Flume. My data is binary (not text), unstructured, and arrives from a legacy control system. I do not plan (ATM) on having any filtering or processing applied to the data stream. The data stream will then be routed through a memory channel to HDFS. FWIW, the environment I will be working in is closed and I cannot use a cloud-based solution. Also, the data rates will be too big to pipe through the internet.
Looking for alternatives, there are lots of suggestions that sometimes do not feel like Flume alternatives, but then again, I'm new to this "big data" ecosystem and I might be mistaken. For example, Apache NiFi, Spark, and the like.
It might be that I could go with Kafka and skip Flume altogether, I guess, but I need to educate myself more about the options.
Thank you for the input!
Created 07-22-2021 07:22 AM
There may be some corner cases where you would want to use something else, but fortunately the general answer to your question is very straightforward:
In general anything you were considering Flume for, you now want to use NiFi for instead.
Flume has been deprecated, so I would not recommend spending time and energy on developing custom content for it. Rather, see if NiFi solves your problem out of the box (or, if needed, perhaps contribute a processor to NiFi).
Created 07-22-2021 10:40 AM
Thank you for the insights!
Would you consider a custom Kafka producer and an HDFS Kafka consumer to be an adequate replacement for Flume/NiFi? At a glance, it looks like I could avoid the whole separate ingest stage and just go with Kafka.
Created 08-03-2021 02:05 PM
The general successor to Flume is NiFi. But if your usage is simple enough, then Kafka Connect may also suffice.
Cloudera of course supports both.
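For the custom Kafka producer route raised earlier in the thread, a rough sketch of producer settings for binary payloads could look like the fragment below. The property names and serializer classes are standard Kafka client settings, but the broker address is a made-up placeholder, and the choices for acks and compression are assumptions to tune against your own data rates, not a recommendation.

```properties
# Hypothetical broker address; replace with the brokers in your closed environment
bootstrap.servers=broker1.example.internal:9092
# Keys as plain strings (an assumption, e.g. a source or device identifier)
key.serializer=org.apache.kafka.common.serialization.StringSerializer
# Values passed through as raw bytes, since the data is binary and unstructured
value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
# Wait for all in-sync replicas to acknowledge; trades latency for durability
acks=all
# Optional; may or may not help depending on how compressible the binary data is
compression.type=lz4
```

The consuming side would need the matching ByteArrayDeserializer so the bytes reach HDFS unchanged.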