
Apache Flume in 2021

New Contributor

Is Flume still the way to go in 2021?

I tried searching for some insights on this topic and did not find much.

 

The latest release dates from 2019, and there is little or no activity on GitHub, along with lots of unmerged pull requests.

 

I'm planning on coding a custom source for Flume. My data is binary (not text) and unstructured, arriving from a legacy control system. I do not plan (ATM) to apply any filtering or processing to the data stream. The data stream will then be routed through a memory channel to HDFS. FWIW, the environment I will be working in is closed, so I cannot use a cloud-based solution. Also, the data rates will be too big to pipe through the internet.
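
For reference, the pipeline described above (custom source, memory channel, HDFS sink) would look roughly like the Flume agent configuration below. This is only a minimal sketch: the agent name `agent1`, the component names, the source class `com.example.LegacyBinarySource`, and the HDFS path are hypothetical placeholders, and the capacity and roll values are illustrative rather than tuned.

```properties
# Hypothetical agent "agent1" with one source, one channel, one sink.
agent1.sources = legacySrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# Custom source: the class name is a placeholder for the source to be written.
agent1.sources.legacySrc.type = com.example.LegacyBinarySource
agent1.sources.legacySrc.channels = memCh

# Memory channel: fast, but events are lost if the agent process dies.
agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 100000
agent1.channels.memCh.transactionCapacity = 1000

# HDFS sink: SequenceFile with Writable records keeps each binary event
# body as a separate record instead of concatenating raw bytes.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode.example/flume/legacy
agent1.sinks.hdfsSink.hdfs.fileType = SequenceFile
agent1.sinks.hdfsSink.hdfs.writeFormat = Writable
# Roll files by size only (about 128 MB); 0 disables the other triggers.
agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
agent1.sinks.hdfsSink.hdfs.rollInterval = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```

Comments sit on their own lines because the file is parsed as Java properties, where a trailing `#` would become part of the value.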

 

Looking for alternatives, I found lots of suggestions that sometimes do not feel like a Flume alternative, but then again, I'm new to this "big data" ecosystem and I might be mistaken. Examples include Apache NiFi, Spark, and the like.

 

It might be that I could go with Kafka and skip Flume altogether, I guess, but I need to educate myself more about the options.

 

Thank you for the input!

1 ACCEPTED SOLUTION


There may be some corner cases where you would want to use something else, but fortunately the general answer to your question is very straightforward:

 

In general, anything you were considering Flume for, you now want to use NiFi for instead.

 

Flume has been deprecated, so I would not recommend spending time and energy on developing custom content for it. Instead, see whether NiFi solves your problem out of the box (or, if needed, perhaps contribute a processor to NiFi).


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.


3 REPLIES

New Contributor

Thank you for the insights!

 

Would you consider a custom Kafka producer and an HDFS Kafka consumer to be an adequate replacement for Flume/NiFi? At a glance, it looks like I could avoid the whole separate ingest stage and just go with Kafka.


The general successor to Flume is NiFi. But if your usage is simple enough, then Kafka Connect may also suffice.

 

Cloudera of course supports both.


- Dennis Jaheruddin
