
What are the different sources used to import log files in real time through Apache Flume? How is real-time data ingestion handled on a daily basis?

Contributor

Hi,

I want to know how Flume is used to stream log files in real time. I have practiced importing files with the 'exec' source, but I want to know which other sources are used for Flume streaming in real-time projects.

Please help me clear up this doubt.

Thanks,

1 ACCEPTED SOLUTION

Super Collaborator

You should read the warning in the Exec Source docs against using tail -f:

https://flume.apache.org/FlumeUserGuide.html#exec-source

It even lists the other sources to consider using instead: "Spooling Directory Source, Taildir Source or direct integration with Flume via the SDK."
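As a rough illustration of the Taildir alternative, here is a minimal agent configuration sketch. The agent name (a1), component names (r1, c1, k1), and all file paths are assumptions made up for this example; the logger sink is only a stand-in for whatever destination you actually use.

```properties
# Hypothetical single-agent setup: tail application logs with the Taildir source.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Taildir source: follows files matching the pattern and records read offsets
# in the position file, so it can resume after a restart.
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/.*log.*
a1.sources.r1.channels = c1

# In-memory channel: simple, but buffered events are lost if the agent dies.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Logger sink as a placeholder; replace with your real sink.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

For the Spooling Directory Source, the source section would instead use type = spooldir with a spoolDir property pointing at a directory that receives completed files.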

Personally, I like tools such as Filebeat or Fluentd for real-time collection of logs, sending them to either Elasticsearch or Solr, since those provide better tooling for log inspection.


3 REPLIES


Contributor

@Jordan Moore Thanks for the suggestion.

Can you please let me know how logs from different servers are collected in real-time projects? If you know of any links, please share them.

Super Collaborator

@Rakesh AN

I have not used Flume in a distributed fashion, but whichever agent you choose, it runs on the server, tails the logs there, and ships them to the configured sink destinations. Running one agent per server is how you collect from different servers. Flume is near real-time rather than strictly real-time, since it delivers events in batches of a configured size.
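A hedged sketch of what the one-agent-per-server layout could look like: each server runs an agent that forwards events over Avro to a central collector agent. This is an untested outline, not a reference deployment. The agent names (agent, collector), the hostname collector.example.com, port 4545, and the log paths are all assumptions for illustration.

```properties
# --- Runs on every application server (hypothetical agent name: agent) ---
agent.sources = logs
agent.channels = mem
agent.sinks = toCollector

agent.sources.logs.type = TAILDIR
agent.sources.logs.positionFile = /var/lib/flume/taildir_position.json
agent.sources.logs.filegroups = f1
agent.sources.logs.filegroups.f1 = /var/log/app/.*log.*
agent.sources.logs.channels = mem

agent.channels.mem.type = memory

# Avro sink: sends events to the collector in batches, which is why
# delivery is near real-time rather than instantaneous.
agent.sinks.toCollector.type = avro
agent.sinks.toCollector.hostname = collector.example.com
agent.sinks.toCollector.port = 4545
agent.sinks.toCollector.batch-size = 100
agent.sinks.toCollector.channel = mem

# --- Runs on the central server (hypothetical agent name: collector) ---
collector.sources = fromAgents
collector.channels = mem
collector.sinks = out

# Avro source: listens for events sent by the per-server agents.
collector.sources.fromAgents.type = avro
collector.sources.fromAgents.bind = 0.0.0.0
collector.sources.fromAgents.port = 4545
collector.sources.fromAgents.channels = mem

collector.channels.mem.type = memory

# Logger sink as a placeholder for the real destination (HDFS, for example).
collector.sinks.out.type = logger
collector.sinks.out.channel = mem
```

A smaller batch-size lowers latency at the cost of throughput; the right value depends on your log volume.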

It's not clear what doubt you have... Can you please explain how you've configured your Flume agents, and the issues you are experiencing?

The Flume documentation is fairly straightforward.