
What sources are used in real-time projects to import log files through Apache Flume? How is real-time data ingested on a daily basis?

Hi,

I want to understand how Flume is used to stream log files in real time. I have practiced importing files with the 'exec' source, but I would like to know which sources are actually used for streaming in real-time projects.

Please help me clear up this question.

Thanks,

1 ACCEPTED SOLUTION

Super Collaborator

You should read the warning in the Exec Source documentation against using tail -f:

https://flume.apache.org/FlumeUserGuide.html#exec-source

It even lists the alternatives to consider instead: "Spooling Directory Source, Taildir Source or direct integration with Flume via the SDK."
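As a rough illustration of the Taildir Source mentioned above, here is a minimal agent configuration sketch. The agent name (a1), the log directory, the position file location, the channel directories, and the HDFS path are all assumptions made for the example, not values from this thread; adjust them to your environment. The date escapes in the sink path are one way to land each day's data in its own directory.

```properties
# Hypothetical agent "a1": Taildir source -> file channel -> HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Taildir source: follows files matching the pattern and records read
# offsets in the position file, so it can resume after a restart.
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
# Assumed log location; the regex applies to the file name only.
a1.sources.r1.filegroups.f1 = /var/log/myapp/.*log.*
a1.sources.r1.batchSize = 100

# File channel: buffers events on disk between source and sink.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data

# HDFS sink: writes into one directory per day.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent's clock to resolve %Y-%m-%d, since Taildir events
# carry no timestamp header by default.
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

Assuming the file is saved as taildir.conf, the agent would be started with something like `flume-ng agent --conf conf --conf-file taildir.conf --name a1`.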

Personally, I like tools such as Filebeat or Fluentd for real-time log collection, sending the logs to either Elasticsearch or Solr, since they provide better tooling for log inspection.

@Jordan Moore Thanks for the suggestion.

Can you please let me know how logs from different servers are collected in real-time projects? If you know of any link, please share it.

Super Collaborator

@Rakesh AN

I have not used Flume in a distributed fashion, but whichever agent you choose, it runs on each server, tails the logs on that server, and ships them to the configured sink destinations. Running one agent per server is how you collect from different servers. Flume is near real-time rather than strictly real-time, since it delivers events in batches of a configured size.
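The one-agent-per-server idea can be sketched as a two-tier setup: an agent on each application server forwards events over Avro to a central collector agent. This is only an illustrative sketch under stated assumptions. The agent names (edge, collector), the host collector.example.com, port 4545, and the file paths are hypothetical, and the logger sink on the collector stands in for whatever real destination you use.

```properties
# --- Agent "edge": runs on every application server ---
edge.sources = r1
edge.channels = c1
edge.sinks = k1

edge.sources.r1.type = TAILDIR
edge.sources.r1.channels = c1
edge.sources.r1.positionFile = /var/lib/flume/taildir_position.json
edge.sources.r1.filegroups = f1
# Assumed log location on each server.
edge.sources.r1.filegroups.f1 = /var/log/myapp/.*log.*

edge.channels.c1.type = memory
edge.channels.c1.capacity = 10000
edge.channels.c1.transactionCapacity = 1000

# Avro sink: ships events to the collector in batches, which is why
# delivery is near real-time rather than immediate.
edge.sinks.k1.type = avro
edge.sinks.k1.channel = c1
edge.sinks.k1.hostname = collector.example.com
edge.sinks.k1.port = 4545
edge.sinks.k1.batch-size = 100

# --- Agent "collector": runs on one central host ---
collector.sources = r1
collector.channels = c1
collector.sinks = k1

# Avro source: receives events from all edge agents.
collector.sources.r1.type = avro
collector.sources.r1.channels = c1
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 4545

collector.channels.c1.type = memory
collector.channels.c1.capacity = 10000
collector.channels.c1.transactionCapacity = 1000

# Placeholder sink that just logs events; swap in HDFS or another sink.
collector.sinks.k1.type = logger
collector.sinks.k1.channel = c1
```

Both agents can live in the same file because Flume only loads the properties for the agent named with `--name`, so each server would start it with `--name edge` and the central host with `--name collector`.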

It's not clear what your question is. Can you please explain how you've configured your Flume agents, and what issues you are experiencing?

The Flume documentation is fairly straightforward.