Support Questions

rakesh_an1992 · 12-12-2017

Hi,

I want to understand how Flume is used to stream log files in real time. I have practiced importing files with the 'exec' source, but I would like to know which other sources are used for Flume streaming in real-time projects.

Please help me clear up this doubt.

Thanks,

JordanMoore · 12-12-2017

You should read the warning in the ExecSource docs against using tail -f:

https://flume.apache.org/FlumeUserGuide.html#exec-source

It also lists the other sources to consider using instead: "Spooling Directory Source, Taildir Source or direct integration with Flume via the SDK."
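
For illustration, a minimal agent using the Taildir Source might look like the sketch below. The agent name a1, the component names, the file paths, and the logger sink are all assumptions chosen for the example, so adjust them to your environment and check each property against the user guide.

```properties
# Hypothetical agent "a1": tail application logs with the Taildir Source.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Taildir Source: follows every file matching the file group pattern and
# records its read offsets in positionFile so it can resume after a restart.
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
# Assumed log location; the file name part is a regular expression.
a1.sources.r1.filegroups.f1 = /var/log/myapp/.*log.*

# In-memory channel buffering events between the source and the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Logger sink, used here only to confirm that events are flowing.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

A Spooling Directory Source would instead set the source type to spooldir and point spoolDir at a directory into which completed files are dropped.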

Personally, I like tools such as Filebeat or Fluentd for real-time collection of logs, sending those to either Elasticsearch or Solr, since they provide better tooling around log inspection.

rakesh_an1992 · 12-14-2017

@Jordan Moore Thanks for the suggestion.

Can you please let me know how logs from different servers are collected in real-time projects? If you know of any link, please share it.

JordanMoore · 12-14-2017

@Rakesh AN

I have not used Flume in a distributed fashion, but whatever agent you choose, it runs on the server that produces the logs, tails them there, and then ships them to the configured sink destinations. Running one agent per server is how you collect from different servers. Flume is near real-time rather than strictly real-time, since events are delivered in batches of a configured size.
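
As a rough, untested sketch of that layout, each application server could run an "edge" agent that forwards events over Avro to a central "collector" agent. The agent names, the host collector.example.com, port 4545, the log path, and the logger sink on the collector are hypothetical placeholders, not a recommended production setup.

```properties
# Hypothetical agent "edge": one instance runs on every application server.
edge.sources = logs
edge.channels = mem
edge.sinks = toCollector

edge.sources.logs.type = TAILDIR
edge.sources.logs.channels = mem
edge.sources.logs.filegroups = app
# Assumed log location on each server.
edge.sources.logs.filegroups.app = /var/log/myapp/.*log.*
# Maximum number of events written to the channel at a time.
edge.sources.logs.batchSize = 100

edge.channels.mem.type = memory
edge.channels.mem.capacity = 10000
edge.channels.mem.transactionCapacity = 1000

# Avro sink: ships batches of events to the collector host.
edge.sinks.toCollector.type = avro
edge.sinks.toCollector.channel = mem
edge.sinks.toCollector.hostname = collector.example.com
edge.sinks.toCollector.port = 4545
edge.sinks.toCollector.batch-size = 100

# Hypothetical agent "collector": runs on one central host.
collector.sources = fromEdges
collector.channels = mem
collector.sinks = out

# Avro source: receives events from all edge agents.
collector.sources.fromEdges.type = avro
collector.sources.fromEdges.channels = mem
collector.sources.fromEdges.bind = 0.0.0.0
collector.sources.fromEdges.port = 4545

collector.channels.mem.type = memory
collector.channels.mem.capacity = 10000
collector.channels.mem.transactionCapacity = 1000

# Placeholder sink; a real deployment would write to HDFS or another store.
collector.sinks.out.type = logger
collector.sinks.out.channel = mem
```

Each agent is started by name with the flume-ng launcher, for example `bin/flume-ng agent --conf conf --conf-file example.conf --name edge`. Smaller batch sizes reduce delivery delay at the cost of throughput.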

It's not clear what doubt you have... Can you please explain how you've configured your Flume agents, and the issues you are experiencing?

The Flume documentation is fairly straightforward.

Cloudera Community

What are the different sources used to import log files in real time through Apache Flume? How is real-time data ingestion handled on a daily basis?