What are the different sources used to import log files in real time through Apache Flume? How is real-time data ingestion handled on a daily basis?
Labels: Apache Flume, Apache Hadoop
Created 12-12-2017 10:50 AM
Hi,
I want to understand how Flume is used to stream log files in real time. I have practiced importing files with the 'exec' source, but I would like to know which sources are used for Flume streaming in real-time projects.
Please help me clear up this doubt.
Thanks,
Created 12-12-2017 07:26 PM
You should read the warning in the Exec Source docs against using tail -f:
https://flume.apache.org/FlumeUserGuide.html#exec-source
It even lists the other sources to consider using instead, those being "Spooling Directory Source, Taildir Source or direct integration with Flume via the SDK."
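As a rough illustration of one of those alternatives, here is a minimal sketch of an agent that uses the Taildir Source in place of an exec source running tail -f. The agent name `a1`, the component names, and every path are placeholders I made up for the example, and the logger sink is only there so the events are visible while testing.

```properties
# Hypothetical single-agent config: agent "a1", source "r1", channel "c1", sink "k1".
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Taildir Source: tails every file matching the file group and records its
# read offsets in positionFile, so it can resume after a restart.
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.filegroups = f1
# The directory part is literal; only the file name is a regular expression.
a1.sources.r1.filegroups.f1 = /var/log/myapp/.*log.*
a1.sources.r1.channels = c1

# Alternative (assumption: finished files are moved into a drop directory):
# a1.sources.r1.type = spooldir
# a1.sources.r1.spoolDir = /var/log/myapp/incoming

# In-memory channel: fast, but events are lost if the agent process dies.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Logger sink just prints events; replace it with a real sink in practice.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

Assuming the file is saved as `taildir.conf`, it could be started with `flume-ng agent --conf conf --conf-file taildir.conf --name a1`.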
Personally, I like tools such as Filebeat or Fluentd for real-time log collection, sending the logs to either Elasticsearch or Solr, since those provide better tooling for log inspection.
Created 12-14-2017 12:46 PM
@Jordan Moore Thanks for the suggestion.
Can you please tell me how logs from different servers are collected in real-time projects? If you know of any link, please share it.
Created 12-14-2017 04:32 PM
I have not used Flume in a distributed fashion, but whichever tool you choose, the pattern is the same: an agent runs on each server, tails the logs on that server, then ships them to the configured sink destinations. Running one agent per server is how you collect from different servers. Flume is near real-time rather than strictly real-time, since events are delivered in batches of a configured size.
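To make the one-agent-per-server idea concrete, here is an untested sketch of a common two-tier layout: an `edge` agent on each application server forwards events over Avro to a single `collector` agent, which writes them to HDFS in one directory per day. The agent names, host names, port, and paths are all assumptions for illustration, not a setup taken from a real project.

```properties
# ---- edge.conf: runs on every application server (hypothetical names) ----
edge.sources = r1
edge.channels = c1
edge.sinks = k1

edge.sources.r1.type = TAILDIR
edge.sources.r1.positionFile = /var/lib/flume/taildir_position.json
edge.sources.r1.filegroups = f1
edge.sources.r1.filegroups.f1 = /var/log/myapp/.*log.*
edge.sources.r1.channels = c1

edge.channels.c1.type = memory
edge.channels.c1.capacity = 10000
edge.channels.c1.transactionCapacity = 1000

# Avro sink: ships events to the collector in batches (the "near real-time" part).
edge.sinks.k1.type = avro
edge.sinks.k1.hostname = collector.example.com
edge.sinks.k1.port = 4545
edge.sinks.k1.batch-size = 100
edge.sinks.k1.channel = c1

# ---- collector.conf: runs on one central host (hypothetical names) ----
collector.sources = r1
collector.channels = c1
collector.sinks = k1

# Avro source: receives events from all edge agents.
collector.sources.r1.type = avro
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 4545
collector.sources.r1.channels = c1

collector.channels.c1.type = memory
collector.channels.c1.capacity = 100000
collector.channels.c1.transactionCapacity = 1000

# HDFS sink: one directory per day, using the collector's local clock.
collector.sinks.k1.type = hdfs
collector.sinks.k1.hdfs.path = hdfs://namenode.example.com/flume/logs/%Y-%m-%d
collector.sinks.k1.hdfs.fileType = DataStream
collector.sinks.k1.hdfs.useLocalTimeStamp = true
collector.sinks.k1.hdfs.batchSize = 1000
collector.sinks.k1.channel = c1
```

The two batch settings (`batch-size` on the Avro sink, `hdfs.batchSize` on the HDFS sink) trade latency against throughput: smaller batches get events to their destination sooner, larger ones reduce per-event overhead. A file channel instead of the memory channel would be the safer choice if losing buffered events on a crash is not acceptable.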
It's not clear what your question is. Can you please explain how you've configured your Flume agents and what issues you are experiencing?
The Flume documentation is fairly straightforward.