Flume HA
Labels: Apache Flume
Created on 03-29-2016 02:54 AM - edited 09-16-2022 03:11 AM
Hello,
Is there a way to deploy Flume in HA?
Thank you,
Created 03-31-2016 09:03 PM
-PD
Created 04-02-2016 03:27 PM
Hello,
Thank you for your reply.
In my case the Flume source is an HTTP Source, and I want to know whether there is a way to keep receiving data if the machine running that source goes down (HA).
However, the only solution I can imagine is two sources with a load balancer machine in front of the two machines, and I was looking for a solution within the Hadoop cluster itself (as is done for YARN and HBase).
Thank you,
Alina
Created 04-11-2016 11:25 PM
So, am I understanding this correctly?
Data generating server -> HTTP Post -> Flume HTTP Source -> Flume sink -> etc.
You want two Flume HTTP Source machines that can both be written to and receive that data, so that the data still arrives if one of them goes down? And you also don't want to manage something like a load balancer/proxy between the data-generating server and the Flume HTTP Source box?
If you can handle de-duplication on your back end, I think you could do this by sending the same data to two different Flume HTTP Source servers at the same time, possibly tagging the data in your sink to help you de-duplicate later.
Data generating server -> HTTP Post -> Flume HTTP Source -> Flume sink with tag -> to de-duplication
    |
    +-------------------> HTTP Post -> Flume HTTP Source -> Flume sink with tag -> to de-duplication
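A minimal Python sketch of the fan-out idea above, under stated assumptions. Everything named here is hypothetical: the endpoint URLs, the `event_id` header name, and the helpers `make_event`, `fan_out` and `deduplicate`. It assumes each agent's HTTP Source uses the default JSON handler, which accepts a JSON array of events, each with `headers` and `body` fields. One deliberate deviation from the description: the tag is generated once on the producer rather than in each sink, because two independent sinks would otherwise stamp different tags on the two copies of the same event. The de-duplication step assumes the back end can read that header back from whatever the sinks write.

```python
import json
import urllib.request
import uuid
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence

# Hypothetical endpoints: one HTTP Source agent per machine.
FLUME_ENDPOINTS = ["http://flume-a.example:5140", "http://flume-b.example:5140"]

# Assumed header name used to tag each event so duplicates can be dropped later.
ID_HEADER = "event_id"


def make_event(body: str, event_id: Optional[str] = None) -> Dict:
    """Build one event in the JSON layout of the default JSON handler,
    tagged with an id generated once per event on the producer side."""
    return {"headers": {ID_HEADER: event_id or str(uuid.uuid4())}, "body": body}


def http_post(url: str, payload: bytes, timeout: float = 5.0) -> int:
    """POST a JSON batch to one agent and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def fan_out(events: List[Dict], endpoints: Sequence[str],
            send: Callable[[str, bytes], int] = http_post) -> List[str]:
    """Send the same batch to every endpoint; succeed if at least one accepts it."""
    payload = json.dumps(events).encode("utf-8")
    accepted = []
    for url in endpoints:
        try:
            if 200 <= send(url, payload) < 300:
                accepted.append(url)
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all derive from OSError,
            # so an agent that is down is skipped instead of failing the whole batch.
            continue
    if not accepted:
        raise RuntimeError("no Flume HTTP Source accepted the batch")
    return accepted


def deduplicate(events: Iterable[Dict]) -> Iterator[Dict]:
    """Back-end side: yield each event once, keyed on the id header.
    Every seen id is kept in memory, so a real job would bound or persist this set."""
    seen = set()
    for event in events:
        event_id = event["headers"][ID_HEADER]
        if event_id not in seen:
            seen.add(event_id)
            yield event
```

The producer treats a batch as delivered when at least one agent returns a 2xx status, which is what lets it ride out one agent being down. The cost is that, in normal operation, every event reaches the back end twice, so the in-memory `seen` set in this sketch would need to be bounded (for example by a time window) or replaced by a keyed store in a real pipeline.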
Created 04-12-2016 08:12 AM
I'm not sure that I can change all the sources so that they post to all my Flume agents, but this is an interesting solution.
Thank you!