Support Questions

Jagatheeshr · ‎10-14-2015

I would like to know if there is a way to run Flume in HA mode.

orenault · ‎10-28-2015

Here are some of the key points for running Flume in "HA":

1. Set up File Channels instead of Memory Channels on every Flume agent in use (putting them on a RAID array is very paranoid, but possible).

2. Create a nanny process/script that watches for Flume agent failures and restarts the agent immediately.

3. Put the Flume collector/aggregation (2nd tier) agents behind a network load balancer and use a VIP. This also has the benefit of balancing load for high ingest.

4. Optionally, in addition to the sink that forwards events to the next Flume node or directly to HDFS, have a sink that dumps to cycling files on the local drives (on a different drive from the one the File Channel uses). At least then you have the time it takes to fill a drive to correct any major issues and recover lost ingest streams.

5. Use the built-in JMX counters in Flume to set up alerts in your favorite Operations Center application.
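As a rough illustration of points 1 and 4, here is a minimal sketch of an agent configuration, not a tested production setup. The agent name `a1`, the component names, all paths, the collector hostname and the ports are placeholders assumed for the example. It also assumes a replicating channel selector with one channel per sink, because two sinks attached to the same channel would split the events between them rather than each receiving a copy.

```properties
# Hypothetical agent "a1": one source fanned out to two File Channels.
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Avro source; the replicating selector copies every event to both channels.
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# File Channels on one drive (paths are placeholders).
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data1/flume/c1/checkpoint
a1.channels.c1.dataDirs = /data1/flume/c1/data
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /data1/flume/c2/checkpoint
a1.channels.c2.dataDirs = /data1/flume/c2/data

# Sink 1: forward to the next Flume tier (hostname and port are placeholders).
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

# Sink 2: local rolling files on a different drive from the File Channels.
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /data2/flume/archive
a1.sinks.k2.sink.rollInterval = 3600
a1.sinks.k2.channel = c2
```

Note that the `file_roll` sink only starts a new file at each interval; as far as I know it does not delete old files, so "cycling" in practice needs a separate cleanup job on that directory.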


deepesh1 · ‎10-14-2015

I don't think Flume has a built-in HA mode. If you are worried about losing events because a Flume agent goes down, you can use the File Channel, which persists events to disk and uses checkpointing. Events already committed to the channel are then not lost while the agent is down, and after a restart the agent resumes sending events to the sink from where it left off.
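For reference, a File Channel is declared roughly like this. This is only a sketch: the agent name `a1`, the channel name `c1` and the directories are placeholders, and the dual-checkpoint lines are optional.

```properties
# Hypothetical agent "a1" with a single durable channel "c1".
a1.channels = c1
a1.channels.c1.type = file
# Where the channel keeps its checkpoint and its event data (placeholder paths).
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data
# Optional: keep a backup checkpoint to speed up recovery after a crash.
a1.channels.c1.useDualCheckpoints = true
a1.channels.c1.backupCheckpointDir = /var/lib/flume/checkpoint-backup
```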

If you are instead worried about the destination that your agent's sink writes to going down, you can use the Failover Sink Processor.
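A sink group using the Failover Sink Processor might look like the sketch below. The names `a1`, `g1`, `k1`, `k2` and `c1`, the hostnames and the ports are assumptions made for the example, and the channel `c1` is assumed to be defined elsewhere in the same file.

```properties
# Hypothetical agent "a1": two Avro sinks draining the same channel "c1".
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# Failover processor: the sink with the highest priority value is used first;
# a failed sink is penalized for up to maxpenalty milliseconds.
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Primary destination (placeholder host).
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector1.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

# Standby destination (placeholder host).
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = collector2.example.com
a1.sinks.k2.port = 4545
a1.sinks.k2.channel = c1
```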

Jagatheeshr · ‎10-14-2015

Thanks @Deepesh. The File Channel would solve the data loss problem, and the Failover Sink Processor addresses sink failure rather than failure of the Flume agent itself.

What if the Flume agent on a node gets killed, so that no messages are passed to the sink? Wouldn't it be a good idea to have another Flume agent, registered in ZooKeeper, periodically check whether the first agent is alive and, if it ever dies, start piping the data to the sink?

deepesh1 · ‎10-15-2015

It's hard to give a generic answer on how to achieve high availability without knowing the topology, the data and form of ingestion, and where and how it is written at the destination. In many cases, if the data is still available at the source after the agent gets killed, the checkpointing on the File Channel lets the restarted agent recover from the point where it failed. Sometimes a topology runs multiple Flume agents for availability; of course this leads to redundant (duplicate) data, but that's fine in some cases.

samkt99 · ‎06-20-2016

Is there complete step-by-step documentation or installation instructions for setting up Flume for high availability using Ambari that I can read? Please let me know the link. Thanks once again.
