Support Questions

RyanCicak · ‎04-13-2016

How would I go about configuring multiple Flume agents to fetch data from an MQ messaging broker so that they don't deliver duplicate messages to their sink?

bluesmix · ‎04-15-2016

Well, based on what we know so far, I'd say two Flume agents with the file or JDBC channel should work for you.

There will be no overlap in data because delivery is controlled by MQ itself, so it is not a matter for Flume.

On the Flume side, we avoid data loss by using the file or JDBC channel, both of which persist events to durable storage.

bluesmix · ‎04-14-2016

Can you explain the issue with MQ a bit? I'm not an expert in WebSphere, but it seems MQ is supposed to deliver each event only once, so there should be no duplicates by design. Is that correct?

RyanCicak · ‎04-14-2016

Hi @Michael M - good question. I think my understanding of the MQ queue was incorrect: I thought that data still exists on the queue after it is read, when in fact it is gone. The Flume agent is set to use the memory channel rather than the file channel, so if the agent crashes, whatever has been ingested from the source is lost. This may be the wrong approach, because once the agent reads a message off the queue, that message is no longer available for consumption. So if the agent crashes (and is using the memory channel), that data is lost, right? And multiple Flume agents reading from the same queue won't step on each other because of this, right?

bleonhardi · ‎04-15-2016

Basically, the JMS standard never delivers an acknowledged message twice. So yes, each message goes to exactly one Flume agent. There is no replication, so you need to protect that agent against outages (RAID disks, file channel, ...).

MQ systems offer other ways to provide reliability, for example publish/subscribe, but I don't think Flume supports that.

http://www.ibm.com/support/knowledgecenter/#!/SSFKSJ_7.0.1/com.ibm.mq.amqnar.doc/ps20010_.htm

There is also the possibility of duplicating each message to two topics. However, in that case you need to deduplicate somewhere in your ingest logic. (Flume would not work for this; you would need to do it downstream when processing the messages.)
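To illustrate the downstream deduplication mentioned above, here is a minimal Python sketch. It assumes every message carries a unique identifier that is identical across both topics (the `message_id` field name and the `deduplicate` helper are made up for this example, not part of Flume or MQ), and it keeps all seen IDs in memory.

```python
from typing import Dict, Iterable, Iterator, Set


def deduplicate(
    messages: Iterable[Dict[str, str]],
    id_field: str = "message_id",  # hypothetical field name, adjust to your data
) -> Iterator[Dict[str, str]]:
    """Yield each message once, keyed on id_field; later copies are dropped."""
    seen: Set[str] = set()
    for message in messages:
        key = message[id_field]
        if key in seen:
            # Same ID already processed (e.g. the copy from the second topic).
            continue
        seen.add(key)
        yield message


if __name__ == "__main__":
    # The same two messages arriving once from each of two topics.
    topic_a = [
        {"message_id": "m1", "body": "first"},
        {"message_id": "m2", "body": "second"},
    ]
    topic_b = [
        {"message_id": "m1", "body": "first"},
        {"message_id": "m2", "body": "second"},
    ]
    unique = list(deduplicate(topic_a + topic_b))
    print([m["message_id"] for m in unique])  # prints ['m1', 'm2']
```

In a real pipeline the set of seen IDs would grow without bound, so you would probably limit it to a time window or keep it in an external store. That choice depends on your processing framework and is not covered here.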

avendano_mauro · ‎04-19-2017

Ryan, can I ask how you set up Flume to fetch messages from MQ?

Cloudera Community

Multiple Flume Agents to fetch data from MQ messaging broker