Multiple Flume Agents to fetch data from MQ messaging broker

How would I go about configuring multiple Flume agents to fetch data from an MQ messaging broker so that they don't deliver duplicate messages to their sink?

1 ACCEPTED SOLUTION

Super Collaborator

Well, based on what we know so far, I'd say 2 Flume agents with the file or JDBC channel should work for you.

There will be no overlap in data because that is controlled by MQ itself, so it is not a matter for Flume.

On the Flume processing side, we ensure that no data loss happens by using the file or JDBC channel.
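
For illustration, here is a minimal sketch of what one of the two agents might look like, using Flume's JMS source and file channel. The agent, source, channel, and sink names, the JNDI settings (a file-based JNDI bindings directory is assumed for MQ), the queue name, and all paths are placeholders, not values from this thread, and the logger sink only stands in for whatever sink you actually use.

```
# Hypothetical agent "agent1". A second agent would use the same source
# settings (same queue) but its own checkpoint and data directories.
agent1.sources = mqSource
agent1.channels = fileCh
agent1.sinks = logSink

# JMS source reading from a queue: the broker hands each message to one consumer.
# All JNDI values below are assumptions; adjust them to your MQ setup.
agent1.sources.mqSource.type = jms
agent1.sources.mqSource.channels = fileCh
agent1.sources.mqSource.initialContextFactory = com.sun.jndi.fscontext.RefFSContextFactory
agent1.sources.mqSource.providerURL = file:///opt/flume/jndi
agent1.sources.mqSource.connectionFactory = ExampleConnectionFactory
agent1.sources.mqSource.destinationName = EXAMPLE.QUEUE
agent1.sources.mqSource.destinationType = QUEUE

# File channel: events are persisted to disk, so they survive an agent restart.
agent1.channels.fileCh.type = file
agent1.channels.fileCh.checkpointDir = /var/flume/agent1/checkpoint
agent1.channels.fileCh.dataDirs = /var/flume/agent1/data

# Placeholder sink; replace with your real sink.
agent1.sinks.logSink.type = logger
agent1.sinks.logSink.channel = fileCh
```

The MQ client and JNDI jars would also need to be on the agent's classpath. With both agents pointed at the same queue, they compete for messages rather than each receiving a copy.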

5 REPLIES

Super Collaborator

Can you explain the issue with MQ a bit? I'm not an expert in WebSphere, but it seems MQ is supposed to deliver each event only once, so there should be no duplicates by design. Is that correct?

Hi @Michael M, good question. I think my understanding of the MQ queue was incorrect: I thought that data stays on the queue after it is read, when in fact it is gone. The Flume agent is set to use the memory channel and not the file channel, so if the agent crashes, whatever has been ingested from the source is lost. This may be the wrong approach, because once the agent reads a message off the queue, that message is no longer available for consumption. So if the agent crashes while using the memory channel, that data is lost, right? And multiple Flume agents reading from the same queue won't step on each other because of this, right?

Master Guru

Basically, the JMS standard never delivers an acknowledged message twice. So yes, each message goes to exactly one Flume agent. There is no replication in it, and you need to make sure that the agent doesn't lose data during outages (RAIDed disks, file channel, ...).

MQ systems offer different ways to provide reliability, for example publish/subscribe. But I don't think Flume supports that.

http://www.ibm.com/support/knowledgecenter/#!/SSFKSJ_7.0.1/com.ibm.mq.amqnar.doc/ps20010_.htm

There is also the possibility of duplicating each message to two topics. However, in that case you need to deduplicate somewhere in your ingest logic. (Flume would not work for this; you would need to do it downstream when processing the messages.)

New Contributor

Ryan, can I ask how you set up Flume to fetch messages from MQ?