Explorer
Posts: 35
Registered: 11-24-2015

flume scope


Hi, is it possible for Flume to process multiple Kafka topics with 1 agent, 1 source, 1 channel and 1 sink?

 

We have about 20 Kafka topics to copy into HDFS using Flume.

 

Wondering if I should create a Flume agent for each topic, or use one agent with 20 sources/20 channels/20 sinks, or one agent with 1 source/channel/sink, if that is possible.

 

Appreciate the feedback.

Cloudera Employee
Posts: 85
Registered: 03-01-2016

Re: flume scope

Since CDH 5.8, the Flume KafkaSource is able to consume from multiple topics:

 

https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html?_ga=1.167447485.2081826704.1436...

 

Use the kafka.topics property; it takes a comma-separated list of topics the Kafka consumer will read messages from.
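
For illustration, a minimal single-source agent along those lines might look like the sketch below (the agent name, broker addresses, and topic names are made-up placeholders, not values from this thread):

# Hypothetical agent 'agent1' with one Kafka source reading several topics
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.src1.kafka.bootstrap.servers = broker1:9092,broker2:9092
agent1.sources.src1.kafka.topics = table1,table2,table3
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = ch1
agent1.sinks.sink1.hdfs.path = /tmp/flume-out/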

 

Explorer
Posts: 35
Registered: 11-24-2015

Re: flume scope


I have a few more questions on the scope of Flume:

 

1. Is it possible to have multiple Flume agents running on multiple hosts? So if we have 12 topics, can we split them to be processed across three hosts, each with a Flume agent running?

2. With multiple topics on the same source, would that mean one channel/sink as well, or multiple channels/sinks?
Also, we create directories in HDFS based on the name of the topic. For the parameters below, is there a runtime variable we can use to specify the topic name? The topic name used below is 'table1'.
flume2.sinks.sink1.hdfs.path = /tmp/table1/
flume2.sinks.sink1.hdfs.filePrefix = table1-

 

Appreciate the insights.

Cloudera Employee
Posts: 255
Registered: 01-09-2014

Re: flume scope

You can have multiple Flume agents running on multiple hosts. If they share the same group.id in their Flume configuration, the messages will be distributed across all the agents (not duplicated).
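
As a rough sketch (the group name flume-hdfs-copy is made up): each host runs its own agent, but every agent's Kafka source carries the same consumer group, so Kafka balances the topic partitions across the agents.

# The same two lines appear in each agent's configuration (one agent per host)
agent1.sources.src1.kafka.topics = table1,table2,table3,table4
agent1.sources.src1.kafka.consumer.group.id = flume-hdfs-copy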

If you don't need to do any processing on the events (via an interceptor), you could just use a Kafka channel and an HDFS sink, which delivers events directly from the channel. In that case you can only use one topic per channel, but you could then have an associated sink delivering just that topic.
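
A sketch of that sourceless layout for one topic (names are placeholders; repeat a channel/sink pair per topic):

# The Kafka channel consumes the topic itself; no Flume source is defined.
agent1.channels.ch_table1.type = org.apache.flume.channel.kafka.KafkaChannel
agent1.channels.ch_table1.kafka.bootstrap.servers = broker1:9092
agent1.channels.ch_table1.kafka.topic = table1
# Set to false when the topic is written by plain Kafka producers rather than by Flume.
agent1.channels.ch_table1.parseAsFlumeEvent = false

agent1.sinks.sink_table1.type = hdfs
agent1.sinks.sink_table1.channel = ch_table1
agent1.sinks.sink_table1.hdfs.path = /tmp/table1/
agent1.sinks.sink_table1.hdfs.filePrefix = table1-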

If you do want to use a Flume Kafka source, it adds a 'topic' header specifying the topic each message was consumed from, and you can use that in hdfs.path or hdfs.filePrefix as %{topic}.
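
For example, against the multi-topic source sketched earlier, the sink could bucket its output by topic like this (illustrative names again):

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = ch1
# %{topic} is filled in from the event header the Kafka source adds,
# so each topic gets its own directory and file prefix.
agent1.sinks.sink1.hdfs.path = /tmp/%{topic}/
agent1.sinks.sink1.hdfs.filePrefix = %{topic}-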

-pd