
Flume configuration - multiple topics in a single configuration file - most compact way possible


Hi, I have multiple Kafka topics to process through Flume. I want to use only one Flume agent, and therefore a single configuration file.

What is the most compact way to set this up?

Will the configuration below work? In particular, can a single source/channel/sink process multiple Kafka topics and multiple serializer schemaURLs? The Flume data should be deposited in an HDFS directory named after the topic.

loan_application_agent.sources = kafka1
loan_application_agent.channels = channel1
loan_application_agent.sinks = sink1

loan_application_agent.sources.kafka1.type = org.apache.flume.source.kafka.KafkaSource
loan_application_agent.sources.kafka1.channels = channel1
loan_application_agent.sources.kafka1.kafka.bootstrap.servers = host1:9092,host2:9092,host3:9092
loan_application_agent.sources.kafka1.kafka.key.deserializer = org.apache.kafka.common.serialization.StringDeserializer
loan_application_agent.sources.kafka1.kafka.value.deserializer = org.apache.kafka.common.serialization.ByteArrayDeserializer
loan_application_agent.sources.kafka1.kafka.topics = topic1, topic2
loan_application_agent.sources.kafka1.kafka.consumer.group.id = flume-consumer
loan_application_agent.sources.kafka1.batchSize = 500

loan_application_agent.channels.channel1.type = memory
loan_application_agent.channels.channel1.capacity = 10000
loan_application_agent.channels.channel1.transactionCapacity = 1000

loan_application_agent.sinks.sink1.channel = channel1
loan_application_agent.sinks.sink1.type = hdfs
loan_application_agent.sinks.sink1.hdfs.writeFormat = Text
loan_application_agent.sinks.sink1.hdfs.fileType = DataStream
loan_application_agent.sinks.sink1.hdfs.path = /tmp/%{topic}/
loan_application_agent.sinks.sink1.hdfs.filePrefix = %{topic}
loan_application_agent.sinks.sink1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
loan_application_agent.sinks.sink1.serializer.schemaURL = hdfs://host7:8020/tmp/avroschemas-new/table1.json, hdfs://host7:8020/tmp/avroschemas-new/table2.json
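In case a single sink cannot accept a comma-separated list of schema URLs, I also sketched a fallback on the same agent: one Kafka source fanned out with a multiplexing channel selector on the "topic" header (which the Kafka source sets on each event) into per-topic channels and sinks, each with its own schemaURL. The channel2/sink2 names and the selector mappings below are just placeholders I have not tested:

# Hypothetical alternative: same source, per-topic channels/sinks
loan_application_agent.sources = kafka1
loan_application_agent.channels = channel1 channel2
loan_application_agent.sinks = sink1 sink2

# Fan out events by the "topic" header set by the Kafka source
loan_application_agent.sources.kafka1.channels = channel1 channel2
loan_application_agent.sources.kafka1.selector.type = multiplexing
loan_application_agent.sources.kafka1.selector.header = topic
loan_application_agent.sources.kafka1.selector.mapping.topic1 = channel1
loan_application_agent.sources.kafka1.selector.mapping.topic2 = channel2

# Each sink then carries exactly one schema URL
# (remaining kafka1, channel, and sink properties as in the configuration above)
loan_application_agent.sinks.sink1.channel = channel1
loan_application_agent.sinks.sink1.serializer.schemaURL = hdfs://host7:8020/tmp/avroschemas-new/table1.json
loan_application_agent.sinks.sink2.channel = channel2
loan_application_agent.sinks.sink2.serializer.schemaURL = hdfs://host7:8020/tmp/avroschemas-new/table2.json

Is the single-sink version at the top workable, or is a per-topic layout like this the recommended pattern?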

Appreciate the insights.