Tips to increase performance in flafka

Hello, I have Flafka running in my cluster, but the throughput is too low (about 1,500 logs/s) and I want to reach about 10,000 logs/s. I have 1 agent, 1 source, 1 sink, and 1 broker. The flume-ng configuration is:

 

# Sources, channels, and sinks are defined per
# agent name, in this case flume1.
flume1.sources  = kafka-source-1
flume1.channels = hdfs-channel-1  
flume1.sinks    = hdfs-sink-1 
 
# For each source, channel, and sink, set
# standard properties.
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = 192.168.70.24:2181
flume1.sources.kafka-source-1.topic = kafkatopic2
flume1.sources.kafka-source-1.batchSize = 2000
flume1.sources.kafka-source-1.channels = hdfs-channel-1

flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
#flume1.sinks.logSink.channel = logChannel
 
flume1.channels.hdfs-channel-1.type   = memory
#flume1.channels.logChannel.type   = memory

flume1.channels.hdfs-channel-1.capacity = 100000
flume1.channels.hdfs-channel-1.transactionCapacity = 20000


#flume1.channels.logChannel.capacity = 1000000
#flume1.channels.logChannel.transactionCapacity = 100000

#Interceptors setup
flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = com.mycompany.app.App$Builder


# sinks configuration
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = %{product}
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /user/root/logs/%{product}/%{client}/
flume1.sinks.hdfs-sink-1.hdfs.rollCount = 4000
flume1.sinks.hdfs-sink-1.hdfs.rollSize = 0
flume1.sinks.hdfs-sink-1.hdfs.callTimeout = 1500000

#flume1.sinks.logSink.type = logger
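
To put the sink settings above in perspective, here is a rough back-of-the-envelope sketch of what they imply at the current and target rates. It makes several assumptions that are not stated in the post: all events go to a single output path, time-based rolling is ignored, and the sink flushes at Flume's default hdfs.batchSize of 100 because the configuration does not set it. The function name sink_load is made up for illustration.

```python
def sink_load(events_per_sec, roll_count, sink_batch_size):
    """Return (files rolled per second, sink flushes per second) for one HDFS path."""
    files_per_sec = events_per_sec / roll_count
    flushes_per_sec = events_per_sec / sink_batch_size
    return files_per_sec, flushes_per_sec


ROLL_COUNT = 4000       # hdfs.rollCount from the configuration above
SINK_BATCH_SIZE = 100   # assumption: Flume default hdfs.batchSize (not set above)

# Compare the observed rate with the target rate from the question.
for rate in (1500, 10000):
    files, flushes = sink_load(rate, ROLL_COUNT, SINK_BATCH_SIZE)
    print(f"{rate} events/s: {files:.3f} files/s, {flushes:.0f} flushes/s")
```

Under these assumptions, reaching the target rate would mean a single sink rolling several new HDFS files every second, so the roll and batch settings are probably worth reviewing together with the single-partition topic.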

Kafka configuration is:

number of partitions: 1
replication factor: 1
number of replicas in ISR: 1
number of partitions: 50
topic replication factor: 1

Thanks,
