Tips to increase performance in Flafka

Hello, I have Flafka running in my cluster, but the throughput is too low (about 1,500 logs/s) and I want to reach about 10,000 logs/s. I have 1 agent, 1 source, 1 sink and 1 broker. The flume-ng configuration is:

 

# Sources, channels, and sinks are defined per
# agent name, in this case flume1.
flume1.sources  = kafka-source-1
flume1.channels = hdfs-channel-1  
flume1.sinks    = hdfs-sink-1 
 
# For each source, channel, and sink, set
# standard properties.
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = 192.168.70.24:2181
flume1.sources.kafka-source-1.topic = kafkatopic2
flume1.sources.kafka-source-1.batchSize = 2000
flume1.sources.kafka-source-1.channels = hdfs-channel-1

flume1.sinks.hdfs-sink-1.channel = hdfs-channel-1
#flume1.sinks.logSink.channel = logChannel
 
flume1.channels.hdfs-channel-1.type   = memory
#flume1.channels.logChannel.type   = memory

flume1.channels.hdfs-channel-1.capacity = 100000
flume1.channels.hdfs-channel-1.transactionCapacity = 20000


#flume1.channels.logChannel.capacity = 1000000
#flume1.channels.logChannel.transactionCapacity = 100000

#Interceptors setup
flume1.sources.kafka-source-1.interceptors = i1
flume1.sources.kafka-source-1.interceptors.i1.type = com.mycompany.app.App$Builder


# sinks configuration
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.hdfs.writeFormat = Text
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream
flume1.sinks.hdfs-sink-1.hdfs.filePrefix = %{product}
flume1.sinks.hdfs-sink-1.hdfs.useLocalTimeStamp = true
flume1.sinks.hdfs-sink-1.hdfs.path = /user/root/logs/%{product}/%{client}/
flume1.sinks.hdfs-sink-1.hdfs.rollCount = 4000
flume1.sinks.hdfs-sink-1.hdfs.rollSize = 0
flume1.sinks.hdfs-sink-1.hdfs.callTimeout = 1500000

#flume1.sinks.logSink.type = logger
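One sizing constraint worth checking in a config like this: Flume expects a channel's transactionCapacity to be at least the batch size of every source and sink attached to it, and a memory channel's capacity to be at least its transactionCapacity. The sketch below is a hypothetical Python helper (`parse_properties` and `check_sizes` are illustrative names, not Flume APIs) that checks an excerpt of the posted values. It assumes simple `key = value` lines and falls back to 100 for the HDFS sink's batch size, which is Flume's documented default for `hdfs.batchSize`.

```python
# Hypothetical helper, not part of Flume: checks batch/transaction sizing
# in a Flume properties file. Assumes simple "key = value" lines.

# Excerpt of the posted configuration (only the keys the check needs).
CONFIG = """
flume1.sources.kafka-source-1.batchSize = 2000
flume1.channels.hdfs-channel-1.type   = memory
flume1.channels.hdfs-channel-1.capacity = 100000
flume1.channels.hdfs-channel-1.transactionCapacity = 20000
flume1.sinks.hdfs-sink-1.type = hdfs
"""

# Flume's documented default for hdfs.batchSize when it is not set.
HDFS_SINK_DEFAULT_BATCH = 100


def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_sizes(props, agent, source, channel, sink):
    """Return the relevant sizes plus a list of violated invariants."""
    capacity = int(props[f"{agent}.channels.{channel}.capacity"])
    txn = int(props[f"{agent}.channels.{channel}.transactionCapacity"])
    source_batch = int(props[f"{agent}.sources.{source}.batchSize"])
    sink_batch = int(props.get(f"{agent}.sinks.{sink}.hdfs.batchSize",
                               HDFS_SINK_DEFAULT_BATCH))
    problems = []
    if txn > capacity:
        problems.append("transactionCapacity exceeds channel capacity")
    if source_batch > txn:
        problems.append("source batchSize exceeds transactionCapacity")
    if sink_batch > txn:
        problems.append("sink hdfs.batchSize exceeds transactionCapacity")
    return {"capacity": capacity, "transactionCapacity": txn,
            "source_batch": source_batch, "sink_batch": sink_batch,
            "problems": problems}


if __name__ == "__main__":
    result = check_sizes(parse_properties(CONFIG), "flume1",
                         "kafka-source-1", "hdfs-channel-1", "hdfs-sink-1")
    print(result)
```

With the posted values all three checks pass. The sink batch size comes from the fallback because the posted config does not set `hdfs.batchSize`, so the HDFS sink takes events from the channel 100 per transaction while the Kafka source puts them in 2,000 at a time.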

The Kafka configuration is:

number of partitions: 1
replication factor: 1
number of replicas in ISR: 1
number of partitions: 50
topic replication factor: 1
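The list gives two different partition counts (1 and 50), so it is not clear from the post which one applies to `kafkatopic2`. The count matters because, within a single consumer group, each partition is read by at most one consumer, so Flume Kafka source instances sharing a group beyond the partition count would sit idle. A minimal Python sketch of that rule, using a simplified round-robin assignment as an illustrative stand-in for Kafka's own assignors; `assign_partitions` and `active_consumers` are hypothetical names:

```python
# Simplified model of one Kafka consumer group: every partition is owned by
# at most one consumer. Round-robin is used for illustration; Kafka's real
# assignors (range, round-robin, sticky) differ in detail, not in that rule.


def assign_partitions(num_partitions, num_consumers):
    """Return, per consumer, the partition ids it would own."""
    owned = [[] for _ in range(num_consumers)]
    for partition in range(num_partitions):
        owned[partition % num_consumers].append(partition)
    return owned


def active_consumers(num_partitions, num_consumers):
    """Count consumers that receive at least one partition."""
    return sum(1 for parts in assign_partitions(num_partitions, num_consumers)
               if parts)


if __name__ == "__main__":
    # Four hypothetical Kafka source instances sharing one group id.
    for partitions in (1, 50):
        print(partitions, "partition(s) ->",
              active_consumers(partitions, 4), "busy consumer(s)")
```

With one partition only one of the four consumers gets any work; with 50 partitions all four do.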

Thanks,
