I am using NiFi version 1.8.0 on CentOS 7 and I want to transfer 2,000,000 FlowFiles to Kafka with NiFi.
I am using 4 SplitText processors with Line Split Count set to 10000, 1000, 100, and 1 respectively.
How can I increase the speed of transferring FlowFiles?
It is hard to tell from your screenshot whether your SplitText processors are really the source of your slowness. The red highlight on several connections indicates that backpressure is being applied by those connections. When backpressure is applied on a connection, the processor upstream of it will not be scheduled to run. Only once the queue on that connection drops back below the configured threshold will backpressure be removed.
So you need to find the processor furthest down the dataflow path that has a "red" inbound connection and no "red" outbound connections. That is the processor you want to focus on.
If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
Thanks for your answer.
In the previous test I used the PutKafka processor, but in the new test I am using PublishKafka and I changed its configuration. These changes made my NiFi faster: with this configuration, NiFi is transferring about 3,800 events per second.
My target is to transfer about 1.5 million events per second, because the data is also generated at about 1.5 million events per second.
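To put the gap in perspective, the required speedup can be worked out directly from the two rates quoted above (a rough back-of-the-envelope calculation, ignoring batching effects):

```python
# Rates taken from this thread; both are events per second.
current_rate = 3_800        # observed PublishKafka throughput
target_rate = 1_500_000    # data generation rate

# How many times faster the flow needs to be overall.
speedup_needed = target_rate / current_rate
print(round(speedup_needed))  # → 395
```

A roughly 400x gap is unlikely to be closed by tuning one processor alone; it points at removing the splitting work and parallelizing the publish, as discussed below.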
CPU: 4 cores
Your dataflow image here shows that PublishKafka is what is causing the backpressure.
1. What version of Kafka are you publishing to? There are multiple versions of the PublishKafka processor available. For best performance you want to use the PublishKafka processor whose client version matches your target Kafka. The plain "PublishKafka" processor with no version number uses the Kafka 0.8 client, so it is pretty old. There are now versions for Kafka 0.10, 0.11, 1.0, and 2.0.
2. Are you running a NiFi cluster or just a single NiFi instance? You want to match the number of concurrent tasks (across the cluster, if clustered) to the number of partitions on your Kafka topic. Each concurrent task is a unique thread/producer that will be associated with one partition of the topic. Having too many or too few concurrent tasks will trigger rebalances, which will also affect performance.
3. You will most likely want to look into using the "PublishKafkaRecord" processor instead. This would remove the need to do most of your splitting. Just split the received file into enough FlowFiles to maximize use of each partition on your topic (enough files so that each concurrent task gets at least one file).
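The idea in points 2 and 3 can be sketched in plain Python, outside NiFi: instead of splitting 2,000,000 events into one-line FlowFiles, divide them into one batch per partition so that each concurrent task (one producer thread per partition) gets a file to publish. The partition count and event data here are illustrative, not from the thread:

```python
# Assumed topic layout: 8 partitions, so 8 concurrent tasks in NiFi.
partitions = 8

# Stand-in for the incoming events (the real flow has ~2,000,000).
events = [f"event-{i}" for i in range(100)]

# Round-robin the events into one batch per partition, so every
# concurrent task/partition pair has work to do.
batches = [events[p::partitions] for p in range(partitions)]

print(len(batches))                    # one batch per partition
print(sum(len(b) for b in batches))    # no events lost or duplicated
```

Eight FlowFiles of ~12-13 records each keeps all producers busy, whereas 2,000,000 single-record FlowFiles spends most of the time on per-FlowFile overhead rather than publishing.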
I am using Kafka version 1.0. I used PublishKafka_1_0 before, but the result was not good (same as PublishKafka).
I am using a single NiFi instance (installed on CentOS 7).
I will use PublishKafkaRecord.
Thank you so much.