increase the speed of splittext processor in nifi

Explorer

I am using NiFi version 1.8.0 on CentOS 7 and I want to transfer 2,000,000 flow files to Kafka with NiFi.

core: 2

Memory: 9

I am using 4 SplitText processors with Line Split Count set to 10000, 1000, 100, and 1.

How can I increase the speed of transferring the flow files?
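To make the cascade concrete, here is a rough sketch of how many flow files leave each SplitText stage. It assumes the input is a single file of 2,000,000 lines (an assumption; the post only gives the final flow file count), and `split_cascade` is a hypothetical helper, not part of NiFi.

```python
from collections import Counter


def split_cascade(total_lines, line_split_counts):
    """Return the number of flow files leaving each split stage.

    Assumes one input file of `total_lines` lines that passes through
    stages which each cut every incoming file into pieces of at most
    `limit` lines (a simplified model of Line Split Count).
    """
    files = Counter({total_lines: 1})  # file size in lines -> number of files
    stage_counts = []
    for limit in line_split_counts:
        next_files = Counter()
        for size, count in files.items():
            full, remainder = divmod(size, limit)
            if full:
                next_files[limit] += full * count
            if remainder:
                next_files[remainder] += count
        files = next_files
        stage_counts.append(sum(files.values()))
    return stage_counts


print(split_cascade(2_000_000, [10000, 1000, 100, 1]))
# prints [200, 2000, 20000, 2000000]
```

Under that assumption, almost all of the flow files are created by the last stage.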

[Attached screenshot: 94509-split.png]


Re: increase the speed of splittext processor in nifi

Explorer

@Matt Clarke @Matt Burgess

How can I solve it?

Could you help me, please?

Re: increase the speed of splittext processor in nifi

Master Guru
@mojgan ghasemi

It is hard to tell from your screenshot whether your SplitText processors are really the source of your slowness. The "Red" highlight on several connections indicates that backpressure is being applied by those connections. While backpressure is being applied, the processor upstream of the highlighted connection will not get scheduled to run. Only once the queue on that outbound connection drops below the backpressure threshold will the processor be scheduled again.
So you need to find out which processor is furthest down the dataflow path that has an inbound connection that is "Red" and no outbound connections that are "Red". That is the processor you want to focus on.
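As a rough illustration of that rule, the sketch below models a dataflow as a list of connections and picks out the processors that have a "Red" inbound connection and no "Red" outbound connection. The processor names and the backpressure flags are hypothetical examples, not read from the screenshot, and `find_bottlenecks` is not a NiFi API.

```python
def find_bottlenecks(connections):
    """Return processors with a red inbound and no red outbound connection.

    `connections` is a list of (source, destination, backpressure_applied)
    tuples describing the dataflow.
    """
    red_inbound = {dst for _, dst, red in connections if red}
    red_outbound = {src for src, _, red in connections if red}
    return sorted(red_inbound - red_outbound)


# Hypothetical flow: every queue from the first split onward is full.
flow = [
    ("GetFile", "SplitText_10000", False),
    ("SplitText_10000", "SplitText_1000", True),
    ("SplitText_1000", "SplitText_100", True),
    ("SplitText_100", "SplitText_1", True),
    ("SplitText_1", "PublishKafka", True),
]

print(find_bottlenecks(flow))
```

In this made-up flow the only candidate is the last processor, since every upstream processor is itself being held back by a full outbound queue.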

Thank you,

Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.

Re: increase the speed of splittext processor in nifi

Explorer

@Matt Clarke

Thanks for your answer.

In the previous test I used the PutKafka processor, but in the new test I am using PublishKafka and I changed its configuration. These changes made NiFi faster: with this configuration, NiFi is transferring about 3,800 events per second.

My target is to transfer about 1.5 million events per second, because data is generated at about 1.5 million events per second too.

CPU: 4

Memory: 8 GB
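To put the gap in numbers, here is a small back-of-the-envelope sketch using only the figures above (3,800 events per second now, 1.5 million events per second as the target, 2,000,000 events in the test). The function names are hypothetical.

```python
def throughput_gap(target_eps, current_eps):
    """How many times faster the flow must become to reach the target rate."""
    return target_eps / current_eps


def seconds_to_send(total_events, events_per_second):
    """Time needed to send `total_events` at a steady rate."""
    return total_events / events_per_second


gap = throughput_gap(1_500_000, 3_800)
drain = seconds_to_send(2_000_000, 3_800)
print(f"needed speed-up: about {gap:.0f}x")
print(f"time for 2,000,000 events at the current rate: about {drain:.0f} s")
```

So the current rate is roughly 395 times below the target.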

nifi.provenance.repository.index.threads=4

[Attached screenshot: 94526-split2.png]

[Attached screenshot: publishkafka.png]
Re: increase the speed of splittext processor in nifi

Master Guru

@mojgan ghasemi

Your dataflow image shows that PublishKafka is what is causing the backpressure.

Questions:

1. What version of Kafka are you publishing to? There are multiple versions of the PublishKafka processor available. For best performance you want to use the PublishKafka processor that uses the same client version as your target. The "PublishKafka" processor with no version number uses the Kafka 0.9 client, so it is pretty old. There are now versions for Kafka 0.10, 0.11, 1.0, and 2.0.

2. Are you running a NiFi cluster or just a single NiFi instance? You want to match your number of concurrent tasks (across the cluster, if clustered) to the number of partitions you have on your Kafka topic. Each concurrent task is a unique thread/producer that will be associated with one partition of the topic. Having too many or too few concurrent tasks will trigger a rebalance, which will also affect performance.

3. You will most likely want to look into using the "PublishKafkaRecord" processor instead. This would remove the need to do most of your splitting. Just split the received file into enough files to maximize use of each partition on your topic (that is, enough files that each concurrent task gets at least one file).
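As a rough sketch of the sizing in point 3, the helper below works out a line count per split so that every concurrent task gets at least one file. The partition count of 8 is a made-up example, since the topic layout is not known here, and `lines_per_split` is a hypothetical helper, not a NiFi setting.

```python
def lines_per_split(total_lines, partitions, files_per_task=1):
    """Largest split size that still yields at least
    partitions * files_per_task files (assuming total_lines is that large)."""
    if partitions <= 0 or files_per_task <= 0:
        raise ValueError("partitions and files_per_task must be positive")
    wanted_files = partitions * files_per_task
    # Ceiling division: never produce more files than wanted, never fewer
    # than needed to cover all lines.
    return max(1, -(-total_lines // wanted_files))


# Hypothetical topic with 8 partitions and 8 concurrent tasks.
print(lines_per_split(2_000_000, 8))
```

With those assumed numbers a single SplitText with a Line Split Count of 250,000 would hand one file to each task, instead of cascading down to one line per flow file.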

Thank you,

Matt

Re: increase the speed of splittext processor in nifi

Explorer

@Matt Clarke

I am using Kafka version 1.0. I used PublishKafka_1_0 before, but the result was not good (the same as PublishKafka).

I am using a single NiFi instance (installed on CentOS 7).

I will use PublishKafkaRecord.

Thank you so much.
