Why does the ConsumeKafka_0_10 processor receive fewer flowfiles than were published?

Contributor

I have 1 producer (a PublishKafka_0_10 processor) and 1 consumer (a ConsumeKafka_0_10 processor) that receives flowfiles from a Kafka cluster.

In the NiFi UI, the producer shows 7 flowfiles out, but the consumer receives only 4. I also used kafka-console-consumer.sh to view the messages from the producer, and it displays all 7.

I don't know why or where the 3 flowfiles are lost in the ConsumeKafka_0_10 processor.

I use a Kafka cluster with 3 nodes and a NiFi cluster with 3 nodes.

1 ACCEPTED SOLUTION

@Kiem Nguyen

Configure the ConsumeKafka_0_10 processors with 2 concurrent tasks each, and see if that resolves the issue.

5 REPLIES

@Kiem Nguyen

How many partitions are on the Kafka topic?

Contributor

@Wynner

I have 3 topics in the Kafka cluster: aa, bb, and cc. I used the following command to check the number of partitions. Topic aa has 4 partitions.

./bin/kafka-topics.sh --describe --zookeeper 10.42.53.16:2181,10.42.53.17:2181,10.42.53.18:2181 --topic aa

Result:

Topic:aa        PartitionCount:4        ReplicationFactor:1     Configs:
        Topic: aa       Partition: 0    Leader: 17      Replicas: 17    Isr: 17
        Topic: aa       Partition: 1    Leader: 18      Replicas: 18    Isr: 18
        Topic: aa       Partition: 2    Leader: 17      Replicas: 17    Isr: 17
        Topic: aa       Partition: 3    Leader: 18      Replicas: 18    Isr: 18

I get the same result for topics bb and cc.

However, flowfiles are lost only in the ConsumeKafka_0_10 processors that receive data from topic aa or bb.

The ConsumeKafka_0_10 processor that receives data from topic cc always gets everything.

@Kiem Nguyen

Configure the ConsumeKafka_0_10 processors with 2 concurrent tasks each, and see if that resolves the issue.

Contributor

@Wynner

Yes. I think the problem is the number of concurrent tasks configured on the ConsumeKafka_0_10 processors.

I will accept your answer since you identified the right problem. But can you help me choose the number of concurrent tasks for my case?

I have 3 PublishKafka_0_10 processors: A, B, and C. A pushes data to topic aa, B to topic bb, and C to topic cc.

Each PublishKafka_0_10 processor uses the default of 1 concurrent task.

As you can see, each of the 3 topics in my Kafka cluster has 4 partitions.

I also have 3 ConsumeKafka_0_10 processors: D, E, and F. D receives data from topic aa, E from topic bb, and F from topic cc.

How many concurrent tasks should I configure for each of the ConsumeKafka_0_10 processors D, E, and F?

Please help me understand the relationship between partitions and concurrent tasks.

Thank you so much!

@Kiem Nguyen

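The goal is for the total number of concurrent tasks across the cluster to match the number of partitions. So, with 4 partitions on each topic, you want 4 concurrent tasks in total. The concurrent tasks setting applies per node, so a 3-node cluster cannot hit 4 exactly: 1 task per node gives 3 in total, and 2 tasks per node gives 6. Configure your PublishKafka_0_10 and ConsumeKafka_0_10 processors with 2 concurrent tasks each, which covers all 4 partitions, and you should be good.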

Ideally, the two numbers would match exactly. So, if possible, I would configure the Kafka topics with 6 partitions, or some other multiple of three, so that the partitions divide evenly across the 3 nodes.
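
The arithmetic behind this advice can be sketched in a few lines of Python. This is only an illustration of the rule of thumb described above, assuming the concurrent tasks value is applied on every node; the function names `tasks_per_node` and `idle_consumers` are made up for this example and are not part of NiFi or Kafka.

```python
import math


def tasks_per_node(partitions: int, nodes: int) -> int:
    """Smallest per-node concurrent task count whose cluster-wide total
    (nodes * tasks) is at least the number of partitions."""
    if partitions < 1 or nodes < 1:
        raise ValueError("partitions and nodes must be positive")
    return math.ceil(partitions / nodes)


def idle_consumers(partitions: int, nodes: int, tasks: int) -> int:
    """Consumer threads left without a partition for the given settings."""
    return max(0, nodes * tasks - partitions)


# Hypothetical walk-through for the 3-node cluster in this thread.
for partitions in (4, 6):
    tasks = tasks_per_node(partitions, nodes=3)
    idle = idle_consumers(partitions, nodes=3, tasks=tasks)
    print(f"{partitions} partitions: {tasks} tasks per node, {idle} idle")
```

With 4 partitions, the sketch yields 2 tasks per node and 2 idle consumer threads. With 6 partitions it yields 2 tasks per node and none idle, which is why a multiple of three is the tidier choice here.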