Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Why does the ConsumeKafka_0_10 processor receive fewer flowfiles than were published?

I have one producer (a PublishKafka_0_10 processor) and one consumer (a ConsumeKafka_0_10 processor) that receives flowfiles from a Kafka cluster.

In the NiFi UI, the producer reports 7 messages out, but the consumer receives only 4. I also used kafka-console-consumer.sh to read the producer's messages, and it shows all 7.

I don't know why or where the ConsumeKafka_0_10 processor loses the other 3 messages.

I use a Kafka cluster with 3 nodes and a NiFi cluster with 3 nodes as well.

1 ACCEPTED SOLUTION

@Kiem Nguyen

On the ConsumeKafka_0_10 processors, configure them with 2 concurrent tasks and see if that resolves the issue.

5 REPLIES

@Kiem Nguyen

How many partitions are on the Kafka topic?

@Wynner

I have 3 topics in the Kafka cluster: aa, bb, and cc. I used the following command to check the number of partitions. Each topic appears to have 4 partitions.

./bin/kafka-topics.sh --describe --zookeeper 10.42.53.16:2181,10.42.53.17:2181,10.42.53.18:2181 --topic aa

Result:

Topic:aa        PartitionCount:4        ReplicationFactor:1     Configs:
        Topic: aa       Partition: 0    Leader: 17      Replicas: 17    Isr: 17
        Topic: aa       Partition: 1    Leader: 18      Replicas: 18    Isr: 18
        Topic: aa       Partition: 2    Leader: 17      Replicas: 17    Isr: 17
        Topic: aa       Partition: 3    Leader: 18      Replicas: 18    Isr: 18

I get the same result for topics bb and cc.

However, messages are lost only in the ConsumeKafka_0_10 processors that read from topic aa or bb.

The ConsumeKafka_0_10 processor that reads from topic cc always receives everything.
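
For anyone comparing several topics this way, the partition count can be pulled out of the describe output with a small script. This is only a rough sketch: it assumes the output has the same layout as the result pasted above, and the function names `partition_count` and `partitions_per_leader` are made up for illustration.

```python
import re
from collections import Counter

# Output of kafka-topics.sh --describe for topic aa, copied from this thread.
DESCRIBE_OUTPUT = """\
Topic:aa        PartitionCount:4        ReplicationFactor:1     Configs:
        Topic: aa       Partition: 0    Leader: 17      Replicas: 17    Isr: 17
        Topic: aa       Partition: 1    Leader: 18      Replicas: 18    Isr: 18
        Topic: aa       Partition: 2    Leader: 17      Replicas: 17    Isr: 17
        Topic: aa       Partition: 3    Leader: 18      Replicas: 18    Isr: 18
"""


def partition_count(describe_output: str) -> int:
    """Return the PartitionCount value from the summary line."""
    match = re.search(r"PartitionCount:\s*(\d+)", describe_output)
    if match is None:
        raise ValueError("no PartitionCount field found in describe output")
    return int(match.group(1))


def partitions_per_leader(describe_output: str) -> Counter:
    """Count how many partitions each broker id leads."""
    leaders = re.findall(r"Partition:\s*\d+\s+Leader:\s*(\d+)", describe_output)
    return Counter(leaders)


if __name__ == "__main__":
    print("partitions:", partition_count(DESCRIBE_OUTPUT))
    print("partitions per leader:", dict(partitions_per_leader(DESCRIBE_OUTPUT)))
```

For the output above this reports 4 partitions, led by brokers 17 and 18 with two partitions each.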

@Kiem Nguyen

On the ConsumeKafka_0_10 processors, configure them with 2 concurrent tasks and see if that resolves the issue.

@Wynner

Yes. I think the problem is the number of concurrent tasks configured on the ConsumeKafka_0_10 processors.

I will accept your answer since you identified the right problem. But can you help me choose the number of concurrent tasks for my case?

I have 3 PublishKafka_0_10 processors: A, B, and C. A pushes data to topic aa, B to topic bb, and C to topic cc.

Each PublishKafka_0_10 processor uses the default of 1 concurrent task.

As you can see, each of the 3 topics in my Kafka cluster has 4 partitions.

Then I have 3 ConsumeKafka_0_10 processors: D, E, and F. D receives data from topic aa, E from topic bb, and F from topic cc.

How many concurrent tasks should I configure for each of the ConsumeKafka_0_10 processors D, E, and F?

Please help me understand the relationship between partitions and concurrent tasks.

Thank you so much!

@Kiem Nguyen

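The goal is for the total number of concurrent tasks across the NiFi cluster to match the number of partitions. Concurrent tasks are set per node, so the total is the number of nodes multiplied by the concurrent tasks configured on the processor. With 4 partitions on each topic, you want at least 4 concurrent tasks in total. Since you have a 3-node cluster, 1 concurrent task gives only 3 in total, so configure your PublishKafka_0_10 and ConsumeKafka_0_10 processors with 2 concurrent tasks (6 in total) and you should be good.

Ideally, the total number of concurrent tasks and the number of partitions would match exactly. So, if possible, I would configure the Kafka topics with 6 partitions, or some other multiple of three.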
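
The arithmetic behind this advice can be written down as a short sketch. It assumes the concurrent tasks setting applies on every NiFi node, so the cluster-wide total is nodes times tasks. The helper names `concurrent_tasks_per_node` and `total_consumer_tasks` are hypothetical and not part of NiFi or Kafka.

```python
import math


def concurrent_tasks_per_node(partitions: int, nodes: int) -> int:
    """Smallest per-node setting whose cluster-wide total covers every partition."""
    if partitions < 1 or nodes < 1:
        raise ValueError("partitions and nodes must both be at least 1")
    return math.ceil(partitions / nodes)


def total_consumer_tasks(nodes: int, tasks_per_node: int) -> int:
    """Cluster-wide task count, assuming the same setting runs on every node."""
    return nodes * tasks_per_node


if __name__ == "__main__":
    # The setup from this thread: 4 partitions per topic, 3 NiFi nodes.
    tasks = concurrent_tasks_per_node(partitions=4, nodes=3)
    total = total_consumer_tasks(nodes=3, tasks_per_node=tasks)
    print(f"concurrent tasks per node: {tasks}, total across cluster: {total}")
```

With 4 partitions and 3 nodes this gives 2 concurrent tasks per node, or 6 in total, which is 2 more than the number of partitions. With 6 partitions the same setting matches exactly.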