Created 06-09-2017 06:25 AM
I have 1 producer (a PublishKafka_0_10 processor) and 1 consumer (a ConsumeKafka_0_10 processor) that receives flowfiles from a Kafka cluster.
In the NiFi UI, the producer's total output is 7 messages, but the consumer receives only 4. I also used kafka-console-consumer.sh to view the messages from the producer, and it displays all 7.
I don't know why or where the other 3 messages are lost at the ConsumeKafka_0_10 processor.
I use a Kafka cluster with 3 nodes and a NiFi cluster with 3 nodes as well.
Created 06-09-2017 12:35 PM
How many partitions are on the Kafka topic?
Created 06-13-2017 07:09 AM
@Wynnwe
I have 3 topics in the Kafka cluster: aa, bb, and cc. I used this command to check the number of partitions. Topic aa appears to have 4 partitions.
./bin/kafka-topics.sh --describe --zookeeper 10.42.53.16:2181,10.42.53.17:2181,10.42.53.18:2181 --topic aa
Result:
Topic:aa PartitionCount:4 ReplicationFactor:1 Configs:
    Topic: aa Partition: 0 Leader: 17 Replicas: 17 Isr: 17
    Topic: aa Partition: 1 Leader: 18 Replicas: 18 Isr: 18
    Topic: aa Partition: 2 Leader: 17 Replicas: 17 Isr: 17
    Topic: aa Partition: 3 Leader: 18 Replicas: 18 Isr: 18
I get the same result when I check topics bb and cc.
However, the message loss occurs only in the ConsumeKafka_0_10 processors that receive data from topic aa or bb.
The ConsumeKafka_0_10 processor that receives data from topic cc always gets all of its messages.
Created 06-13-2017 12:53 PM
On the ConsumeKafka_0_10 processors, configure 2 concurrent tasks each and see if that resolves the issue.
Created 06-14-2017 02:49 AM
Yes. I think the problem is the number of concurrent tasks configured on the ConsumeKafka_0_10 processors.
I will accept your answer since you identified the right problem. But can you help me configure the number of concurrent tasks for my case?
I have 3 PublishKafka_0_10 processors: A, B, and C. A pushes data to topic aa, B to topic bb, and C to topic cc.
Each PublishKafka_0_10 processor uses the default of 1 concurrent task.
As you can see, all 3 topics in my Kafka cluster have 4 partitions.
Then I have 3 ConsumeKafka_0_10 processors: D, E, and F. D receives data from topic aa, E from topic bb, and F from topic cc.
How many concurrent tasks should I configure for each of the ConsumeKafka_0_10 processors D, E, and F?
Please help me understand the relationship between partitions and concurrent tasks.
Thank you so much!
Created 06-14-2017 02:20 PM
The goal is to have the same number of concurrent tasks as partitions. So, with 4 partitions on each topic, you want at least 4 concurrent tasks in total. The concurrent tasks setting applies per node, so on your 3-node cluster, 2 concurrent tasks per processor gives 3 x 2 = 6 tasks across the cluster, which covers all 4 partitions. Configure your PublishKafka_0_10 and ConsumeKafka_0_10 processors with 2 concurrent tasks and you should be good.
Ideally, the two numbers would match exactly. So, if possible, I would configure the Kafka topics with 6 partitions, or some other multiple of three.
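To make the arithmetic concrete, here is a rough sketch in plain Python. It is not NiFi or Kafka code, and the function name consumer_slots is made up for this example. It assumes every NiFi node runs the processor with the same concurrent tasks setting and that each partition is read by at most one consumer in the group.

```python
def consumer_slots(nodes, concurrent_tasks, partitions):
    """Compare cluster-wide consumer tasks with the partitions of one topic.

    Hypothetical helper for illustration only; not part of NiFi or Kafka.
    """
    # Every node runs its own copy of the processor with the same setting.
    total_tasks = nodes * concurrent_tasks
    # Tasks beyond the partition count have no partition to read from.
    idle_tasks = max(total_tasks - partitions, 0)
    return {
        "total_tasks": total_tasks,
        "idle_tasks": idle_tasks,
        "exact_match": total_tasks == partitions,
    }


# 3 NiFi nodes, 2 concurrent tasks, 4 partitions (the setup in this thread)
print(consumer_slots(3, 2, 4))
# Same cluster with the topic repartitioned to 6 partitions
print(consumer_slots(3, 2, 6))
```

With 4 partitions the cluster has 6 tasks and 2 of them sit idle; with 6 partitions every task has exactly one partition.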