Created 11-02-2018 09:08 AM
Hi,
We have a cluster running with 6 nodes. When I add a Kafka consumer, each cluster node should pull unique data, i.e. each node should fetch from a different partition: https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka.
The same behavior is described in the NiFi docs. However, in our case every node is pulling the same data from Kafka, leading to duplication. Can you please help? Are there any specific configurations required to achieve this?
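For context on the mechanism at play: Kafka only divides a topic's partitions among consumers that share the same group ID; consumers in *different* groups each receive every partition, which looks exactly like the duplication described above. A minimal sketch of this (the node names are hypothetical and the round-robin assignor is a simplification, not Kafka's actual assignment code):

```python
# Simplified illustration of Kafka consumer-group partition assignment.
# Assumption: a 6-partition topic and 6 consuming nodes, matching the
# cluster in this thread; names and the assignor are illustrative only.

def assign_round_robin(partitions, consumers):
    """Within ONE consumer group, each partition goes to exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = list(range(6))                  # a 6-partition topic
nodes = [f"node-{n}" for n in range(6)]      # 6 NiFi nodes

# Same Group ID on every node: partitions are divided, no duplication.
same_group = assign_round_robin(partitions, nodes)
print(same_group)   # each node owns exactly one partition

# A different Group ID per node: every group independently gets ALL
# partitions, so every node re-reads the full topic -> duplicated data.
different_groups = {n: list(partitions) for n in nodes}
print(different_groups)
```

If the duplication pattern matches the second case, it is worth verifying that every node's ConsumeKafka is actually using the same Group ID.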
Created 11-02-2018 09:12 AM
The default behavior is the one you described at the beginning, with each node consuming from a different partition.
Please share the processor's configuration and the output of a describe of the topic.
Also check that the ConsumeKafka processor is compatible with the version of Kafka you are using.
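To gather that information, the Kafka CLI tools can show the topic's partition layout and which consumer currently owns each partition. The topic name, group name, and hosts below are placeholders, and the exact flags depend on your Kafka version (older kafka-topics.sh releases take --zookeeper rather than --bootstrap-server):

```shell
# Describe the topic: partition count, leaders, replicas.
# Substitute your own topic name and ZooKeeper/broker address.
kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic

# Show which consumer owns which partition in the group that the
# ConsumeKafka processor's "Group ID" property points at.
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-consumer-group
```

If the group describe shows each partition assigned to a different node, the group is behaving normally and the duplication is happening elsewhere.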
Created 11-02-2018 09:34 AM
@Rafeeq Shanavaz: My NiFi version is 1.7.1 and my ConsumeKafka version is 0.10.1.7.1; I'm not sure whether they are compatible. Could this be the issue?
Created 11-02-2018 10:20 AM
The version is not the issue; I got the same behavior when using ConsumeKafka 1.7.1 with NiFi 1.7.1.
Created 11-06-2018 08:25 AM
Which version of Kafka itself are you using? (not the ConsumeKafka version)
Can you also post the configuration?
Created 11-02-2018 01:52 PM
Hi @siddharth pande -
Can you also expand on the type of duplication -- do 4 concurrent tasks equate to a 4x duplication of the originating data on Kafka? Alternatively, is one partition duplicated while the others are not? Does the issue happen all of the time? Are we sure the duplication isn't happening before the data gets into Kafka (say, a producer sending duplicate records)?
Per Raffaele's suggestion, please send over the configuration of the Kafka processor within NiFi. Also, if you have fake data (or a schema we could follow) that reproduces the duplication when published to a topic, please share it; I'd like to try to reproduce the issue.
Thanks,
Jeff