Nifi Cluster reading duplicate data from kafka

Contributor

Hi,

We have a cluster running with 6 nodes. When I add a Kafka consumer, each cluster node should pull unique data, that is, each node should fetch from a different partition (see https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka).

The same is also mentioned in the NiFi docs. However, in our case each node is pulling the same data from Kafka, which leads to duplication. Can you please help? Is any specific configuration required to get each node reading from its own partition?

5 REPLIES

Contributor

@siddharth pande

The default behavior is the one you described at the beginning, with each node consuming from a different partition.

You should share the processor's configuration and the output of describing the topic (for example, from kafka-topics.sh --describe).

Also check that the ConsumeKafka processor is compatible with the version of Kafka you are using.
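
To illustrate the default behavior mentioned above and one way it can break: Kafka divides a topic's partitions among consumers that share a group id, while consumers in different groups each receive every partition. The following is a minimal simulation in plain Python with no Kafka client involved. The node names, group names, and the simple round-robin assignment are assumptions made for illustration; they are not the actual assignor used by Kafka or NiFi, and nothing in this thread confirms that a differing group id is the cause here.

```python
from collections import defaultdict


def assign_partitions(consumers, num_partitions):
    """Simulate partition assignment.

    consumers: list of (node_name, group_id) pairs (hypothetical names).
    Returns {node_name: [partition, ...]}.
    """
    # Collect the members of each consumer group.
    groups = defaultdict(list)
    for node, group_id in consumers:
        groups[group_id].append(node)

    assignment = {node: [] for node, _ in consumers}
    # Every group sees all partitions; within a group they are split up.
    for members in groups.values():
        members = sorted(members)
        for partition in range(num_partitions):
            owner = members[partition % len(members)]
            assignment[owner].append(partition)
    return assignment


# Six nodes sharing one group id: each partition has exactly one reader.
shared = assign_partitions(
    [(f"node{i}", "nifi-group") for i in range(1, 7)], num_partitions=6
)

# Six nodes with six different group ids: every node reads every partition.
separate = assign_partitions(
    [(f"node{i}", f"group-{i}") for i in range(1, 7)], num_partitions=6
)

print(shared)
print(separate)
```

If the nodes behave like the second case, it may be worth checking whether the processor's Group ID property evaluates to the same value on every node.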

Contributor

@Rafeeq Shanavaz: My NiFi version is 1.7.1 and the ConsumeKafka version is 0.10.1.7.1. I am not sure whether they are compatible. Could this be the issue?

Contributor

The version is not the issue: I get the same duplication when using the 1.7.1 ConsumeKafka processor with NiFi 1.7.1.

Contributor

Which version of Kafka itself (not the ConsumeKafka processor) are you using?
Can you also post the configuration?

Cloudera Employee

Hi @siddharth pande -

Can you also expand on the type of duplication: do 4 concurrent tasks result in 4x the originating data from Kafka? Alternatively, is one partition duplicated while the others are not? Does the issue happen all of the time? And are we sure the duplication isn't happening before the data gets into Kafka (say, a producer sending duplicate data)?

Per Raffaele's suggestion, please send over the configuration of the Kafka processor within NiFi. Also, if you have fake data (or a schema we could follow) with which you can easily reproduce the duplication by publishing to a topic, please share it. I'd like to try to reproduce it.

Thanks,


Jeff
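
One way to answer the questions above about the type of duplication is to compare partition and offset for the duplicated records. If the same (partition, offset) pair shows up more than once downstream, the record was read more than once; if identical payloads sit at different offsets, they were written to Kafka more than once. Below is a minimal sketch, assuming the records have been exported as (partition, offset, payload) tuples; the function name and the sample data are hypothetical. In NiFi these values would typically come from the `kafka.partition` and `kafka.offset` attributes that ConsumeKafka adds to each FlowFile.

```python
from collections import Counter, defaultdict


def classify_duplicates(records):
    """Split duplicates into consumer-side and producer-side candidates.

    records: iterable of (partition, offset, payload) tuples.
    Returns (consumer_side, producer_side) where
      consumer_side maps (partition, offset) -> number of times it was read,
      producer_side maps payload -> sorted list of distinct positions.
    """
    reads_per_position = Counter()
    positions_per_payload = defaultdict(set)
    for partition, offset, payload in records:
        reads_per_position[(partition, offset)] += 1
        positions_per_payload[payload].add((partition, offset))

    # Same position seen more than once: the record was consumed repeatedly.
    consumer_side = {
        pos: count for pos, count in reads_per_position.items() if count > 1
    }
    # Same payload at several positions: it was published more than once.
    producer_side = {
        payload: sorted(positions)
        for payload, positions in positions_per_payload.items()
        if len(positions) > 1
    }
    return consumer_side, producer_side


sample = [
    (0, 10, "a"), (0, 10, "a"),  # same offset read twice
    (1, 5, "b"), (1, 6, "b"),    # same payload written at two offsets
    (2, 7, "c"),                 # no duplication
]
consumer_side, producer_side = classify_duplicates(sample)
print(consumer_side)
print(producer_side)
```

Counting how many times each position was read also gives the duplication factor directly, for example whether it matches the number of nodes or the number of concurrent tasks.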