Support Questions

Nifi Cluster reading duplicate data from kafka

Explorer

Hi,

We have a NiFi cluster running with 6 nodes. When I add a Kafka consumer, each cluster node should pull unique data, that is, each node should fetch from a different partition, as described in https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka.

The NiFi docs say the same thing. However, in our case each node is pulling the same data from Kafka, which leads to duplication. Can you please help? Is any specific configuration required to get this working?

Re: Nifi Cluster reading duplicate data from kafka

Contributor

@siddharth pande

The default behavior is the one you described at the beginning, with each node consuming from a different partition.
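
That partition-per-node behavior comes from Kafka's consumer-group model: consumers that share a group.id (the Group ID property on ConsumeKafka) split the topic's partitions between them, while consumers in different groups each receive every partition. Below is a minimal Python simulation of that rule, assuming a simplified round-robin assignment; it is not Kafka's actual assignor code, and the node names, group names, and partition count are hypothetical.

```python
from collections import defaultdict


def assign_partitions(members, num_partitions):
    """Simplified round-robin assignment, done independently per consumer group.

    members: list of (consumer_id, group_id) tuples (hypothetical model).
    Returns {consumer_id: [partition, ...]}.
    """
    # Bucket consumers by their consumer group.
    groups = defaultdict(list)
    for consumer_id, group_id in members:
        groups[group_id].append(consumer_id)

    assignment = {}
    for consumers in groups.values():
        consumers = sorted(consumers)
        for consumer_id in consumers:
            assignment[consumer_id] = []
        # Within one group, each partition goes to exactly one consumer.
        for partition in range(num_partitions):
            owner = consumers[partition % len(consumers)]
            assignment[owner].append(partition)
    return assignment


def delivery_count(assignment, num_partitions):
    """Count how many consumers receive each partition's records."""
    counts = [0] * num_partitions
    for partitions in assignment.values():
        for p in partitions:
            counts[p] += 1
    return counts


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 7)]

    # Same Group ID on every node: partitions are split, no duplicates.
    shared = assign_partitions([(n, "nifi-group") for n in nodes], 6)
    print(delivery_count(shared, 6))  # [1, 1, 1, 1, 1, 1]

    # A different Group ID per node: every node reads every partition.
    separate = assign_partitions([(n, f"group-{n}") for n in nodes], 6)
    print(delivery_count(separate, 6))  # [6, 6, 6, 6, 6, 6]
```

If a group has more consumers than partitions (for example, 6 nodes with several concurrent tasks each against a 6-partition topic), the extra consumers get no partitions and sit idle rather than duplicating data.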

Please share the processor's configuration and a description of the topic (for example, the output of kafka-topics.sh --describe for that topic).

Also check that the ConsumeKafka processor is compatible with the version of Kafka you are using.

Re: Nifi Cluster reading duplicate data from kafka

Explorer

@Rafeeq Shanavaz: My NiFi version is 1.7.1 and the ConsumeKafka version is 0.10.1.7.1. I'm not sure whether they are compatible. Could this be the issue?

Re: Nifi Cluster reading duplicate data from kafka

Explorer

The version is not the issue: I get the same duplication using ConsumeKafka 1.7.1 with NiFi 1.7.1.

Re: Nifi Cluster reading duplicate data from kafka

Contributor

Which version of Kafka (the broker, not ConsumeKafka) are you using?
Can you also post the processor configuration?

Re: Nifi Cluster reading duplicate data from kafka

Cloudera Employee

Hi @siddharth pande -

Can you also describe the type of duplication? Do 4 concurrent tasks result in 4x duplication of the original data in Kafka? Alternatively, is one partition being duplicated while the others are not? Does the issue happen all the time? Are we sure the duplication isn't happening before the data gets into Kafka (say, a producer sending duplicate data)?
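
One way to answer these questions is to count how many times each Kafka (partition, offset) pair arrives in NiFi. The sketch below assumes you have exported those pairs downstream of ConsumeKafka, for example from FlowFile attributes such as kafka.partition and kafka.offset when each message becomes its own FlowFile; the function name and sample records are hypothetical.

```python
from collections import Counter


def duplication_by_partition(records):
    """Summarize duplication per partition.

    records: iterable of (partition, offset) pairs as received downstream of
    ConsumeKafka (hypothetical export format).
    Returns {partition: (distinct_offsets, max_copies_of_any_offset)}.
    """
    copies = Counter(records)  # (partition, offset) -> times received
    summary = {}
    for (partition, _offset), n in copies.items():
        distinct, worst = summary.get(partition, (0, 0))
        summary[partition] = (distinct + 1, max(worst, n))
    return summary


if __name__ == "__main__":
    # Hypothetical sample: partition 0 arrives twice per offset, partition 1 once.
    received = [(0, 10), (0, 10), (0, 11), (0, 11), (1, 7), (1, 8)]
    for partition, (distinct, worst) in sorted(duplication_by_partition(received).items()):
        print(f"partition {partition}: {distinct} offsets, up to {worst}x copies")
```

A uniform multiplier across every partition that matches the node count (or task count) suggests each consumer is reading independently, while duplication confined to one partition points to a different cause. Producer-side duplicates land at different offsets, so they would not show up here; catching those requires comparing message keys or payloads.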

Per Raffaele's suggestion, please send over the configuration of the Kafka processor in NiFi. Also, if you have sample data (or a schema we could follow) with which you can reliably reproduce the duplication by publishing to a topic, please share it; I'd like to try to reproduce the issue.

Thanks,

Jeff