
Understanding how NiFi retrieves from Kafka

Expert Contributor

We have a flow set up with a GetKafka processor feeding into a PutHDFS processor.

I am still working on getting it to function correctly. I notice that when I run it, it pulls data from the point where it was last run instead of starting from the moment I started it (in the topic). Is this "as designed"? It's possible it is and I'm just unaware ... I expected that when I ran it, I could open the HDFS file and see data pulled from the current time. Is there a way to set it so it only grabs data in the topic from the present moment onward, rather than all the previous data?

Any insight into this would be helpful.

1 ACCEPTED SOLUTION

Master Guru

What version of NiFi and what version of Kafka?

NiFi 1.x has GetKafka for Kafka 0.8, ConsumeKafka for Kafka 0.9, and ConsumeKafka_0_10 for Kafka 0.10. Whenever possible, the processor matching your broker version should be used.

I believe all of them have a property that controls the initial offset used the first time the processor is ever started, basically whether to start at the beginning of the topic or at the latest offset. After that, it will always resume from the last offset the Kafka client consumed, so that it never misses data.
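
For reference, here is roughly what the Kafka client underneath those processors is doing (a minimal sketch using the 0.10 Java consumer; the broker address, topic name, and group id below are placeholders, not values from your flow):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OffsetBehaviorSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("group.id", "nifi-flow");                // consumer group id
            // Only consulted when this group has NO committed offset yet,
            // i.e. the very first start: "earliest" = beginning of topic,
            // "latest" = only data arriving from now on.
            props.put("auto.offset.reset", "latest");
            props.put("enable.auto.commit", "true");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic

            // On every start after the first, the consumer resumes from the group's
            // last committed offset, which is why the processor picks up the older
            // data rather than starting at the current time.
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    System.out.printf("offset=%d, %d bytes%n",
                            record.offset(), record.value().length);
                }
            }
        }
    }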

If you ever want to start over from the current time, I believe you can just switch to a new consumer group id with Offset Reset set to latest.
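
In the processor configuration that would look something like this (the group id value is just an example; any id that has never committed offsets will do):

    Group ID      = my-new-group   (a brand-new consumer group id)
    Offset Reset  = latest

Because the new group has no committed offsets, the Offset Reset value takes effect and consumption begins at the latest offset, i.e. from the present moment onward.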

