Created 03-06-2017 05:37 PM
We have a NiFi flow with a GetKafka processor feeding into a PutHDFS processor.
I am still working on getting it to function correctly. I notice that when I run it, it pulls data starting from where it left off on the last run, instead of pulling only the data that arrives in the topic from the moment I start it. Is this "as designed"? It's possible it is and I'm just unaware ... I expected that when I ran it, I could open the HDFS file and see data from the current time. Is there a way to set it so it only grabs data in the topic from the present moment and not all the previous data?
Any insight into this would be helpful.
Created 03-06-2017 06:22 PM
What version of NiFi and what version of Kafka?
NiFi 1.x has GetKafka for Kafka 0.8, ConsumeKafka for Kafka 0.9, and ConsumeKafka_0_10 for Kafka 0.10. Whenever possible, the processor should be matched to the broker version.
I believe all of them have some kind of property that controls the initial offset for the first time the processor is ever started, basically saying whether to start at the beginning of the topic, or at the latest offset. After that it is always going to use the last offset that the Kafka client has consumed in order to never miss data.
If you ever want to start over from the current time, I believe you can just change to a new consumer group id with Offset Reset set to latest.
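To make the behavior above concrete, here is a rough sketch of how the starting position gets picked. It is a plain-Python simulation, not the Kafka client or NiFi code, and the names in it (`starting_offset`, the `committed` dict, the group and topic names, the offsets 120 and 500) are made up for illustration. The assumption is simply that a committed offset for the group always wins, and the offset reset setting only matters when the group has no committed offset yet.

```python
from typing import Dict, Tuple

# (group id, topic, partition) -> last committed offset.
# Hypothetical stand-in for the offsets Kafka tracks per consumer group.
CommittedOffsets = Dict[Tuple[str, str, int], int]


def starting_offset(
    committed: CommittedOffsets,
    group_id: str,
    topic: str,
    partition: int,
    earliest: int,
    latest: int,
    offset_reset: str = "latest",
) -> int:
    """Return the offset a consumer in this group would start reading from."""
    key = (group_id, topic, partition)
    if key in committed:
        # The group has consumed before: resume where it left off,
        # regardless of the offset reset setting.
        return committed[key]
    if offset_reset == "earliest":
        # No history for this group: start at the beginning of the topic.
        return earliest
    if offset_reset == "latest":
        # No history for this group: only read data arriving from now on.
        return latest
    raise ValueError(f"unsupported offset_reset: {offset_reset!r}")


# An existing group that stopped at offset 120 while the topic grew to 500.
committed: CommittedOffsets = {("nifi-group", "events", 0): 120}

resumed = starting_offset(committed, "nifi-group", "events", 0, earliest=0, latest=500)
fresh = starting_offset(committed, "nifi-group-v2", "events", 0, earliest=0, latest=500)
replay = starting_offset(
    committed, "nifi-group-v2", "events", 0, earliest=0, latest=500, offset_reset="earliest"
)
print(resumed, fresh, replay)  # prints: 120 500 0
```

The old group resumes at 120 and so picks up the backlog, which matches what the original poster is seeing. The new group id has no committed offset, so with the reset set to latest it starts at the end of the topic.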
Created 03-06-2017 06:25 PM
This post describes the behavior well:
https://stackoverflow.com/questions/32390265/what-determines-kafka-consumer-offset