
Understanding how NiFi retrieves from Kafka


We have a flow with a GetKafka processor feeding a PutHDFS processor.

I am still working on getting it to function correctly. I notice that when I run it, it pulls data starting from where it left off on the last run, rather than only the data arriving in the topic from the moment I start it. Is this "as designed"? It's possible that it is and I'm just unaware. I expected to open the HDFS file and see only data pulled from the current time onward. Is there a way to configure it so that it only grabs data arriving in the topic from the present moment, and not all the previous data?

Any insight into this would be helpful.

Accepted Solutions

Re: Understanding how NiFi retrieves from Kafka

What version of NiFi and what version of Kafka?

NiFi 1.x has GetKafka for Kafka 0.8, ConsumeKafka for Kafka 0.9, and ConsumeKafka_0_10 for Kafka 0.10. Whenever possible, use the processor that matches your broker version.

I believe all of them have a property that controls the initial offset the first time the processor is ever started, i.e. whether to start at the beginning of the topic or at the latest offset. After that, the processor always resumes from the last offset the Kafka client has consumed, so that it never misses data.
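
To make that rule concrete, here is a toy model in Python of how the starting position for one partition gets picked. It is only a sketch of the behaviour described above, not NiFi or Kafka client code; the function name and parameters are made up for illustration, and it ignores edge cases such as a stored offset that has already expired from the topic.

```python
from typing import Optional


def starting_offset(committed: Optional[int], earliest: int, latest: int,
                    reset: str) -> int:
    """Toy model: where a consumer group starts reading one partition.

    committed -- last offset stored for this group, or None if the group
                 has never consumed from the partition (hypothetical input)
    earliest  -- oldest offset still available in the partition
    latest    -- offset of the next message to be written
    reset     -- "earliest" or "latest", the initial-offset policy
    """
    if committed is not None:
        # The group has a stored position: resume there, ignore the policy.
        return committed
    if reset == "earliest":
        return earliest
    if reset == "latest":
        return latest
    raise ValueError(f"unknown reset policy: {reset!r}")


# A group that stopped at offset 42 resumes at 42, even with reset="latest".
print(starting_offset(42, earliest=0, latest=100, reset="latest"))
# A group with no stored position and reset="latest" skips the backlog.
print(starting_offset(None, earliest=0, latest=100, reset="latest"))
```

This is why restarting the same processor picks up the backlog: the reset setting only matters when there is no stored position for the group.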

If you ever want to start over from the current time, I believe you can just change to a new consumer group ID with Offset Reset set to latest.
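
As a rough illustration of that workaround, the sketch below builds the two settings involved, using the standard Kafka consumer config keys group.id and auto.offset.reset (with the 0.9+ value "latest"; the 0.8 consumer used "largest" instead). The helper name and the "nifi-hdfs" prefix are hypothetical, and in NiFi you would enter the equivalent values in the processor's properties rather than in code.

```python
import uuid


def fresh_consumer_settings(prefix: str = "nifi-hdfs") -> dict:
    """Return settings for a consumer group that has no stored offsets."""
    return {
        # A never-before-used group id has no stored position to resume from.
        "group.id": f"{prefix}-{uuid.uuid4().hex}",
        # With no stored position, start at the end of the topic.
        "auto.offset.reset": "latest",
    }


settings = fresh_consumer_settings()
print(settings["group.id"], settings["auto.offset.reset"])
```

Keep in mind that once the new group has consumed anything, it has its own stored position, so the next restart will again resume from where it stopped.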
