NiFi PutKafka and Verifying Published Messages

I created a small dataflow to test PutKafka. The flow seems to be running successfully; however, I still need to verify that I can read messages back from the Kafka topic they were published to.

I am trying to verify the topic contains messages using (I also tried verifying within NiFi using GetKafka but was unsuccessful). I let the process run for a while, but it produces nothing after the initial { = …} output. When I press Ctrl-C after a while, it reports that no messages were processed.




I think I have partially figured out what is happening: the first time a consumer starts, it defaults to the very end of the log, i.e., its offset is initialized to the highest value. When I use

/usr/hdp/current/kafka-broker/bin/ --zookeeper --topic cdr --from-beginning

I am able to retrieve the messages from the topic.
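To make the behavior concrete, here is a minimal toy sketch in Python of what I believe is going on. It is not Kafka code: the list `log`, the message names, and the helpers `start_offset` and `read` are all made up for illustration, and it assumes a single partition.

```python
# Toy model of a single-partition topic; this is NOT Kafka code.
# All names and messages here are hypothetical.
log = ["cdr-1", "cdr-2", "cdr-3"]  # messages published before the consumer starts


def start_offset(log, from_beginning):
    """Return the first offset a brand-new consumer would read from."""
    # Without --from-beginning the consumer starts at the end of the log.
    return 0 if from_beginning else len(log)


def read(log, offset):
    """Return every message at or after the given offset."""
    return log[offset:]


tail_offset = start_offset(log, from_beginning=False)
head_offset = start_offset(log, from_beginning=True)

tail_before = read(log, tail_offset)    # nothing: started at the end
head_messages = read(log, head_offset)  # everything already in the log

log.append("cdr-4")                     # a message published after start-up
tail_after = read(log, tail_offset)     # only the new message shows up
```

In this model the consumer that starts at the end sees nothing until something new is published, which matches what I observed before adding --from-beginning.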

My question at this point is specific to the NiFi behavior. Is there a way to specify a property to emulate the --from-beginning behavior of

In my case, I start the NiFi dataflow after the process that generates data into the directory from which GetInput is pulling. To get the desired behavior, I changed Auto Offset Reset to 'smallest'.

My understanding (from this SO question) is that the auto.offset.reset configuration value only matters when the consumer group has no valid offset committed, say in ZooKeeper (ZK). I am curious whether the GetKafka processor stores its offsets in ZK for a given consumer group value, since that would affect this behavior.
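Written out as a sketch, my understanding of the rule looks roughly like this. It is a simplified model under my own assumptions, not the actual Kafka or NiFi implementation: `resolve_start_offset` and its parameters are hypothetical names, and 'smallest' and 'largest' are the reset values the older consumer accepts.

```python
def resolve_start_offset(committed, earliest, latest, auto_offset_reset):
    """Pick the offset a consumer group would start from (toy model).

    committed: offset stored for the group, or None if nothing was committed.
    earliest/latest: the lowest and highest offsets currently in the log.
    """
    # A valid committed offset always wins; the reset policy is ignored.
    if committed is not None and earliest <= committed <= latest:
        return committed
    # No committed offset, or one that is out of range: apply the policy.
    if auto_offset_reset == "smallest":
        return earliest
    if auto_offset_reset == "largest":
        return latest
    raise ValueError("unsupported auto.offset.reset: %r" % auto_offset_reset)


# New group, nothing committed: the policy decides where reading starts.
fresh_smallest = resolve_start_offset(None, 0, 100, "smallest")
fresh_largest = resolve_start_offset(None, 0, 100, "largest")
# Group with a committed offset: the policy no longer matters.
resumed = resolve_start_offset(42, 0, 100, "smallest")
```

If GetKafka does commit offsets for its group, then under this model changing Auto Offset Reset would only have an effect for a group name that has not consumed from the topic before.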