
NiFi PutKafka and Verifying Published Messages

I created a small dataflow to test PutKafka. The flow seems to be running successfully, however I am trying to verify that I can read messages from the Kafka topic to which the messages were published.

I am trying to verify that the topic contains messages using kafka-console-consumer.sh (I also tried verifying within NiFi using GetKafka, but was unsuccessful). I let the consumer run for a while, but it produces no output after the initial {metadata.broker.list = …} line. When I Ctrl-C after a while, it reports that no messages were processed.
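One way to rule out the producing side entirely is to publish a test message by hand with the console producer and then re-run the console consumer in a second terminal. This is a sketch assuming the HDP sandbox layout, the default HDP broker port 6667, and a topic named cdr; adjust the host, port, and topic for your environment.

```shell
# Publish a single test message directly to the topic (assumes HDP sandbox
# paths, broker port 6667, and topic "cdr"; adjust for your cluster).
echo "test-message" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
    --broker-list sandbox.hortonworks.com:6667 \
    --topic cdr
```

If a consumer started afterwards still sees nothing, the problem is on the consuming side (for example, the starting offset) rather than in the NiFi flow.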


2 REPLIES

I think I have partially figured out what is happening: the first time a consumer is started, it defaults to the very end of the log, i.e., its offset is initialized to the highest value. When I use

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper sandbox.hortonworks.com:2181 --topic cdr --from-beginning

I am able to retrieve the messages from the topic.

My question at this point is specific to the NiFi behavior. Is there a way to specify a property to emulate the --from-beginning behavior of kafka-console-consumer.sh?

In my case, I am starting the NiFi dataflow after the process that is generating data into the directory from which GetInput is pulling. In order to get the desired behavior, I changed Auto Offset Reset to 'smallest'.

My understanding (from this SO question) is that the auto.offset.reset configuration value only applies when the consumer group has no valid committed offset, e.g., in ZK. I am curious whether the GetKafka processor, for a given consumer group value, stores its offsets in ZK, which would affect this behavior.
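One way to check this directly is to look in ZooKeeper: old-style (ZooKeeper-based) consumers commit their offsets under /consumers/<group>/offsets/<topic>/<partition>. The sketch below assumes the HDP sandbox layout and the cdr topic from above; the group name is whatever Group ID is configured on the GetKafka processor.

```shell
# List consumer groups known to ZooKeeper (assumes HDP sandbox paths).
# If GetKafka's Group ID appears here with a committed offset for the topic,
# that offset, not auto.offset.reset, determines where the next fetch starts.
/usr/hdp/current/kafka-broker/bin/zookeeper-shell.sh sandbox.hortonworks.com:2181 \
    ls /consumers
```

To see the committed offset itself, a `get` on /consumers/<group>/offsets/cdr/0 (substituting the actual group name and partition) shows the stored value.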