<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Offset handling in Apache NiFi consumeKafka in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/303112#M55060</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I reset the Kafka offset from within NiFi, in case I want to reprocess all the messages within NiFi for the same consumer group?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Deepak&lt;/P&gt;</description>
    <pubDate>Mon, 21 Sep 2020 09:11:49 GMT</pubDate>
    <dc:creator>dvmishra</dc:creator>
    <dc:date>2020-09-21T09:11:49Z</dc:date>
    <item>
      <title>Offset handling in Apache NiFi consumeKafka</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/116614#M55057</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We are planning to use NiFi to consume data from Kafka, transform it, and store it in HDFS. We are on Kafka 0.10.0, so we are using ConsumeKafka. It works fine when all the properties are set correctly. However, we have a few questions, and it would be great if someone could help with the answers.&lt;/P&gt;&lt;P&gt;1. How does NiFi handle the Kafka offset? Does it maintain anything itself, or does it depend on the default topic "__consumer_offsets"?&lt;/P&gt;&lt;P&gt;2. We set "Offset reset" to "earliest", i.e. to start from the beginning when there is no initial offset in Kafka or the offset is no longer valid. Apart from "earliest", "latest", and "none", is there any other mechanism for handling offsets?&lt;/P&gt;&lt;P&gt;3. Can we parallelize NiFi execution? Can it run on multiple servers in a distributed fashion, or do we need to install it on every server we want it to run on? If it runs on only a single machine, how do we scale it? Where does NiFi execute its back-end jobs: in a local JVM, a Jetty server process, or something else?&lt;/P&gt;&lt;P&gt;Thanks in advance!
Anshu&lt;/P&gt;</description>
      <pubDate>Tue, 21 Feb 2017 20:30:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/116614#M55057</guid>
      <dc:creator>anshuman_ghosh</dc:creator>
      <dc:date>2017-02-21T20:30:46Z</dc:date>
    </item>
    <item>
      <title>Re: Offset handling in Apache NiFi consumeKafka</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/116615#M55058</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/16111/anshumanghosh.html" nodeid="16111"&gt;@Anshuman Ghosh&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;This does not answer all of your questions, but you may want to have a look at:&lt;/P&gt;&lt;P&gt;&lt;A href="http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka" target="_blank"&gt;http://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The offset is handled by Kafka itself. In other words, when NiFi consumes data from Kafka, the offset is committed to Kafka and NiFi does not store it. That is why the "Offset reset" property exists: it covers the case where there is no offset on Kafka's side. You are limited to the values offered by the processor, which correspond to Kafka's 'auto.offset.reset' property. You may want to look at Kafka's documentation for more details.&lt;/P&gt;&lt;P&gt;&lt;A href="https://kafka.apache.org/documentation/#newconsumerconfigs" target="_blank"&gt;https://kafka.apache.org/documentation/#newconsumerconfigs&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Regarding scaling, the first link above should give you a good idea. In short, NiFi scales very well with Kafka: you can increase the number of threads running in the JVM (Jetty is not involved at all) to consume data from Kafka, and you can also install NiFi in cluster mode so that multiple NiFi nodes consume data (with multiple threads on each node of your cluster).&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Feb 2017 21:20:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/116615#M55058</guid>
      <dc:creator>pvillard</dc:creator>
      <dc:date>2017-02-21T21:20:28Z</dc:date>
    </item>
    <item>
      <title>Re: Offset handling in Apache NiFi consumeKafka</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/116616#M55059</link>
      <description>&lt;P&gt;Thank you, &lt;A rel="user" href="https://community.cloudera.com/users/5078/pvillard.html" nodeid="5078"&gt;@Pierre Villard&lt;/A&gt;. I will certainly have a look.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Feb 2017 00:11:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/116616#M55059</guid>
      <dc:creator>anshuman_ghosh</dc:creator>
      <dc:date>2017-02-22T00:11:54Z</dc:date>
    </item>
    <item>
      <title>Re: Offset handling in Apache NiFi consumeKafka</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/303112#M55060</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I reset the Kafka offset from within NiFi, in case I want to reprocess all the messages within NiFi for the same consumer group?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Deepak&lt;/P&gt;</description>
      <pubDate>Mon, 21 Sep 2020 09:11:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/303112#M55060</guid>
      <dc:creator>dvmishra</dc:creator>
      <dc:date>2020-09-21T09:11:49Z</dc:date>
    </item>
    <item>
      <title>Re: Offset handling in Apache NiFi consumeKafka</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/303132#M55061</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/30062"&gt;@dvmishra&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It is always best to start a new thread/question rather than adding a new question to an existing thread that already has an accepted answer.&lt;BR /&gt;&lt;BR /&gt;As for resetting the offset for a specific consumer group from within NiFi itself, this is not something that can be done via the ConsumeKafka processors.&amp;nbsp; The offset is not stored by NiFi.&amp;nbsp; Offsets for each consumer group are stored in Kafka. It would not make much sense to build such an option into a NiFi processor even if it were possible: the processor would reset the offset every time it executed, which is probably not the desired outcome. There are numerous threads online about resetting the offset in Kafka that you may want to explore. Here are a couple:&lt;BR /&gt;&lt;A href="https://gist.github.com/marwei/cd40657c481f94ebe273ecc16601674b" target="_blank"&gt;https://gist.github.com/marwei/cd40657c481f94ebe273ecc16601674b&lt;/A&gt;&lt;BR /&gt;&lt;A href="https://gist.github.com/mduhan/0e0a4b08694f50d8a646d2adf02542fc" target="_blank"&gt;https://gist.github.com/mduhan/0e0a4b08694f50d8a646d2adf02542fc&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;If you can figure out how to accomplish the reset via a custom script or external command, NiFi does offer several script execution and command line execution processors.&amp;nbsp; You may be able to use these processors to execute your script to reset the offset in Kafka.&lt;BR /&gt;&lt;BR /&gt;Aside from the above, you can change the "group id" (new consumer group) and change the "offset reset" to "earliest".&amp;nbsp; Then restart the processor to start consuming the topic from the beginning again as a different consumer group.&lt;BR /&gt;&lt;BR /&gt;Hope this helps,&lt;BR /&gt;Matt&lt;/P&gt;</description>
      <pubDate>Mon, 21 Sep 2020 12:53:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Offset-handling-in-Apache-NiFi-consumeKafka/m-p/303132#M55061</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2020-09-21T12:53:57Z</dc:date>
    </item>
  </channel>
</rss>

