<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Nifi Incremental ingest? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127213#M43355</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;According to this &lt;A href="https://community.hortonworks.com/questions/103459/clarifications-on-state-management-within-nifi-pro.html" target="_blank"&gt;https://community.hortonworks.com/questions/103459/clarifications-on-state-management-within-nifi-pro.html&lt;/A&gt; and my research:&lt;/P&gt;&lt;P&gt;I understand that DistributedMapCache is not actually distributed; it runs on individual nodes. If the node running the server fails, the data is gone. Also, because it is a cache server, it has an eviction strategy. It does offer a persistence directory option, but that does not solve the problem of availability at all times. It may be good for storing temporary state, but for long-term persistent state we should rely on ZooKeeper instead, for its distributed nature. Unfortunately, I could not find any processor for putting data in ZooKeeper. Another option would be to use a database or distributed storage such as HDFS or S3.&lt;/P&gt;&lt;P&gt;Please correct me if I am wrong anywhere.&lt;/P&gt;&lt;P&gt;PS: I have the same case, where I want to get data from an API and store the time up to which I have already requested the data.&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2017 17:13:40 GMT</pubDate>
    <dc:creator>harsh1</dc:creator>
    <dc:date>2017-07-26T17:13:40Z</dc:date>
    <item>
      <title>Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127209#M43351</link>
      <description>&lt;P&gt;I have some external REST APIs that I have to query for data periodically using InvokeHTTP. I'd like to pass in, as a query argument, the date of my last extraction so that I retrieve only the incremental changes. What are the best practices for doing this with NiFi? Should I:&lt;/P&gt;&lt;P&gt;* Use an external database table to update/query the last date?&lt;/P&gt;&lt;P&gt;* Or is there a different built-in mechanism I can use to accomplish this?&lt;/P&gt;&lt;P&gt;Currently, I'm just using ${now():toNumber():minus(86400000):format('yyyy-MM-dd')} to get the previous day's date and passing this in to the REST API, but this isn't a good way to do it: if my daily load fails one day, the next day's run will skip the failed day.&lt;/P&gt;</description>
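      <content:encoded><![CDATA[
The failure mode described in the question (a failed daily load is silently skipped because the query date is always "yesterday") can be illustrated outside NiFi. The following is only a sketch in plain Python, not a NiFi flow and not something taken from the thread: the function names `days_to_fetch` and `run_load` and the `fetch` callback are hypothetical. It shows the idea of keeping a "last loaded" date and advancing it only after a successful fetch.

```python
from datetime import date, timedelta


def days_to_fetch(last_loaded, today):
    """Return every day after last_loaded, up to and including yesterday."""
    days = []
    current = last_loaded + timedelta(days=1)
    while current < today:
        days.append(current)
        current += timedelta(days=1)
    return days


def run_load(last_loaded, today, fetch):
    """Fetch each missing day in order and return the new 'last loaded' date.

    fetch is a hypothetical callback that requests one day's data and
    raises an exception on failure.
    """
    for day in days_to_fetch(last_loaded, today):
        try:
            fetch(day)
        except Exception:
            # Stop here so the failed day is retried on the next run.
            break
        # Advance the stored date only after the day was fetched successfully.
        last_loaded = day
    return last_loaded
```

With this approach, a run that fails on a given day leaves the stored date unchanged, so the next run picks up the failed day as well as the new one. In a NiFi flow, the stored date would take the place of the fixed `now()` minus one day expression.
]]></content:encoded>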
      <pubDate>Thu, 13 Oct 2016 01:19:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127209#M43351</guid>
      <dc:creator>frankmarit</dc:creator>
      <dc:date>2016-10-13T01:19:26Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127210#M43352</link>
      <description>&lt;P&gt;You could use the DistributedMapCache.&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutDistributedMapCache/index.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutDistributedMapCache/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer/index.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.server.map.DistributedMapCacheServer/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/35223/distributedmapcacheclientservice-nifi-wecrawlerxml.html" target="_blank"&gt;https://community.hortonworks.com/questions/35223/distributedmapcacheclientservice-nifi-wecrawlerxml.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService/index.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.FetchDistributedMapCache/index.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.FetchDistributedMapCache/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;That's pretty easy since you are just using a date.&lt;/P&gt;&lt;P&gt;&lt;A href="http://funnifi.blogspot.com/2016/04/inspecting-your-nifi.html" target="_blank"&gt;http://funnifi.blogspot.com/2016/04/inspecting-your-nifi.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I also like storing that in HBase or an RDBMS or a small in-memory database like Redis, Ignite, Geode, but that's more work and another step.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 03:09:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127210#M43352</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-10-13T03:09:52Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127211#M43353</link>
      <description>&lt;P&gt;Great! Thanks, I will play with this. Is there a way to know when the whole workflow is complete? The last step in my workflow writes the data to a file, but the data doesn't always arrive all at once. Some items may still be waiting in one of the queues. Suggestions?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 05:08:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127211#M43353</guid>
      <dc:creator>frankmarit</dc:creator>
      <dc:date>2016-10-13T05:08:50Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127212#M43354</link>
      <description>&lt;P&gt;Hit refresh and look at data provenance.&lt;/P&gt;&lt;P&gt;You can see the numbers in the queues if things are still processing.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 07:34:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127212#M43354</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2016-10-13T07:34:21Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127213#M43355</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;According to this &lt;A href="https://community.hortonworks.com/questions/103459/clarifications-on-state-management-within-nifi-pro.html" target="_blank"&gt;https://community.hortonworks.com/questions/103459/clarifications-on-state-management-within-nifi-pro.html&lt;/A&gt; and my research:&lt;/P&gt;&lt;P&gt;I understand that DistributedMapCache is not actually distributed; it runs on individual nodes. If the node running the server fails, the data is gone. Also, because it is a cache server, it has an eviction strategy. It does offer a persistence directory option, but that does not solve the problem of availability at all times. It may be good for storing temporary state, but for long-term persistent state we should rely on ZooKeeper instead, for its distributed nature. Unfortunately, I could not find any processor for putting data in ZooKeeper. Another option would be to use a database or distributed storage such as HDFS or S3.&lt;/P&gt;&lt;P&gt;Please correct me if I am wrong anywhere.&lt;/P&gt;&lt;P&gt;PS: I have the same case, where I want to get data from an API and store the time up to which I have already requested the data.&lt;/P&gt;</description>
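      <content:encoded><![CDATA[
The database option mentioned above can be sketched as a small key-value table that holds, per API source, the time up to which data has already been requested. This is only an illustration using Python's standard `sqlite3` module; the table name `ingest_state`, the column names, and the helper functions are assumptions made for the example, not part of NiFi or of any setup described in the thread.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the state store. The table layout is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ingest_state ("
        "source TEXT PRIMARY KEY, last_requested TEXT NOT NULL)"
    )
    return conn


def get_last_requested(conn, source, default=None):
    """Return the stored timestamp for source, or default if none exists."""
    row = conn.execute(
        "SELECT last_requested FROM ingest_state WHERE source = ?", (source,)
    ).fetchone()
    return row[0] if row else default


def set_last_requested(conn, source, timestamp):
    """Insert or overwrite the timestamp for source in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO ingest_state (source, last_requested) "
            "VALUES (?, ?)",
            (source, timestamp),
        )
```

Passing a file path instead of `":memory:"` keeps the value across restarts. A production setup would more likely use a shared database server so that the state does not depend on a single node.
]]></content:encoded>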
      <pubDate>Wed, 26 Jul 2017 17:13:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127213#M43355</guid>
      <dc:creator>harsh1</dc:creator>
      <dc:date>2017-07-26T17:13:40Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127214#M43356</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/24201/harsh.html" nodeid="24201"&gt;@Harsh Choudhary&lt;/A&gt; Agreed. I came to the conclusion that the distributed map cache is too flaky to keep track of important things. We've seen it mysteriously fail several times and have since changed all our processes to use a database.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jul 2017 07:01:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127214#M43356</guid>
      <dc:creator>frankmarit</dc:creator>
      <dc:date>2017-07-27T07:01:52Z</dc:date>
    </item>
    <item>
      <title>Re: Nifi Incremental ingest?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127215#M43357</link>
      <description>&lt;P&gt;There has been a major upgrade to cache in Apache NiFi 1.4 and now you can use Redis!&lt;/P&gt;</description>
      <pubDate>Tue, 10 Oct 2017 01:15:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Nifi-Incremental-ingest/m-p/127215#M43357</guid>
      <dc:creator>TimothySpann</dc:creator>
      <dc:date>2017-10-10T01:15:17Z</dc:date>
    </item>
  </channel>
</rss>

