Created 10-12-2016 06:19 PM
I have some external REST APIs that I have to query for data periodically using InvokeHTTP. I'd like to pass the date of my last extraction as a query argument so that I only retrieve the incremental changes. What are the best practices for doing this with NiFi? Should I
* Use an external database table to update/query the last date?
* Use a different built-in mechanism to accomplish this?
Currently, I'm just using ${now():toNumber():minus(86400000):format('yyyy-MM-dd')} to get the previous day's date and passing it to the REST API, but this isn't a good approach: if my daily load fails one day, the next day's run will skip that day's data.
Created 10-12-2016 08:09 PM
You could use the DistributedMapCache.
That's pretty easy since you are just using a date.
http://funnifi.blogspot.com/2016/04/inspecting-your-nifi.html
I also like storing that in HBase or an RDBMS or a small in-memory database like Redis, Ignite, Geode, but that's more work and another step.
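To make the RDBMS option concrete, here is a minimal sketch of the "last extracted date" pattern, written in plain Python with SQLite rather than as a NiFi flow. The table name `watermark`, the key `api`, and the helpers `open_store`, `get_watermark`, `set_watermark`, and `run_once` are all hypothetical names chosen for illustration. The point is only that the stored date advances after a successful load, so a failed day is retried on the next run instead of skipped.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a tiny table holding one date per data source."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watermark ("
        "name TEXT PRIMARY KEY, last_date TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def get_watermark(conn, name, default):
    """Return the last successfully extracted date, or `default` if none is stored."""
    row = conn.execute(
        "SELECT last_date FROM watermark WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else default


def set_watermark(conn, name, value):
    """Store the new date, replacing any previous value for this source."""
    conn.execute(
        "INSERT OR REPLACE INTO watermark (name, last_date) VALUES (?, ?)",
        (name, value),
    )
    conn.commit()


def run_once(conn, name, fetch, today, default="1970-01-01"):
    """Fetch everything since the stored date; advance the date only on success.

    `fetch` is a hypothetical callable standing in for the REST call;
    it is expected to raise an exception if the load fails.
    """
    since = get_watermark(conn, name, default)
    fetch(since)                      # an exception here leaves the watermark untouched
    set_watermark(conn, name, today)  # reached only after a successful load
    return since
```

In a NiFi flow, the equivalent would be to read the stored date before InvokeHTTP and write the new date only at the end of the success path. If the API's data can still change for the date you just stored, you may want to query with a small overlap and de-duplicate downstream.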
Created 10-12-2016 10:08 PM
Great! Thanks, I will play with this. Is there a way to know when the whole workflow is complete? The last step in my workflow writes the data to a file, but the data doesn't always arrive all at once. Some items may still be waiting in one of the queues. Suggestions?
Created 10-13-2016 12:34 AM
Hit refresh and look at data provenance.
You can see the counts in the queues if things are still processing.
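If you want to check this from a script instead of the UI, a rough sketch follows. It assumes you have already fetched a process group's status as JSON (for example from NiFi's REST API) and that the payload has a `processGroupStatus.aggregateSnapshot` object with `flowFilesQueued` and `activeThreadCount` fields. That shape is an assumption here, so verify it against your NiFi version. The function name `flow_is_idle` is made up for this example.

```python
def flow_is_idle(status):
    """Return True when nothing is queued and no processor thread is active.

    `status` is a dict parsed from a process group status response.
    The field names below are assumptions about that response's shape.
    """
    snapshot = status["processGroupStatus"]["aggregateSnapshot"]
    queued = snapshot.get("flowFilesQueued", 0)
    active = snapshot.get("activeThreadCount", 0)
    return queued == 0 and active == 0


# Hand-written sample payloads, not real NiFi output.
busy = {"processGroupStatus": {"aggregateSnapshot": {"flowFilesQueued": 3, "activeThreadCount": 1}}}
idle = {"processGroupStatus": {"aggregateSnapshot": {"flowFilesQueued": 0, "activeThreadCount": 0}}}
```

An empty queue at one moment does not prove the run is finished if the source can still emit more data, so polling this a few times in a row is safer than a single check.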
Created 07-26-2017 10:13 AM
Hi
According to https://community.hortonworks.com/questions/103459/clarifications-on-state-management-within-nifi-pr... and my own research:
I understand that DistributedMapCache is not actually distributed: the server runs on a single node. If the node running the server fails, the data is gone. It is also a cache server, so it has an eviction strategy; it does offer a persistence directory option, but that does not solve the availability problem. It may be fine for temporary state, but for long-term persistent state we should rather rely on ZooKeeper for its distributed nature. Unfortunately, I could not find any processor for putting data into ZooKeeper. The other option would be to use a database or distributed storage such as HDFS or S3.
Please correct me if I am wrong anywhere.
PS: I have the same use case: I want to get data from an API and store the time up to which I have already requested data.
Created 07-27-2017 12:01 AM
@Harsh Choudhary Agreed. I came to the conclusion that the distributed map cache is too flaky to keep track of important things. We've seen it mysteriously fail several times and have since changed all our processes to use a database.
Created 10-09-2017 06:15 PM
There has been a major upgrade to the cache in Apache NiFi 1.4, and now you can use Redis!