Created 12-07-2023 12:06 PM
I want to set up a NiFi flow that gets data from a public RSS feed and loads it into a data lake. This RSS feed updates irregularly, and when it does update it overwrites the previous content.
What processor(s) should I use to get data from the RSS feed (close to) when it has updated? Is it as simple as using InvokeHTTP repeatedly, checking for a change in the output, then loading into the data lake if the content differs from the previous invocation? Is there another way if I don't want to make the HTTP request so frequently?
Created 12-07-2023 01:49 PM
You can't listen for RSS updates; you have to poll the feed, since it's served over regular HTTP.
https://github.com/tspannhw/FLaNK-TravelAdvisory
If the feed gives you no way to know when it changes, then you can't know without fetching it.
You can read it once an hour, keep the entire result in a cache (like HBase), and throw the new copy away if it hasn't changed.
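To illustrate the compare-against-cache idea, here is a minimal Python sketch, not NiFi configuration. It stores a SHA-256 hash of the last response instead of the entire result, which is a swapped-in simplification of what is described above. The names `content_fingerprint` and `feed_changed` are hypothetical, and a plain dict stands in for the external cache (HBase or similar).

```python
import hashlib


def content_fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest of the raw feed body."""
    return hashlib.sha256(body).hexdigest()


def feed_changed(feed_url: str, body: bytes, cache: dict) -> bool:
    """Return True if the body differs from the last one seen for feed_url.

    `cache` is a stand-in (assumption) for an external store such as HBase;
    it maps a feed URL to the fingerprint of the last body seen.
    """
    new_fingerprint = content_fingerprint(body)
    if cache.get(feed_url) == new_fingerprint:
        # Same content as the previous poll: throw it away.
        return False
    # First poll or changed content: remember it and pass it downstream.
    cache[feed_url] = new_fingerprint
    return True
```

You would call `feed_changed` on each hourly fetch and only load into the data lake when it returns True. Hashing keeps the cache small, but if you also need the previous content for diffing you would store the whole body as suggested above.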