Running from current date on NiFi
- Labels: Apache NiFi
Created 02-09-2023 10:36 AM
Hello community. I have a flow in Apache NiFi that does ETL from a MongoDB database to a MySQL database. This flow runs a few times a day. I would like to know if there is any way to set a date within the flow so that the second run of the day brings in only new records, and not everything that already came in during the previous run.
Created 02-13-2023 07:06 AM
@Aprendizado
The GetMongo processor (I assume this is what you are using) uses a Mongo client library rather than something custom written in NiFi, so limiting the returned results has to be something that client library supports.
The good news is that limiting Mongo results by date, which the processor exposes through its query settings, should work for your use case (I have never tried this myself).
Example of limiting a Mongo query based on time:
https://stackoverflow.com/questions/8835757/return-query-based-on-date
The GetMongo processor also supports an inbound connection, which means a source FlowFile can be used to trigger each execution of the processor. The GetMongo properties that limit the results (such as "Query") also support NiFi Expression Language (NEL), so the limit can be set dynamically as an attribute on the trigger FlowFile and passed to GetMongo on each execution.
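To make the date-based limit concrete, here is a minimal Python sketch that builds the kind of filter you would pass to GetMongo. The field name `updatedAt` and the helper `build_incremental_query` are hypothetical, and I am assuming the processor accepts MongoDB Extended JSON (`$date`) in its query, which I have not verified. In NiFi itself, the timestamp would come from a FlowFile attribute referenced through Expression Language instead of a Python argument.

```python
import json
from datetime import datetime, timezone


def build_incremental_query(last_run: datetime, field: str = "updatedAt") -> str:
    """Return a Mongo filter (Extended JSON) matching documents newer than last_run.

    'field' is a hypothetical name; use whatever date field your documents carry.
    """
    if last_run.tzinfo is None:
        # Assumption of this sketch: naive timestamps are treated as UTC.
        last_run = last_run.replace(tzinfo=timezone.utc)
    utc = last_run.astimezone(timezone.utc)
    # ISO-8601 with millisecond precision, the form used by Extended JSON dates.
    iso = utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
    return json.dumps({field: {"$gt": {"$date": iso}}})


if __name__ == "__main__":
    print(build_incremental_query(datetime(2023, 2, 9, 10, 36, tzinfo=timezone.utc)))
```

Using `$gt` rather than `$gte` avoids re-reading the last document of the previous run, at the cost of skipping any document that shares exactly the same timestamp.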
This means that after a successful run, you would need to extract from your Mongo results the date from which the next execution needs to start. You could write that date, for example, to a Distributed Map Cache using the PutDistributedMapCache processor. Then, at the beginning of your dataflow, use GenerateFlowFile --> FetchDistributedMapCache --> UpdateAttribute --> GetMongo to retrieve the latest date, which goes into the limit configuration you pass to the GetMongo processor for the next execution. The GenerateFlowFile scheduling controls the execution of this flow, so configure its cron schedule to control how often it creates the FlowFile that triggers your dataflow.
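The logic of that flow can be sketched outside NiFi in a few lines of Python. This is only a simulation of the idea, not NiFi code: a plain dict stands in for the Distributed Map Cache, a list of dicts stands in for the Mongo collection, and the cache key and the `updatedAt` field are hypothetical names.

```python
from datetime import datetime, timezone

# Hypothetical cache key; in NiFi this would be the cache entry identifier.
WATERMARK_KEY = "mongo_etl_last_date"


def run_once(source_docs, cache, date_field="updatedAt"):
    """Simulate one scheduled run of the flow and return the new documents."""
    # FetchDistributedMapCache step: a missing key means first run, so read everything.
    earliest = datetime.min.replace(tzinfo=timezone.utc)
    watermark = cache.get(WATERMARK_KEY, earliest)
    # GetMongo step: only documents strictly newer than the stored date.
    new_docs = [d for d in source_docs if d[date_field] > watermark]
    # PutDistributedMapCache step: advance the date only if something was read.
    if new_docs:
        cache[WATERMARK_KEY] = max(d[date_field] for d in new_docs)
    return new_docs


if __name__ == "__main__":
    cache = {}
    docs = [
        {"_id": 1, "updatedAt": datetime(2023, 2, 9, 8, tzinfo=timezone.utc)},
        {"_id": 2, "updatedAt": datetime(2023, 2, 9, 9, tzinfo=timezone.utc)},
    ]
    print(len(run_once(docs, cache)))  # first run reads both documents
    print(len(run_once(docs, cache)))  # second run finds nothing new
```

Updating the stored date only after a successful read means a failed run does not move the starting point, so the next execution retries from the same date.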
Hopefully this gives you an idea of how you can accomplish your use case.
If you found that the provided solution(s) assisted you with your query, please take a moment to log in and click Accept as Solution below each response that helped.
Thank you,
Matt