Created 10-06-2021 12:30 AM
How can we make the NiFi PutFile processor execute only at a specified date and time?
Created 10-06-2021 05:46 AM
This can be done with the Cron Driven scheduling strategy if that option is available; otherwise, you can automate the process using Python and nipyapi.
Created 10-06-2021 10:51 PM
Thanks for the solution.
Created 10-06-2021 05:46 AM
NiFi processors support "Timer Driven" and "Cron Driven" Scheduling Strategies.
There is a third option on some processors, Event Driven, that should not be used. It was created long ago and was considered experimental. It has since been deprecated due to improvements made in the Timer Driven strategy. It remains in NiFi only to avoid breaking the flows of those who still use it when they upgrade.
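For a one-off run at a specific date and time, the Cron Driven strategy needs an expression that matches only that moment. The sketch below assumes the Quartz-style field order (seconds, minutes, hours, day of month, month, day of week, optional year) and that your NiFi version accepts the optional year field; `one_shot_cron` is a hypothetical helper name, not part of NiFi.

```python
from datetime import datetime


def one_shot_cron(when: datetime) -> str:
    """Build a Quartz-style cron string that matches a single date and time.

    Assumed field order: seconds minutes hours day-of-month month day-of-week year.
    '?' leaves day-of-week unspecified because day-of-month is given explicitly.
    """
    return (
        f"{when.second} {when.minute} {when.hour} "
        f"{when.day} {when.month} ? {when.year}"
    )


# Example: 15 November 2021 at 14:30:00
print(one_shot_cron(datetime(2021, 11, 15, 14, 30, 0)))  # 0 30 14 15 11 ? 2021
```

The resulting string would go in the processor's Run Schedule field after selecting the Cron Driven strategy. Check the scheduling section of the user guide for your NiFi version before relying on the year field.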
Thank you,
Matt
Created 10-06-2021 10:50 PM
Thanks for the detailed solution. However, I have thousands of FlowFiles as input to the PutFile processor, and they all must be processed at a given future date and time. Could you suggest how to handle this situation for the same requirement?
Created 10-07-2021 04:47 AM
This can be done with nipyapi, a Python library. Check the nipyapi documentation.
Created 10-07-2021 05:47 AM
Perhaps I don't understand your use case.
Are you saying you have a NiFi dataflow that slowly ingests data, producing FlowFiles that work their way through your dataflow to this PutFile processor?
Then you want these 1000s of FlowFiles to queue up so that they can all be put to the local file system directory at the same time?
What @m_adeel is suggesting is to use NiPyApi to automate the starting and stopping of the PutFile processor at a given time. You could also do the same through NiFi REST API calls. You would still have the challenge of knowing when to stop it.
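As a rough sketch of the REST API route, the snippet below builds (but does not send) a request that changes a processor's run state. It assumes the `PUT /processors/{id}/run-status` endpoint of the NiFi 1.x REST API; the base URL, processor id, and revision version shown are placeholders, and `build_run_status_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_run_status_request(base_url, processor_id, revision_version, state):
    """Build a PUT request that asks NiFi to start or stop one processor.

    revision_version must match the processor's current revision, which can
    be read beforehand with GET {base_url}/processors/{processor_id}.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be 'RUNNING' or 'STOPPED'")
    body = {"revision": {"version": revision_version}, "state": state}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/processors/{processor_id}/run-status",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values for illustration only.
req = build_run_status_request(
    "http://localhost:8080/nifi-api", "my-putfile-id", 3, "RUNNING"
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of calling `urllib.request.urlopen(req)` against a real instance. A secured NiFi would also need an authorization header, which is left out here.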
Does the source of data ever stop coming in?
Would you be able to put all the FlowFiles from the inbound connection queue to disk before more source FlowFiles started flowing in to the queue?
Why the need to do this at a specific date and time?
Thanks,
Matt
Created 10-07-2021 11:44 PM
Yes @MattWho , you understood it correctly.
The files in the source folder are there from the start, and no more files are put in the source folder after the NiFi flow processing starts.
The files from the source folder need to be processed by the PutFile processor at a given future date and time, as required by the client.
Created 10-08-2021 11:06 AM
@Ankit13
How do you know no more files will be put there after the NiFi flow processing starts?
To me it sounds like the PutFile should execute at the default 0 secs (as fast as it can run), and you should instead control this dataflow at the beginning, where you consume the data.
For example:
Suppose that in a 24-hour window, data is written to the source directory between 00:00:00 and 16:00:00, and you want to write that data to the target directory starting at 17:00. You would instead set up a cron schedule on a ListFile processor to list the files at 17:00 and 17:01, and have a FetchFile and a PutFile running all the time so they immediately consume the content of the listed files and write it to the target directory. The ListFile then does not execute again until the same time the next day, or whatever your cron schedule is. This way the files are all listed at the same time, and the PutFile can execute for as long as needed to write all those files to the target directory.
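The ListFile schedule in this example could be written as a single cron expression. A minimal sketch, assuming the Quartz-style field order (seconds, minutes, hours, day of month, month, day of week); `daily_cron` is a hypothetical helper used only to show how the fields line up.

```python
def daily_cron(hour, minutes):
    """Build a cron string that fires every day at the given hour and minutes.

    Assumed field order: seconds minutes hours day-of-month month day-of-week.
    '*' matches every day and month; '?' leaves day-of-week unspecified.
    """
    minute_field = ",".join(str(m) for m in sorted(minutes))
    return f"0 {minute_field} {hour} * * ?"


# List the files at 17:00 and again at 17:01, every day.
print(daily_cron(17, [0, 1]))  # 0 0,1 17 * * ?
```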
Hope this helps,
Matt
Created 11-02-2021 05:48 AM
@Ankit13
My recommendation would be to automate only the enabling/disabling and starting/stopping of the NiFi processor component that ingests the data into your NiFi dataflow, and to leave all downstream processors always running, so that any data ingested into your dataflow has every opportunity to be processed through to the end. When a "running" processor is scheduled to execute but has no FlowFiles queued in its inbound connection(s), it pauses instead of running immediately over and over again, which prevents excessive CPU usage. It is therefore safe to leave these downstream components running all the time.
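If the start of the ingest processor is automated from an external script, the timing side could look roughly like the sketch below. It uses only the Python standard library; `seconds_until` and `run_at` are hypothetical helpers, and the action passed in stands for whatever call actually starts the ingest processor (for example a NiFi REST API or nipyapi call).

```python
import threading
from datetime import datetime


def seconds_until(target, now=None):
    """Return the delay in seconds from now until target (0.0 if already past)."""
    now = now or datetime.now()
    return max(0.0, (target - now).total_seconds())


def run_at(target, action):
    """Schedule action() to run once at the target local date and time."""
    timer = threading.Timer(seconds_until(target), action)
    timer.start()
    return timer


if __name__ == "__main__":
    # Placeholder action: replace with the call that starts the ingest processor.
    def start_source():
        print("starting ingest processor")

    # A target in the past runs the action right away, for demonstration.
    run_at(datetime(2021, 11, 2, 17, 0, 0), start_source).join()
```

Only the ingest processor is touched here; the downstream processors stay running, as recommended above.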
Thank you,
Matt