Created on 11-14-2018 07:25 PM - edited 08-17-2019 04:42 PM
Hello! The flow below fetches files from an FTP server with GetFTP and then distributes them to a Spark cluster. However, I noticed that PutSFTP1 picked up only 3 of the 7 files that came out of the GetFTP processor. What is the cause of this problem? Do I need to configure something in NiFi? NiFi is at its default settings. Each FTP file is about 1.6 GB, packed. NiFi is installed on the master node of the Spark cluster, which has 3 nodes with 44 cores each and 256 GB of memory. How should I proceed?
Thank you!
Created 11-15-2018 02:56 AM
The PutSFTP1 processor hasn't received the other 4 files because of back pressure. As the screenshot above shows, all the queues are red, which means every queue has hit the maximum back pressure threshold configured on it (by default, 10000 flowfiles or 1 GB of data).
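To see why files of this size trip the default limits so quickly, here is a minimal Python sketch of the threshold check. It is an illustrative model only, not NiFi code: the function name `back_pressure_engaged` and its parameters are hypothetical, and it assumes back pressure engages once either threshold is reached.

```python
# Illustrative model only; this is not NiFi code.
# Assumption: back pressure engages once either threshold is reached.
GB = 1024 ** 3


def back_pressure_engaged(flowfile_count, queued_bytes,
                          object_threshold=10_000, size_threshold=1 * GB):
    """Return True when the queue has reached either back pressure threshold."""
    return flowfile_count >= object_threshold or queued_bytes >= size_threshold


# A single 1.6 GB flowfile already exceeds the 1 GB size threshold.
one_large_file = int(1.6 * GB)
print(back_pressure_engaged(1, one_large_file))

# Three small flowfiles (100 MB each) stay under both thresholds.
print(back_pressure_engaged(3, 3 * 100 * 1024 ** 2))
```

With 1.6 GB files, the size threshold matters far more than the flowfile count: one queued file is enough to fill a connection at its default settings.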
-
To resolve this issue, loop only the failure and reject relationships of the `PutSFTP2` processor back to itself, and auto-terminate the success relationship of the `PutSFTP2` processor.
-
The GetFile processor doesn't store state about when it last fetched files, so on each scheduled run it simply gets whatever files are in the configured directory.
If you want to pull the files incrementally from the directory, then use
the ListSFTP + FetchSFTP processors to incrementally fetch the files from the SFTP server, followed by the PutSFTP1 and PutSFTP2 processors.
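The idea behind incremental listing can be sketched in a few lines of Python. This is a conceptual illustration under simplifying assumptions, not how ListSFTP is actually implemented: the function `list_new_files`, the file names, and the integer timestamps are all hypothetical, and the only state kept is the newest modification time seen so far.

```python
# Conceptual sketch only; the real ListSFTP state handling is more involved.
def list_new_files(entries, last_seen):
    """Return (names newer than last_seen, updated last_seen).

    entries: iterable of (name, modified_timestamp) pairs.
    last_seen: newest modification timestamp handled by earlier runs.
    """
    new_entries = sorted(
        (entry for entry in entries if entry[1] > last_seen),
        key=lambda entry: entry[1],
    )
    if new_entries:
        last_seen = new_entries[-1][1]
    return [name for name, _ in new_entries], last_seen


# Hypothetical directory listing: (file name, modification timestamp).
listing = [("a.zip", 100), ("b.zip", 200)]

# First run: nothing has been seen yet, so both files are returned.
names, state = list_new_files(listing, last_seen=0)
print(names, state)

# Second run: only the file added since the stored state is returned.
listing.append(("c.zip", 300))
names, state = list_new_files(listing, last_seen=state)
print(names, state)
```

A processor without stored state has no equivalent of `last_seen`, so every scheduled run sees the whole directory again.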
Created 11-20-2018 01:17 AM
If the answer helped resolve your issue, click the Accept button below to accept it. That would be a great help to Community users in finding solutions quickly for these kinds of issues.
Created 11-20-2018 12:42 AM
Shu, it worked, thank you! By the way, is there any processor in NiFi that works only on a timer? For example, I need to run a Python script on the first day of the week. Which processor can I use as this timer? Thanks!
Created on 11-20-2018 01:17 AM - edited 08-17-2019 04:42 PM
All NiFi processors run on either the Cron driven or the Timer driven scheduling strategy. If you want to run a Python script on the first day of the week, then use
a processor such as ExecuteScript, ExecuteProcess, or ExecuteStreamCommand, change its scheduling strategy to Cron driven, and use the cron format below:
0 0 0 ? * MON *
Now this processor triggers every Monday at 12:00 AM. To test the cron schedule, use this link.
Refer to this and this link for more details regarding NiFi scheduling strategies.
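To make the cron expression above easier to read, here is a small Python sketch that labels each field. It assumes the Quartz-style field order that NiFi's Cron driven strategy uses (seconds, minutes, hours, day of month, month, day of week, and an optional year). The helper `label_cron` is hypothetical and only splits and names the fields; it does not validate or schedule anything.

```python
# Field order assumed here: Quartz-style cron, six required fields plus an optional year.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_cron(expression):
    """Map each whitespace-separated cron field to its Quartz field name."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(QUARTZ_FIELDS, parts))


# Print the fields of the expression from the answer, one per line.
for name, value in label_cron("0 0 0 ? * MON *").items():
    print(f"{name:>12}: {value}")
```

Read this way, the expression says second 0, minute 0, hour 0, no specific day of month, every month, on Monday, every year.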