
Apache NiFi - Distribution Trouble in a Spark Cluster

Hello! The flow below uses GetFTP to pull files from an FTP server and then distributes them to a Spark cluster. However, PutSFTP 1 has picked up only 3 of the 7 files that came out of the GetFTP processor. What is causing this? Do I need to configure something in NiFi? NiFi is running with its default settings. Each file on the FTP server is about 1.6 GB on average, compressed. NiFi is installed on the Spark master node, and the cluster has 3 nodes, each with 44 cores and 256 GB of memory. How should I proceed?

Thank you!



Super Guru

@Julio Gazeta

PutSFTP1 hasn't received the other 4 files because of back pressure. In the screenshot above, all the queues are shown in red, which means each queue has reached the back pressure threshold configured on it (by default 10,000 FlowFiles or 1 GB of data). While a queue is at its threshold, NiFi stops scheduling the processor that feeds it.
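To make the two thresholds concrete, here is a simplified Python model of a single connection using the default values. It is an illustration under assumptions, not NiFi's code: the `Connection` class and its `offer`/`is_full` methods are hypothetical, and in a real flow the downstream processor keeps draining the queue. It shows that with the default 1 GB data-size threshold, one file of roughly 1.6 GB is already enough to engage back pressure.

```python
from dataclasses import dataclass, field
from typing import List

GB = 1024 ** 3

@dataclass
class Connection:
    """Hypothetical, simplified model of a NiFi connection queue.

    Mirrors the idea of NiFi's two back pressure thresholds (object count
    and data size); it is not NiFi's actual implementation.
    """
    object_threshold: int = 10_000
    size_threshold_bytes: int = 1 * GB
    queued: List[int] = field(default_factory=list)  # sizes of queued FlowFiles

    def is_full(self) -> bool:
        # Back pressure is engaged once EITHER threshold is reached.
        return (len(self.queued) >= self.object_threshold
                or sum(self.queued) >= self.size_threshold_bytes)

    def offer(self, size_bytes: int) -> bool:
        # Upstream may only add data while the queue is not full; a single
        # large FlowFile can push the queue past the size threshold.
        if self.is_full():
            return False
        self.queued.append(size_bytes)
        return True

# Seven ~1.6 GB files arriving at a queue that nothing is draining.
file_size = int(1.6 * GB)
conn = Connection()
accepted = [conn.offer(file_size) for _ in range(7)]
print(accepted)  # [True, False, False, False, False, False, False]
```

With files this large, the data-size threshold trips long before the 10,000-object threshold matters.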


To resolve this, route only the failure and reject relationships of `PutSFTP2` back to itself, and auto-terminate the success relationship of `PutSFTP2`, so that successfully transferred files are dropped instead of piling up in a queue.
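A rough Python sketch of what this routing means for `PutSFTP2`, assuming a hypothetical `transfer` callable that returns `"success"`, `"failure"`, or `"reject"`: failed or rejected FlowFiles go back onto the processor's own input queue to be retried, while successful ones are auto-terminated, so nothing accumulates downstream. The function name and the `max_rounds` guard are illustrative only.

```python
from collections import deque
from typing import Callable, Deque

def run_put_sftp2(queue: Deque[str], transfer: Callable[[str], str],
                  max_rounds: int = 100) -> int:
    """Hypothetical model of PutSFTP2's routing after the fix:
    - failure / reject -> back onto PutSFTP2's own input queue (retry)
    - success          -> auto-terminated (removed from the flow)
    Returns how many FlowFiles were auto-terminated on success.
    """
    terminated = 0
    rounds = 0
    while queue and rounds < max_rounds:
        rounds += 1
        flowfile = queue.popleft()
        outcome = transfer(flowfile)
        if outcome == "success":
            terminated += 1          # auto-terminated: nothing is queued downstream
        else:
            queue.append(flowfile)   # self-loop: retried on a later run
    return terminated

attempted = set()

def flaky_transfer(name: str) -> str:
    # Hypothetical transfer that fails the first attempt for each file.
    if name in attempted:
        return "success"
    attempted.add(name)
    return "failure"

q = deque(f"file{i}" for i in range(7))
print(run_put_sftp2(q, flaky_transfer), len(q))  # 7 0
```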

The GetFTP processor, like GetFile, doesn't store state about which files it has already fetched, so on each scheduled run it simply pulls whatever files are currently in the configured directory.

If you want to pull the files incrementally, use ListSFTP + FetchSFTP to fetch the files from the SFTP server incrementally (ListFTP + FetchFTP for a plain FTP server), then send them on to PutSFTP1 and PutSFTP2.
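The difference between the two approaches can be sketched in Python. This is a simplified model, assuming each listing is a hypothetical dict of file name to last-modified timestamp: `get_style` behaves like a stateless Get processor, while `list_style` keeps a timestamp high-water mark as state, in the spirit of ListSFTP. The real ListSFTP handles edge cases (such as several files sharing the same timestamp) that this sketch ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical remote directory listing: file name -> last-modified timestamp.
Listing = Dict[str, int]

def get_style(listing: Listing) -> List[str]:
    """Stateless, like GetFTP/GetFile: every run returns everything present."""
    return sorted(listing)

def list_style(listing: Listing, last_seen: int) -> Tuple[List[str], int]:
    """Stateful sketch: emit only entries newer than the stored timestamp,
    and return the new high-water mark to persist for the next run."""
    new = sorted(name for name, mtime in listing.items() if mtime > last_seen)
    newest = max([last_seen] + [listing[n] for n in new])
    return new, newest

run1 = {"a.zip": 100, "b.zip": 110}
run2 = {"a.zip": 100, "b.zip": 110, "c.zip": 120}

print(get_style(run1), get_style(run2))
# ['a.zip', 'b.zip'] ['a.zip', 'b.zip', 'c.zip']

state = 0
batch1, state = list_style(run1, state)
batch2, state = list_style(run2, state)
print(batch1, batch2, state)  # ['a.zip', 'b.zip'] ['c.zip'] 120
```

In the List + Fetch pattern, the listing step only emits references to new files, and the fetch step then retrieves the content of each one.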


Shu, it worked, thank you! By the way, is there a NiFi processor that works purely as a timer? For example, I need to run a Python script on the first day of each week; which processor can I use as the timer? Thanks!

Super Guru
@Julio Gazeta

NiFi processors are scheduled either Timer driven or CRON driven. To run a Python script on the first day of the week, use ExecuteScript, ExecuteProcess, ExecuteStreamCommand, or a similar processor, change its Scheduling Strategy to CRON driven, and use the following cron expression:

0 0 0 ? * MON *


With this schedule, the processor triggers every Monday at 12:00 AM. To test a cron schedule, use this link.
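To see what the expression means field by field, and when it fires next, here is a small Python sketch. It assumes NiFi's CRON driven strategy uses Quartz-style expressions (seconds, minutes, hours, day-of-month, month, day-of-week, optional year); `describe` and `next_monday_midnight` are hypothetical helpers written for this one expression, not a general cron evaluator.

```python
from datetime import datetime, timedelta

QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day-of-month",
                 "month", "day-of-week", "year"]

def describe(expr: str) -> dict:
    """Label each field of a Quartz-style cron expression (6 or 7 fields)."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError("Quartz cron expressions have 6 or 7 fields")
    return dict(zip(QUARTZ_FIELDS, parts))

def next_monday_midnight(after: datetime) -> datetime:
    """Next fire time strictly after `after` for '0 0 0 ? * MON *'.
    Hand-written for this single expression only."""
    days_ahead = (0 - after.weekday()) % 7   # Monday is weekday() == 0
    candidate = (after + timedelta(days=days_ahead)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    if candidate <= after:                   # already at/past this Monday's midnight
        candidate += timedelta(days=7)
    return candidate

print(describe("0 0 0 ? * MON *")["day-of-week"])        # MON
print(next_monday_midnight(datetime(2024, 1, 3, 15, 30)))  # 2024-01-08 00:00:00
```

The `?` in the day-of-month field means "no specific value", which Quartz requires when the day-of-week field is set.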

Refer to this and this link for more details on NiFi scheduling strategies.
