
What is Polling Interval, Max Select , Run Schedule attribute doing in GetSFTP processor in NiFi



I am a little confused about what each of these does. Can someone here help me understand them?

For Example:

I have set the following attributes:

Max Select: 2

Run Schedule: 30 sec

Polling Interval: 0 sec

In the source directory I have many files (say 10,000), and I am writing these files to HDFS.

What would be the output or expected behavior?


Re: What is Polling Interval, Max Select , Run Schedule attribute doing in GetSFTP processor in NiFi

@Pradhuman Gupta

Max Select - the maximum number of files pulled in a single connection. In your example, the processor will get two files each time it runs, multiplied by the number of concurrent tasks.

Run Schedule - the amount of time to wait between each task of pulling files. In your example, the processor will pull files every 30 seconds.

Polling Interval - how long to wait between getting listings of new files.

FYI, we refer to these as properties.

So for the example above, the processor will run the first time, get a listing of the 10,000 files, and pull two of them. It will then wait 30 seconds, pull two more files, and so on. The processor will have to run 5,000 times to pull the 10,000 files. With a 30-second wait between tasks, that is 4,999 x 30 seconds of waiting, so it will take 149,970 seconds (about 41.66 hours) to pull all of the files. That works out to about 4 files/minute, or 20 files every 5 minutes.

If you don't write any new files to the directory, then the polling interval could be set even higher. Also, the listing needs a concurrent task and the pulling needs a concurrent task, so I would give the processor at least 2 concurrent tasks and reduce the time on the run schedule. Consider increasing Max Selects to at least 100, the default, because that will be more efficient and faster.
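If you want to check the arithmetic for other settings, here is a minimal Python sketch. It assumes a simplified model: each run pulls exactly Max Selects files per concurrent task, transfer time is ignored, and only the Run Schedule waits between runs are counted. The function name `estimate_pull_time` and its parameters are hypothetical helpers for illustration, not part of NiFi.

```python
import math


def estimate_pull_time(total_files, max_selects, run_schedule_sec, concurrent_tasks=1):
    """Rough estimate of runs and total wait time (simplified model, not NiFi code).

    Assumes every run pulls max_selects files per concurrent task and that
    the transfer itself takes no time; only the waits between runs count.
    """
    files_per_run = max_selects * concurrent_tasks
    runs = math.ceil(total_files / files_per_run)
    # The first run starts immediately, so there is one fewer wait than runs.
    waits = max(runs - 1, 0)
    total_seconds = waits * run_schedule_sec
    return runs, total_seconds


# The settings from the question: 10,000 files, Max Selects 2, Run Schedule 30 sec.
runs, seconds = estimate_pull_time(10_000, 2, 30)
print(runs, seconds)  # 5000 149970

# The same files with Max Selects raised to 100.
runs_100, seconds_100 = estimate_pull_time(10_000, 100, 30)
print(runs_100, seconds_100)  # 100 2970
```

Under these assumptions, raising Max Selects from 2 to 100 cuts the waiting from roughly 41.66 hours to under an hour, before any change to the Run Schedule or the number of concurrent tasks.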

Is there a reason you are pulling only four files per minute?

Re: What is Polling Interval, Max Select , Run Schedule attribute doing in GetSFTP processor in NiFi

@Pradhuman Gupta

Did this answer your questions or are you still unclear?