NiFi CRON schedule reads the same file multiple times

Rising Star

I am using a GetHDFS processor with the CRON-driven scheduling strategy, scheduled to run every day at 10 am.

I have one input file to read, but when the dataflow starts, it fetches the source file multiple times (9 times in my case) instead of once. Why?

As a result, when the flow writes its output, I get the following warning: file with same name already exists

Should I modify the Polling Interval property? (It is set to 0 sec by default.)

@Raphaël MARY

What does your cron run schedule look like?

Rising Star

Run schedule: * * 10 * * ?

@Raphaël MARY

Try setting the cron run schedule to 0 0 10 * * ? instead.

The original schedule grabbed the same file multiple times because the * in the seconds and minutes fields means "every second of every minute" during that hour.
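
To make the difference between the two expressions concrete, here is a minimal sketch that lists every second of one day matched by a 6-field Quartz-style expression (NiFi's CRON-driven strategy uses Quartz cron syntax: seconds, minutes, hours, day-of-month, month, day-of-week). The function names and the sample date are hypothetical, and the parser is deliberately simplified: it only understands *, ? and single integers, not ranges, lists, steps or the other Quartz extensions.

```python
from datetime import datetime, timedelta


def field_matches(spec: str, value: int) -> bool:
    """Return True if one cron field matches value.

    Simplified sketch: only '*', '?' and a single integer are supported.
    """
    if spec in ("*", "?"):
        return True
    return int(spec) == value


def fire_times_on_day(expr: str, day: datetime) -> list:
    """List every second of `day` matched by a 6-field Quartz-style expression.

    Field order: seconds minutes hours day-of-month month day-of-week.
    """
    sec, minute, hour, dom, month, dow = expr.split()
    if dow not in ("*", "?"):
        raise ValueError("day-of-week values are not supported in this sketch")
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    times = []
    # Check each of the 86,400 seconds in the day against the fields.
    for offset in range(24 * 60 * 60):
        t = start + timedelta(seconds=offset)
        if (field_matches(sec, t.second)
                and field_matches(minute, t.minute)
                and field_matches(hour, t.hour)
                and field_matches(dom, t.day)
                and field_matches(month, t.month)):
            times.append(t)
    return times


day = datetime(2024, 1, 15)  # arbitrary example date
print(len(fire_times_on_day("* * 10 * * ?", day)))  # 3600: every second from 10:00:00 to 10:59:59
print(len(fire_times_on_day("0 0 10 * * ?", day)))  # 1: only 10:00:00
```

The first expression offers 3,600 trigger points per day; the second offers exactly one, which is why pinning the seconds and minutes to 0 makes the processor fetch the file once.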

Super Mentor

That schedule allows the processor to run every second of every minute. In practice, this means it runs as often as possible, using the allowed number of concurrent tasks, throughout the 10 o'clock hour (10:00:00 to 10:59:59) of each day. In your case, it sounds like it was able to run at least 10 times in that one hour.
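
A rough way to see why the actual count is far below 3,600 is a simplified model, not NiFi's real scheduler code. It assumes, hypothetically, that a trigger fires every second, that each GetHDFS run takes a fixed time, and that a trigger arriving while every concurrent-task slot is busy is skipped rather than queued. The function name runs_in_window and the run durations used below are made up for illustration.

```python
import heapq


def runs_in_window(window_s: int, task_duration_s: int, concurrent_tasks: int = 1) -> int:
    """Count runs in a window where a trigger fires every second.

    Hypothetical model: every run takes exactly task_duration_s seconds,
    and triggers that arrive while all slots are busy are skipped.
    """
    busy_until = []  # min-heap of the times at which occupied slots become free
    runs = 0
    for t in range(window_s):
        # Free every slot whose run has finished by this trigger.
        while busy_until and busy_until[0] <= t:
            heapq.heappop(busy_until)
        # Start a new run only if a concurrent-task slot is available.
        if len(busy_until) < concurrent_tasks:
            heapq.heappush(busy_until, t + task_duration_s)
            runs += 1
    return runs


print(runs_in_window(3600, 1))     # 3600
print(runs_in_window(3600, 400))   # 9
print(runs_in_window(3600, 400, 2))  # 18
```

Under these assumptions, a run that takes a hypothetical 400 seconds with one concurrent task yields 9 runs in the hour, and raising Concurrent Tasks to 2 doubles that. Either way, any count above one produces duplicate copies of the same file.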

Hi @Raphaël MARY,

Did you set a different value for the number of concurrent tasks?

Are you in a cluster configuration?

Rising Star

No, only one node and 1 concurrent task.

I changed the schedule to 0 0 10 * * ? so that the seconds and minutes are specified.

It is working now!

Super Mentor
@Raphaël MARY

If you are running a NiFi cluster, by default every node in the cluster will run this GetHDFS processor at 10 am each day. This means every node will fetch its own copy of the same files and process them in the same way.

If you are running a cluster, consider changing the configuration of your GetHDFS processor so that it runs on the primary node only.