NiFi CRON read multiple times
- Labels: Apache NiFi
Created ‎04-20-2017 02:39 PM
I am using a GetHDFS processor with the CRON driven scheduling strategy, scheduled to run every day at 10 am.
I have one input file to read, but when the dataflow starts it gets the source file multiple times instead of once (9 times in my case). Why?
As a result, when I write the output of the dataflow, I get the following warning: file with same name already exists
Should I modify the Polling Interval parameter? (set to 0 sec by default)
Created ‎04-20-2017 02:44 PM
What does your cron run schedule look like?
Created ‎04-20-2017 02:47 PM
Run schedule : * * 10 * * ?
Created ‎04-20-2017 03:16 PM
Try setting the cron run schedule to 0 0 10 * * ? instead.
The other cron schedule grabbed the same file multiple times because the * * in the seconds and minutes fields means run every second of every minute during that hour.
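To make the difference between the two schedules concrete, here is a minimal sketch that counts how many seconds of one day match each expression. It is a simplified matcher written for illustration, not the parser NiFi actually uses: it assumes the first three fields are seconds, minutes and hours, and it only understands `*`, `?` and a single plain number. The function names are hypothetical.

```python
from itertools import product


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; only '*', '?' and a single integer are handled."""
    return field in ("*", "?") or int(field) == value


def fires_per_day(expression: str) -> int:
    """Count the seconds of one day on which all three time fields match."""
    # Assumed field order: seconds, minutes, hours (date fields are ignored).
    seconds, minutes, hours = expression.split()[:3]
    return sum(
        field_matches(hours, h)
        and field_matches(minutes, m)
        and field_matches(seconds, s)
        for h, m, s in product(range(24), range(60), range(60))
    )


print(fires_per_day("* * 10 * * ?"))  # 3600
print(fires_per_day("0 0 10 * * ?"))  # 1
```

The first expression matches every second between 10:00:00 and 10:59:59, so the processor is eligible to run throughout that hour; the second matches only 10:00:00.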
Created ‎04-20-2017 03:18 PM
That schedule makes it possible to run every second of every minute. In practice this means the processor runs as often as it can, using the allowable number of concurrent tasks, during the 10 am hour of each day. In your case it sounds like it was able to run 9 times in that one hour.
Created ‎04-20-2017 02:56 PM
Hi @Raphaël MARY,
Did you set a different value for number of concurrent tasks?
Are you in a cluster configuration?
Created ‎04-20-2017 03:03 PM
No, only one node and 1 concurrent task.
I changed to 0 0 10 * * ? in order to specify minutes and seconds.
It is working now!
Created ‎04-20-2017 03:04 PM
If you are running a NiFi cluster, by default every node in your cluster will run this GetHDFS processor at 10 am each day. This means every node will get a copy of the same files and process them in the same way.
If you are running a cluster, consider changing the configuration of your GetHDFS processor so it runs on the primary node only.
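As a rough illustration of the cluster behaviour described above, the toy model below lists which node fetches which file on a single scheduled trigger. It is only a sketch of the reasoning, not NiFi code: the node names, the file name, and the assumption that the first node in the list is the primary are all hypothetical.

```python
def fetches_for_one_trigger(nodes, files, primary_only):
    """Return (node, file) pairs fetched when the schedule fires once."""
    # Assumption for this sketch: the first node in the list is the primary.
    runners = nodes[:1] if primary_only else nodes
    return [(node, name) for node in runners for name in files]


nodes = ["node-1", "node-2", "node-3"]  # hypothetical cluster
files = ["input.csv"]                   # hypothetical source file

print(fetches_for_one_trigger(nodes, files, primary_only=False))
print(fetches_for_one_trigger(nodes, files, primary_only=True))
```

With all three nodes running the processor, the same file appears three times in the result; restricted to the primary node, it appears once.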
