NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Rising Star

Hello,

I'm running NiFi 1.0.1 (standalone instance) with a GetHDFS processor on this cron schedule: 0 30 0 * * ?

I want the processor to start at 12:30 AM daily. With the above schedule, the processor started at the expected time this morning and read some files, but it did not finish reading all of them. There were quite a few files in the directory yesterday, and 1200+ files are still left in it right now. I have "Keep Source File" set to false, so the processor should delete each file as it reads it; the files still in the directory therefore have not been read.

My understanding is that, with the above schedule, once GetHDFS starts it should keep reading until all the files in the directory are exhausted, so I don't understand why some files are still left.

Please help, thank you.

1 ACCEPTED SOLUTION

Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Rising Star

Thanks to @Bryan Bende: I needed to change the Batch Size property in GetHDFS to read all the files in the directory.

https://community.hortonworks.com/questions/108547/need-clarification-on-how-nifi-processors-run-wit...
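To illustrate why Batch Size matters here, the following is a toy model, not NiFi code. It assumes that each cron firing triggers the processor once and that one run picks up at most Batch Size files. The function name `files_left` and all the numbers are hypothetical; the thread does not state the original file count or the Batch Size that was configured.

```python
def files_left(total_files, batch_size, triggers):
    """Toy model: each trigger removes at most `batch_size` files.

    Assumption (not taken from NiFi source): one scheduled run of the
    processor reads no more than `batch_size` files, then stops until
    the next trigger.
    """
    remaining = total_files
    for _ in range(triggers):
        remaining -= min(batch_size, remaining)
    return remaining


# Hypothetical numbers, chosen only for illustration.
print(files_left(1500, 100, 1))   # one firing per day, small batch
print(files_left(1500, 100, 2))   # two firings in the same minute
print(files_left(1500, 2000, 1))  # one firing, batch larger than the backlog
```

Under these assumptions, a single daily firing leaves most of the backlog behind unless the batch is at least as large as the number of files waiting, which matches the behaviour described in the question.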

6 REPLIES

Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Contributor

@Raj B This looks similar to NIFI-4069

As a workaround, please try changing the cron schedule to 0,30 30 0 * * ? so that it runs twice in the same minute.

Let us know if that helps.

Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Rising Star

@Shashank Chandhok the schedule change to "0,30 30 0 * * ?" helped read a few additional files, but many files still remain in the directory.

Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Contributor

@Raj B

Please check the timestamps of the files remaining in the directory: are they being added while the processor is running, or are their timestamps older than the processor's CRON run time?

Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Rising Star

@Shashank Chandhok actually, the files I'm trying to process are from the day before. In the directory path of the GetHDFS processor, I'm using Expression Language to point to the directory that was created yesterday, and the files in that directory are from yesterday. So when the CRON schedule fires at 12:30 AM, all the files that need to be processed should already be in that directory.

Re: NiFi's GetHDFS processor with Cron schedule not reading all files in the directory

Rising Star

I'm not sure why I need to schedule the GetHDFS processor to run continuously (I set it to run every 15 seconds), but this schedule exhausts all files in the directory: 0/15 * * * * ?

In my case, since I load files the next day (the GetHDFS directory path points to the previous day's directory), this resolves the issue I was facing.
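For comparison, here is a rough sketch of how often each of the three schedules tried in this thread fires per day. It assumes the Quartz field order (seconds, minutes, hours, day of month, month, day of week) and handles only the small subset of syntax used above: plain numbers, `*`, comma lists, and `start/step`. The helper names `expand` and `firings_per_day` are made up for this example, and the day fields are assumed to match every day.

```python
def expand(field, lo, hi):
    """Expand one cron field into the set of values it matches.

    Supports only: '*', a single number, 'a,b,c' lists, and 'start/step'.
    """
    values = set()
    for part in field.split(","):
        if "/" in part:
            start, step = part.split("/")
            first = lo if start == "*" else int(start)
            values.update(range(first, hi + 1, int(step)))
        elif part == "*":
            values.update(range(lo, hi + 1))
        else:
            values.add(int(part))
    return values


def firings_per_day(expr):
    """Count firings in one day, using only the first three fields."""
    seconds, minutes, hours = expr.split()[:3]
    return (len(expand(seconds, 0, 59))
            * len(expand(minutes, 0, 59))
            * len(expand(hours, 0, 23)))


for schedule in ("0 30 0 * * ?", "0,30 30 0 * * ?", "0/15 * * * * ?"):
    print(schedule, "->", firings_per_day(schedule))
```

The first schedule fires once a day, the second twice, and the third every 15 seconds around the clock, so the processor gets far more runs in which to work through the directory.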
