How to control missing files in NiFi?

New Contributor

I want to implement file checking with NiFi. I have a directory where files appear once a day. NiFi parses data from this file and deletes it. I would like to receive a notification if the file does not appear in the directory within 24 hours. Is there a way to do this check with NiFi?


Master Mentor

@doora 

Welcome to the Cloudera Community!

I am not completely clear on the use case described in your query.

You have a directory (local directory on the NiFi host or remote directory) where files are dropped daily.

NiFi ingests these files, then parses the content of the file, then deletes the source file, and finally terminates the NiFi FlowFile.  Correct?
Are you using ListFile and FetchFile to pull these files into NiFi?

You want to somehow monitor whether a file fails to appear in the directory. Doing so implies that these daily files follow some static naming convention. Is that the case?

You could accomplish this through a creative dataflow design if you know the names of the files you are expecting each day.

Perhaps use a GenerateFlowFile processor to create a 0-byte FlowFile with the filename of the expected file and an attribute that captures the current time. Configure this processor to run on a cron schedule once a day. This processor would then connect to a FetchFile processor that attempts to fetch that filename from the configured directory. FetchFile has a not.found relationship, which you could connect to a RouteOnAttribute processor configured with two dynamic relationships: one that checks whether the current time minus the time recorded by GenerateFlowFile is less than 24 hours, and another that checks whether it is greater than 24 hours.

The less-than-24-hours relationship would get routed back to FetchFile to check again. The greater-than-24-hours relationship could be routed to perhaps a PutEmail processor to send out an email notifying that filename <xyz> was not found in the past 24 hours, at which time you terminate this FlowFile, since GenerateFlowFile via its cron schedule would create a new FlowFile to start looking for the file in the next 24-hour window.

I would recommend adjusting the run schedule on the RouteOnAttribute to run less often (maybe every 10 minutes), because leaving it at 0 secs will have your FlowFile rapidly looping between FetchFile and RouteOnAttribute until it expires (older than 24 hours) or the file is found. This would lead to excessive, unnecessary resource usage.
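To make the routing decision concrete, here is a minimal Python sketch of the check RouteOnAttribute would perform on each FlowFile arriving from not.found. It is only an illustration of the logic, not NiFi configuration. The function name `route` and the relationship names `retry` and `expired` are hypothetical, and treating an age of exactly 24 hours as expired is my assumption. In NiFi itself you would express the same comparison in Expression Language against the timestamp attribute set by GenerateFlowFile.

```python
from datetime import datetime, timedelta

# Length of the window in which the file is expected to show up.
WINDOW = timedelta(hours=24)


def route(start_time: datetime, now: datetime, window: timedelta = WINDOW) -> str:
    """Pick the relationship for a FlowFile whose fetch came back not.found.

    start_time is the timestamp recorded when the marker FlowFile was
    generated; now is the time of the current check.
    """
    age = now - start_time
    # Younger than the window: loop back to FetchFile and try again.
    # Otherwise (assumption: exactly 24 hours counts too): send the alert.
    return "retry" if age < window else "expired"


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 6, 0)
    print(route(start, start + timedelta(hours=3)))   # still inside the window
    print(route(start, start + timedelta(hours=25)))  # window has passed
```

With a 10-minute run schedule, a FlowFile would go through this check at most about 144 times before it is routed to the notification step.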

Please help our community grow and thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt

Community Manager

@doora Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.


Regards,

Diana Torres,
Community Moderator

