NiFi GetFile processor is not reading file on daily basis

Explorer

I have a NiFi flow that reads a file from a local drive and writes it to another folder. For some reason, it is not reading files on a regular basis.

I can't find anything in the log. The log reads as if nothing was scheduled for that particular time. It simply skips that scheduled run and carries on. I'm using Apache NiFi version 1.11.4.

Screenshot of the flow.

Screenshot of the GetFile processor's Scheduling tab. I have scheduled it for 08:05 daily.

Screenshot of the GetFile processor's Properties tab.

This NiFi instance is hosted on a Windows Server 2012 machine, and the problem occurs only in the production environment, although the configuration is the same in all environments.

I'm also attaching the sample flow template in case anyone wants to try it out.

 

Sample_flow.xml

 

 

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.3">
    <description>sample_flow</description>
    <groupId>43743f1b-d683-1c4a-9aa3-f3ccafb344b7</groupId>
    <name>sample_flow</name>
    <snippet>
        <processors>
            <id>26792e99-d010-34b2-0000-000000000000</id>
            <parentGroupId>8ffe175c-fbc5-3eb9-0000-000000000000</parentGroupId>
            <position>
                <x>0.0</x>
                <y>0.0</y>
            </position>
            <bundle>
                <artifact>nifi-standard-nar</artifact>
                <group>org.apache.nifi</group>
                <version>1.11.4</version>
            </bundle>
            <config>
                <bulletinLevel>WARN</bulletinLevel>
                <comments></comments>
                <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
                <descriptors>
                    <entry>
                        <key>Input Directory</key>
                        <value>
                            <name>Input Directory</name>
                        </value>
                    </entry>
                    <entry>
                        <key>File Filter</key>
                        <value>
                            <name>File Filter</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Path Filter</key>
                        <value>
                            <name>Path Filter</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Batch Size</key>
                        <value>
                            <name>Batch Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Keep Source File</key>
                        <value>
                            <name>Keep Source File</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Recurse Subdirectories</key>
                        <value>
                            <name>Recurse Subdirectories</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Polling Interval</key>
                        <value>
                            <name>Polling Interval</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Ignore Hidden Files</key>
                        <value>
                            <name>Ignore Hidden Files</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Minimum File Age</key>
                        <value>
                            <name>Minimum File Age</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Maximum File Age</key>
                        <value>
                            <name>Maximum File Age</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Minimum File Size</key>
                        <value>
                            <name>Minimum File Size</name>
                        </value>
                    </entry>
                    <entry>
                        <key>Maximum File Size</key>
                        <value>
                            <name>Maximum File Size</name>
                        </value>
                    </entry>
                </descriptors>
                <executionNode>ALL</executionNode>
                <lossTolerant>false</lossTolerant>
                <penaltyDuration>30 sec</penaltyDuration>
                <properties>
                    <entry>
                        <key>Input Directory</key>
                        <value>D:\NiFi-Dev\NAS\Source4</value>
                    </entry>
                    <entry>
                        <key>File Filter</key>
                        <value>(?i)^(SAMPLE_FILE.TXT)$</value>
                    </entry>
                    <entry>
                        <key>Path Filter</key>
                    </entry>
                    <entry>
                        <key>Batch Size</key>
                        <value>10</value>
                    </entry>
                    <entry>
                        <key>Keep Source File</key>
                        <value>false</value>
                    </entry>
                    <entry>
                        <key>Recurse Subdirectories</key>
                        <value>true</value>
                    </entry>
                    <entry>
                        <key>Polling Interval</key>
                        <value>0 sec</value>
                    </entry>
                    <entry>
                        <key>Ignore Hidden Files</key>
                        <value>true</value>
                    </entry>
                    <entry>
                        <key>Minimum File Age</key>
                        <value>0 sec</value>
                    </entry>
                    <entry>
                        <key>Maximum File Age</key>
                    </entry>
                    <entry>
                        <key>Minimum File Size</key>
                        <value>0 B</value>
                    </entry>
                    <entry>
                        <key>Maximum File Size</key>
                    </entry>
                </properties>
                <runDurationMillis>0</runDurationMillis>
                <schedulingPeriod>0 05 8 1/1 * ? *</schedulingPeriod>
                <schedulingStrategy>CRON_DRIVEN</schedulingStrategy>
                <yieldDuration>1 sec</yieldDuration>
            </config>
            <executionNodeRestricted>false</executionNodeRestricted>
            <name>Read_NAS_File</name>
            <relationships>
                <autoTerminate>false</autoTerminate>
                <name>success</name>
            </relationships>
            <state>RUNNING</state>
            <style/>
            <type>org.apache.nifi.processors.standard.GetFile</type>
        </processors>
    </snippet>
    <timestamp>10/31/2022 19:02:29 IST</timestamp>
</template>
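
For anyone reproducing this, it may help to confirm which file names the File Filter in the template actually accepts. The following is a minimal Python sketch, not NiFi code: it assumes the filter is applied to the whole file name, and it relies on Python's `re` module reading this particular pattern the same way Java's regex engine does. The helper name `matches_filter` and the sample file names are made up for illustration.

```python
import re

# File Filter value copied from the template above.
FILE_FILTER = re.compile(r"(?i)^(SAMPLE_FILE.TXT)$")


def matches_filter(file_name: str) -> bool:
    """Return True if the whole file name matches the File Filter (hypothetical helper)."""
    return FILE_FILTER.fullmatch(file_name) is not None


# Illustrative names only; replace with the names actually present in the input directory.
for name in ["SAMPLE_FILE.TXT", "sample_file.txt", "SAMPLE_FILEXTXT", "SAMPLE_FILE.TXT.bak"]:
    print(name, matches_filter(name))
```

The `(?i)` flag makes the match case-insensitive, and the unescaped `.` matches any single character, so the filter is slightly looser than the literal name; a name with an extra suffix does not match.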

 

 

  

1 ACCEPTED SOLUTION

Super Mentor

@Bridewin 
There are two things you may want to try:

1. The GetFile processor was deprecated in favor of the newer ListFile --> FetchFile processors. I'd recommend switching to these processors to see whether you observe the same behavior.

2. I'd suggest enabling debug logging for the GetFile processor class to see what additional logging may show. To do this, modify the logback.xml file in NiFi's conf directory, adding the line below where you see similar lines already.

<logger name="org.apache.nifi.processors.standard.GetFile" level="DEBUG"/>

  

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,

Matt


7 REPLIES 7

Super Mentor

@Bridewin 

Are all your environments using a NAS storage location from which GetFile pulls files?
Have you monitored the health and connectivity of your NAS? Since your GetFile is scheduled to execute only once a day, if your NAS or network is having issues, it will simply return nothing for that day's execution. Since you are configured to remove the file you consume, have you tried changing your cron to run multiple times within the 8 AM hour to see whether the file gets picked up by any one of those executions? If occasional network issues are affecting your NAS, this may resolve your issue with consuming the file.
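
One way to run several times within the 8 AM hour would be a Quartz minute field such as 5/10, for example in the expression "0 5/10 8 * * ? *". That expression is only an illustration, not something from the original flow. The Python sketch below is not Quartz code; it just lists the times that a start/step minute field covers within one hour, and the function name `fire_times_in_hour` is made up.

```python
from datetime import time
from typing import List


def fire_times_in_hour(hour: int, minute_field: str) -> List[time]:
    """List the times a 'start/step' minute field covers within one hour.

    Only the 'start/step' form is handled; this is a sketch, not a cron parser.
    """
    start_text, step_text = minute_field.split("/")
    start, step = int(start_text), int(step_text)
    return [time(hour, minute) for minute in range(start, 60, step)]


# Minute field 5/10 in the 8 AM hour: 08:05, then every 10 minutes until 08:55.
print([t.strftime("%H:%M") for t in fire_times_in_hour(8, "5/10")])
```

With Keep Source File set to false, the first execution that sees the file removes it, so the later executions in that hour would simply find nothing to pick up.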



Thank you,

Matt

Super Mentor

@Bridewin 
To add some context around your cron schedule: NiFi uses Quartz cron, in case you were not already aware.
Your current Quartz cron "0 05 8 1/1 * ? *" means that the processor is scheduled to execute at 8:05 AM, starting on day 1 of every month and on every subsequent day of that month.

In the day-of-month field, 1/1 means "start at day 1, then every 1 day", so it matches every day of the month, the same as *. Let's say you start NiFi on November 3rd before 8:05 AM. On startup NiFi loads your flow and starts all your component processors, and your GetFile should next be scheduled for 8:05 AM that same day and then every day thereafter. The same holds if you stop and start the processor without a NiFi restart, or if the NiFi JVM restarts. So the 1/1 by itself should not explain a skipped day.

I am not clear on why you decided to add 1/1; perhaps this is how you intended for it to be scheduled?

A simpler cron that states the same daily 8:05 AM schedule directly is "0 5 8 * * ? *".
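
As a cross-check on how the day-of-month field is read, here is a minimal Python sketch of the start/increment syntax as Quartz documents it ("start at this value, then every N"). It is not Quartz itself: the function name `expand_day_of_month` is made up, and only `*`, a single number, and `start/step` are handled (no `?`, `L`, `W`, ranges, or lists).

```python
from typing import List


def expand_day_of_month(field: str, days_in_month: int = 31) -> List[int]:
    """Expand a simplified Quartz-style day-of-month field to the days it matches."""
    if field == "*":
        return list(range(1, days_in_month + 1))
    if "/" in field:
        start_text, step_text = field.split("/")
        start = 1 if start_text == "*" else int(start_text)
        # Start at 'start' and keep adding 'step' until the end of the month.
        return list(range(start, days_in_month + 1, int(step_text)))
    return [int(field)]


print(expand_day_of_month("1/1") == expand_day_of_month("*"))  # same set of days
print(expand_day_of_month("1/3"))  # every third day, counted from the 1st
```

Under these assumptions, 1/1 and * select the same days, while a larger step such as 1/3 selects only days counted from the 1st of each month.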

For more info on Quartz cron, review this link:
https://productresources.collibra.com/docs/collibra/latest/Content/Cron/co_quartz-cron-syntax.htm


Thank you,

Matt

Explorer

@MattWho @Faerballert 

As per the given explanation, I tried your solution: I changed the cron to "0 5 8 ? * * *". It worked for 2 days, and then today it didn't work again.

Explorer

Dear Matt,

I'm reading the file from the folder NAS-Dev; it's just a folder on the D drive, not an actual NAS location. I have ensured that the file is present at that location every day.

 

Regards

Bridewin


Expert Contributor

Hello,

Your cron expression means that it gets executed at 08:05 AM every day, starting on the 1st, every month.

To execute the processor daily at 08:05 AM, you need to set

0 5 8 ? * * *

 

Greetings

Explorer

I applied "0 5 8 ? * * *" as the cron and the result is the same: it worked for 2 days, and on the 3rd day it didn't work again.