I am reading a file using GetFile. If I set the Keep Source File property to true, it keeps re-reading the same file, so I reset it to false. With it set to false, GetFile removes the file from the source directory after reading it. Is there any way to move the file to some other location once it has been read by GetFile?
The FetchFile processor will do what you want to do. The GetFile processor does not have the option of moving a file after reading it.
Just set the Completion Strategy to Move File and then specify the target directory in the Move Destination Directory property.
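To make the "read, then move" behaviour concrete, here is a rough Python sketch of what that completion step amounts to. This is not NiFi code; `fetch_and_move` and its parameters are hypothetical names used only for illustration.

```python
import shutil
from pathlib import Path


def fetch_and_move(source: Path, move_dir: Path) -> bytes:
    """Read a file, then move it out of the input directory.

    Hypothetical helper: it only mimics the idea of a fetch followed by a
    "move" completion step, it is not how NiFi implements it.
    """
    content = source.read_bytes()                # fetch the file content
    move_dir.mkdir(parents=True, exist_ok=True)  # make sure the destination exists
    shutil.move(str(source), str(move_dir / source.name))  # completion step: move
    return content
```

After the call, the file no longer sits in the input directory, so it is not picked up again, but a copy is kept in the destination directory.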
I replaced GetFile with FetchFile, but it's giving me a warning. Below is the warning:
Upstream connection is invalid because processor requires an upstream connection but currently it has no upstream connection.
Below is my data flow:
Fetch-type processors require an upstream connection, normally from the matching List-type processor, which supplies the files to fetch. So your FetchFile processor needs a list of files to fetch. Add a ListFile processor configured to read the same input directory, and connect its output to the FetchFile processor.
Here is a link to an article that uses the ListSFTP and FetchSFTP processors to explain the method: List to Fetch example in NiFi
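The List-then-Fetch pattern can be sketched in plain Python as below. This is only an illustration under my own assumptions: `list_files` and `fetch_file` are hypothetical functions, and the record keys are modelled on the `absolute.path` and `filename` attributes that ListFile is expected to put on each FlowFile.

```python
from pathlib import Path
from typing import Dict, Iterator


def list_files(input_dir: Path) -> Iterator[Dict[str, str]]:
    """List step: emit one record per file, carrying attributes but no content."""
    for path in sorted(input_dir.iterdir()):
        if path.is_file():
            yield {"absolute.path": str(path.parent), "filename": path.name}


def fetch_file(record: Dict[str, str]) -> bytes:
    """Fetch step: it cannot run on its own, it needs a record naming the file."""
    return (Path(record["absolute.path"]) / record["filename"]).read_bytes()
```

The fetch step has no directory of its own to scan, which is why a Fetch processor without an incoming connection is flagged as invalid.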
I have added ListFile and FetchFile processors in my flow but they are not fetching the file.
Below is the configuration for both processors:
There are no errors or warnings in the NiFi log.
I have successfully moved the file to HDFS using ListFile and FetchFile as per your suggestions. One problem remains, though: although ListFile and FetchFile move my file from the source to HDFS, ListFile logs a warning. Below are the details:
WARN [Timer-Driven Process Thread-7] o.a.nifi.processors.standard.ListFile ListFile[id=1ae3923c-015c-1000-15e4-e29f5b75a0e0] Error collecting attributes for file /root/InputData/, message is Mount point not found
Apart from this warning, I have one more question: is there any way to clear the listing kept by ListFile so that I can process the same file again in the future?
What you want to do is to clear the state of the processor.
Right-click the ListFile processor and select View state from the menu that pops up.
A new window opens. In the middle of its right-hand side there is a Clear state link. Click it, and the file will be picked up again the next time the processor runs.
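As a rough mental model of why clearing the state makes a file eligible again, here is a toy Python sketch. It assumes the listing step remembers the newest modification time it has already listed; `FileLister` is a hypothetical class and a simplification, not the actual ListFile implementation.

```python
from typing import Dict, List


class FileLister:
    """Toy model of a listing step that remembers what it has already listed."""

    def __init__(self) -> None:
        self.latest_listed = -1.0  # stored state: newest modification time seen

    def list_new(self, files: Dict[str, float]) -> List[str]:
        """Return the names whose modification time is newer than the stored state."""
        new = sorted(name for name, mtime in files.items() if mtime > self.latest_listed)
        if new:
            self.latest_listed = max(files[name] for name in new)
        return new

    def clear_state(self) -> None:
        """Forget everything listed so far, similar in spirit to Clear state."""
        self.latest_listed = -1.0


lister = FileLister()
files = {"input.csv": 100.0}      # file name -> modification time
first = lister.list_new(files)    # ['input.csv']
second = lister.list_new(files)   # [] because the file was already listed
lister.clear_state()
third = lister.list_new(files)    # ['input.csv'] again after clearing
```

Without the clear, an unchanged file is never listed a second time, which matches the behaviour described in the question.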
FYI, if you only want to pull the file once a day, you can set the ListFile processor's Scheduling Strategy to CRON driven and give it a cron expression that fires once per day. NiFi uses Quartz-style expressions, so for example 0 0 1 * * ? runs at 1 AM every day.