Reset ListSFTP state for entity on FetchSFTP failure

New Contributor

I am looking for a feedback mechanism to tell the ListSFTP processor that a transfer has failed, so that it lists the file again. The aim is automatic recovery when network issues prevent a transfer from completing. I was hoping to use a combination of the FetchDistributedMapCache, PutDistributedMapCache, and UpdateAttribute processors to clear the file.lastModifiedTimestamp attribute for FlowFiles that failed in FetchSFTP, but it seems that the distributed cache is meant only for migrating state from older NiFi releases.

Is there any flow that would accomplish what we are looking for? 

Re: Reset ListSFTP state for entity on FetchSFTP failure

New Contributor

Could we use an ExecuteScript processor to access the StateManager and remove the state for the particular file that failed? In other words, upon FetchSFTP failure, send the FlowFile to that processor, which reaches into the state and removes the entry for the file.

Re: Reset ListSFTP state for entity on FetchSFTP failure

Master Guru

@NickH

If the ListSFTP processor fails during listing, no FlowFiles should have been output and its state should not have been updated. Are you seeing failures during ListSFTP processor execution?
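A simplified model may help show why a failed fetch is never retried by ListSFTP itself. The Python sketch below assumes timestamp-based tracking: the lister remembers the newest modification time it has emitted, lists only files newer than that, and advances its state only after a listing succeeds. The class and method names are hypothetical, and NiFi's actual state handling is more detailed than this.

```python
from dataclasses import dataclass


@dataclass
class RemoteFile:
    path: str
    mtime: int  # last-modified time, e.g. epoch milliseconds


@dataclass
class TimestampListingState:
    """Hypothetical, simplified model of timestamp-based listing state."""
    last_listed_mtime: int = -1

    def list_new(self, remote_files, fail_midway=False):
        # Select every file modified after the newest timestamp already emitted.
        new_files = sorted(
            (f for f in remote_files if f.mtime > self.last_listed_mtime),
            key=lambda f: f.mtime,
        )
        if fail_midway:
            # Simulated listing failure: nothing is emitted, state is untouched.
            raise ConnectionError("listing failed")
        # State advances only after a successful listing.
        if new_files:
            self.last_listed_mtime = new_files[-1].mtime
        return [f.path for f in new_files]


if __name__ == "__main__":
    state = TimestampListingState()
    files = [RemoteFile("a.txt", 100), RemoteFile("b.txt", 200)]
    print(state.list_new(files))  # ['a.txt', 'b.txt']
    # Even if fetching a.txt failed downstream, it is not listed again:
    print(state.list_new(files))  # []
    # Only a newer modification time brings it back:
    print(state.list_new([RemoteFile("a.txt", 300)]))  # ['a.txt']
```

Note the second call: once a file has been listed, a later FetchSFTP failure does not touch the lister's state, so in this model the file is listed again only if its modification time changes.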

If you are seeing FlowFiles getting routed to one of the failure relationships from the FetchSFTP processor, you can always loop that connection back to the same FetchSFTP processor so another attempt is made to fetch the content for that FlowFile.
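A self-loop keeps retrying for as long as the failure persists. If a cap is wanted, one common NiFi pattern is to count attempts in a FlowFile attribute with UpdateAttribute and compare it with RouteOnAttribute. The Python below only models that decision; the attribute name fetch.retry.count and the limit of 3 are hypothetical choices, not NiFi settings.

```python
from typing import Dict, Tuple

RETRY_ATTR = "fetch.retry.count"  # hypothetical attribute name
MAX_RETRIES = 3                   # hypothetical cap


def route_after_comms_failure(attributes: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Model an UpdateAttribute (increment) + RouteOnAttribute (compare) pair.

    Returns the next destination and a new attribute map; the input map
    is left unmodified.
    """
    count = int(attributes.get(RETRY_ATTR, "0")) + 1
    updated = {**attributes, RETRY_ATTR: str(count)}
    if count <= MAX_RETRIES:
        return "retry", updated   # loop back to FetchSFTP
    return "give_up", updated     # e.g. alert, or park for manual review


if __name__ == "__main__":
    attrs = {"filename": "data.csv"}
    for _ in range(4):
        destination, attrs = route_after_comms_failure(attrs)
        print(destination, attrs[RETRY_ATTR])
    # retry 1, retry 2, retry 3, give_up 4
```

Counting attempts stops a persistent outage from cycling the same FlowFile forever, while a brief network blip is still retried.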

There is currently no way to clear just a single cached entry from the DistributedMapCacheServer controller service. I encourage you to open an Apache NiFi Jira requesting a new processor that can remove cache entries.

https://issues.apache.org/jira

You could try looking at this example for removing a cache entry via a script:
https://gist.github.com/ijokarumawak/14d560fec5a052b3a157b38a11955772

Hope this helps,

Matt

Re: Reset ListSFTP state for entity on FetchSFTP failure

New Contributor

Appreciate the feedback. Just a little more context if it helps...

It's not that ListSFTP fails; rather, if FetchSFTP fails to fetch what ListSFTP provides, we have no way to tell ListSFTP that the file should be listed again. This can happen, for example, because of a network outage during the transfer.

The problem with the retry approach for FetchSFTP is that if the file has actually been removed from the slave, we don't want to try it again.

What we're trying to accomplish is something similar to rsync for multiple slaves feeding into a master server. The files need to remain on the slaves and the master should always reflect what is on the slaves. If FetchSFTP fails, then we would have outdated files on the master server.
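The requirement can be phrased as a comparison between the two sides rather than as a "new since the last listing" cursor. Assuming (hypothetically) that the slave's and the master's contents are both available as path-to-modification-time maps, a sketch of the sync plan might look like this; plan_sync is an illustrative name, not a NiFi component.

```python
from typing import Dict, List, Tuple


def plan_sync(slave: Dict[str, int], master: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (to_fetch, to_delete) so that master mirrors slave.

    Both arguments map a relative path to its last-modified time.
    """
    # Missing on the master, or the master's copy differs (e.g. after a failed fetch).
    to_fetch = sorted(
        path for path, mtime in slave.items()
        if master.get(path) != mtime
    )
    # Present on the master but no longer on the slave.
    to_delete = sorted(path for path in master if path not in slave)
    return to_fetch, to_delete


if __name__ == "__main__":
    slave = {"a.txt": 100, "b.txt": 200, "c.txt": 300}
    master = {"a.txt": 100, "b.txt": 150, "old.txt": 50}
    print(plan_sync(slave, master))  # (['b.txt', 'c.txt'], ['old.txt'])
```

Because each run compares the current state of both sides, a file whose fetch failed is simply picked up again on the next run. This assumes the master records the slave's modification time for every file it copies.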

Re: Reset ListSFTP state for entity on FetchSFTP failure

Master Guru

@NickH

The FetchSFTP processor has multiple different relationships.

For your use case, where the file is really not there when FetchSFTP tries to fetch its content, the expected outcome is that the FlowFile is routed to the "not.found" relationship, which you should auto-terminate.

If you encounter some sort of communications failure (for example, a network issue during the fetch), the FlowFile should be routed to the "comms.failure" relationship, which should be looped back to the processor so it tries again.

The FetchSFTP processor also has a "permission.denied" relationship, which you can handle via dataflow design as well, perhaps by sending an email alert.
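The handling described above can be summarised as a routing table. This Python sketch only models the dataflow decisions; the Action names are hypothetical labels for the connections you would draw in NiFi (a self-loop, auto-termination, or an alerting processor such as PutEmail).

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"  # success: continue to the next processor
    RETRY = "retry"      # comms.failure: loop back to FetchSFTP
    DROP = "drop"        # not.found: auto-terminate
    ALERT = "alert"      # permission.denied: e.g. send an email alert


# FetchSFTP relationship name -> how the flow handles it
ROUTING = {
    "success": Action.DELIVER,
    "comms.failure": Action.RETRY,
    "not.found": Action.DROP,
    "permission.denied": Action.ALERT,
}


def handle(relationship: str) -> Action:
    """Look up the handling for a FetchSFTP relationship name."""
    try:
        return ROUTING[relationship]
    except KeyError:
        raise ValueError(f"unexpected relationship: {relationship}") from None


if __name__ == "__main__":
    for rel in ("success", "comms.failure", "not.found", "permission.denied"):
        print(rel, "->", handle(rel).value)
```

Keeping not.found separate from comms.failure is what lets a removed file be dropped while a transient network error is retried.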

Hope this helps,

Matt
