
NiFi 1.9.1 GetSFTP throws FlowFileAccessException: Unable to create ContentClaim due to java.io.IOException

Explorer

The GetSFTP processor is throwing a FlowFileAccessException when trying to read a file from a remote directory.
Looking at the error stack trace, I can see that the IOException happens at StreamUtils.copy.

Could it be that the file was moved while it was actually being read?
Can you please help me understand why this issue is occurring and how I can avoid it?


o.a.nifi.processors.standard.GetSFTP GetSFTP[id=7d5fe4b5-7b67-3779-8de7-00e903c20c34] Unable to retrieve
file FILENAME due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.IOException: error: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.jcraft.jsch.ChannelSftp$2@6e1e8c5e for StandardFlowFileRecord[uuid
=a50179f7-b631-4aa8-9e74-a2dca44d89bd,claim=,offset=0,name=a50179f7-b631-4aa8-9e74-a2dca44d89bd,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.IOException: errororg.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from com.jcraft.jsch.ChannelSftp$2@6e1e8c5e for StandardFlowFileRecord[uuid=a50179f7-
b631-4aa8-9e74-a2dca44d89bd,claim=,offset=0,name=a50179f7-b631-4aa8-9e74-a2dca44d89bd,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable
to create ContentClaim due to java.io.IOException: error
at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3045)
at org.apache.nifi.processors.standard.GetFileTransfer.onTrigger(GetFileTransfer.java:194)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to java.io.IOException: error
at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3038)
... 12 common frames omitted
Caused by: java.io.IOException: error
at com.jcraft.jsch.ChannelSftp$2.close(ChannelSftp.java:1532)
at com.jcraft.jsch.ChannelSftp$2.read(ChannelSftp.java:1444)
at com.jcraft.jsch.ChannelSftp$2.read(ChannelSftp.java:1364)
at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:62)
at org.apache.nifi.stream.io.StreamUtils.copy(StreamUtils.java:35)
at org.apache.nifi.controller.repository.FileSystemRepository.importFrom(FileSystemRepository.java:744)
at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:3035)
... 12 common frames omitted

1 ACCEPTED SOLUTION

Master Mentor

@Umakanth 

 

Any chance you are running a NiFi cluster (multiple NiFi nodes), or that you have multiple systems all trying to consume the same data from this same SFTP server?

It is possible that one host finished reading the file first and removed it before the other hosts could finish reading the same file.

SFTP is not a cluster-friendly protocol, so when using this processor in a NiFi cluster it should be configured to execute on the "primary node" only. Otherwise all nodes in your cluster will fight to consume the same source files, and you can expect to see exceptions.

The GetSFTP processor is also deprecated in favor of the newer ListSFTP and FetchSFTP pair of processors. The newer processors let you run ListSFTP (primary node only; produces zero-byte FlowFiles) ---> load-balanced connection (distributes FlowFiles across all nodes in the cluster) ---> FetchSFTP (executes on all nodes; retrieves the specific content for each FlowFile).

Hope this helps,

Matt


3 REPLIES


Explorer

Hello @MattWho Thank you for your awesome response, as always.

Yes, our NiFi is a 5-node cluster, and GetSFTP is configured to run only on the primary node.

After further analysis we could arrive at only one conclusion: as soon as we pull a file, the source system moves it to a different location.

We are able to pick up many files without the exception; only in certain rare cases do we encounter it.

I am not sure whether this is because of network latency; could that be possible? In any case, it is the jsch library that is throwing the exception.

Since FetchSFTP provides a failure output, we could add a retry where possible, so yes, we are moving from GetSFTP to the List and Fetch combo.

Thank you

Master Mentor

@Umakanth 

 

The GetSFTP processor actually creates a verbose listing of all the files on the target SFTP server that it will be getting, and then fetches all of those files. Unlike the ListSFTP processor, GetSFTP is an older, deprecated processor that does not store state. My guess is that at times the listing is larger than at other times, or, as you mentioned, some occasional latency occurs, leaving enough time between creating that listing and actually consuming the files for the source system to have moved a listed file before it is grabbed.
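The race described above can be sketched with a small simulation on the local filesystem (plain java.nio.file; no SFTP server involved, and the file names are made up for illustration): a directory listing is taken first, the "source system" then moves a file, and the fetch from the now-stale listing fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListingRace {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sftp-sim");
        Path file = Files.createFile(dir.resolve("report.csv"));

        // Step 1: snapshot the directory, as GetSFTP does before fetching.
        List<Path> listing;
        try (Stream<Path> stream = Files.list(dir)) {
            listing = stream.collect(Collectors.toList());
        }

        // Step 2: the "source system" moves the file after the listing was taken.
        Files.move(file, dir.resolve("archived-report.csv"));

        // Step 3: fetching from the stale listing now fails for the moved file.
        for (Path listed : listing) {
            try {
                Files.readAllBytes(listed);
                System.out.println("fetched: " + listed.getFileName());
            } catch (NoSuchFileException e) {
                // GetSFTP surfaces this kind of failure as a
                // FlowFileAccessException wrapping an IOException; FetchSFTP
                // would instead route the FlowFile to its not.found relationship.
                System.out.println("not.found: " + listed.getFileName());
            }
        }
    }
}
```

The same window exists over SFTP, only wider: the listing and each transfer are separate round trips, so any latency between them gives the source system more time to move a listed file.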

In that case, moving to the newer ListSFTP and FetchSFTP processors will help handle that scenario.
ListSFTP will list all the files it sees, and FetchSFTP will fetch the content for those that have not yet been moved by the source system. FetchSFTP will still throw an exception for each file it cannot find, but it routes those FlowFiles to the not.found relationship, which you can handle programmatically in your NiFi dataflow(s).

Thanks,

Matt