Created on 03-19-2026 11:49 PM - edited 03-19-2026 11:51 PM
Hi
I am using ListSMB, which lists all the files on the source system correctly.
However, FetchSMB does not fetch all of them; some files are skipped with the error below.
FetchSmb[id=03ab6b24-019d-1000-0000-000032e5cf1d] Could not fetch file SDM-Prod-Reports/EOS_TFC_Metrics//EOS_TFC_Metrics.csv.: java.io.IOException: Could not create session for share smb://ifiler-smb03:445/SFTP
- Caused by: com.hierynomus.smbj.common.SMBRuntimeException: com.hierynomus.protocol.transport.TransportException: java.util.concurrent.ExecutionException: com.hierynomus.smbj.common.SMBRuntimeException: java.util.concurrent.TimeoutException: Timeout expired
- Caused by: com.hierynomus.protocol.transport.TransportException: java.util.concurrent.ExecutionException: com.hierynomus.smbj.common.SMBRuntimeException: java.util.concurrent.TimeoutException: Timeout expired
- Caused by: java.util.concurrent.ExecutionException: com.hierynomus.smbj.common.SMBRuntimeException: java.util.concurrent.TimeoutException: Timeout expired
- Caused by: com.hierynomus.smbj.common.SMBRuntimeException: java.util.concurrent.TimeoutException: Timeout expired
- Caused by: java.util.concurrent.TimeoutException: Timeout expired
Current version:
Thanks
Created 03-20-2026 06:05 AM
@nisaar
The ListSMB processor only fetches metadata about the files in the target SMB location. For each file found, it creates a 0-byte NiFi FlowFile carrying metadata that the FetchSMB processor can use later to fetch the content. The List<type> and Fetch<type> processors are split this way so that one node in a multi-node NiFi cluster is not doing all the heavy work. The List<type> processor would be configured to run on "Primary Node" only. Its success relationship would be connected to FetchSMB via a connection. That connection would then need to be configured to load balance the 0-byte FlowFiles across all your NiFi nodes, so that each node fetches a fair share of the content and processes a fair share of this dataflow's workload.
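As a rough illustration of the list-then-distribute pattern described above, here is a minimal Python sketch of a round-robin assignment. It is not NiFi code; the node names and file names are made up, and it only models the idea that listed items are handed to the nodes in turn.

```python
from itertools import cycle


def round_robin(flowfiles, nodes):
    """Assign each listed item to a node in turn (round-robin)."""
    assignment = {node: [] for node in nodes}
    for item, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(item)
    return assignment


# Hypothetical listing produced on the primary node (names are made up).
listed = [f"report_{i}.csv" for i in range(7)]
nodes = ["node-1", "node-2", "node-3"]

plan = round_robin(listed, nodes)
for node, items in plan.items():
    print(node, len(items), items)
```

With no load balancing on the connection, the equivalent picture is every item staying on the single node that did the listing.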
What is the difference between the files that fail on content fetch and those that succeed?
Have you tried increasing the timeout set in the SmbjClientProviderService used by the SMB processors? Try setting it to 60 seconds or higher to see if the failed files can successfully fetch the content from SMB.
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 03-20-2026 11:07 AM
Thanks for your reply.
Files are around 100 MB to 200 MB.
Usually the files listed and fetched are done by Primary node itself.
One more observation: it is not always the same file that fails. A file that failed to fetch once sometimes works fine later.
So my understanding is that it has nothing to do with the file; the session is timing out for some reason.
After setting a timeout of 60 seconds in SmbjClientProviderService, I got the error below:
FetchSmb[id=03ab6b24-019d-1000-0000-000032e5cf1d] Could not fetch file SDM-Prod-Reports/EOS_TFC_Metrics//EOS_TFC_Metrics.csv.: java.io.IOException: Could not create session for share smb://ifiler-smb03:445/SFTP
- Caused by: com.hierynomus.smbj.common.SMBRuntimeException: com.hierynomus.protocol.transport.TransportException: Cannot write SMB2_SESSION_SETUP with message id << 11 >> as transport is disconnected
- Caused by: com.hierynomus.protocol.transport.TransportException: Cannot write SMB2_SESSION_SETUP with message id << 11 >> as transport is disconnected
Thanks
Created 03-23-2026 05:44 AM
@nisaar
The exception indicates an initial connection issue that results in a failure to complete the connection. This would be a network or server-side issue and not a client (ListSMB/FetchSMB) issue.
"Usually the files listed and fetched are done by Primary node itself"
This statement is not clear. What does "usually" mean? The ListSMB processor should be configured to execute on the "Primary node" only, to prevent multiple nodes in your NiFi cluster from listing the same files multiple times. If the ListSMB processor is configured for "primary node" execution and you are seeing FlowFiles specific to this flow being listed on different nodes, then the node that was elected as primary node is changing. I'd suggest taking a closer look at the logs or node events via the NiFi UI to see why the cluster coordinator role is changing nodes. Maybe you are experiencing some long stop-the-world garbage collection pauses (which could lead to timed-out connections). Maybe your primary node's core load average is exceptionally high as well, since you are not distributing the workload across all your nodes, or you have concurrent tasks set too high.
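One way to check the network side independently of NiFi is to open plain TCP connections to the SMB port repeatedly and record how long each attempt takes. The sketch below is a generic Python probe, not part of NiFi or smbj; the function name and its parameters are made up for illustration, and it only tests TCP reachability, not SMB session setup.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=5.0, pause=1.0):
    """Open a TCP connection several times.

    Returns one (ok, elapsed_seconds, error) tuple per attempt.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, None))
        except OSError as exc:  # covers timeouts, refusals, and DNS failures
            results.append((False, time.monotonic() - start, repr(exc)))
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Running something like probe("ifiler-smb03", 445) (host and port taken from the error message) from each NiFi node over a longer period would show whether connection attempts fail or slow down intermittently, and whether that happens on one node only.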
Thank you,
Matt
Created 03-31-2026 07:18 AM
Sorry for the delayed response.
We were able to more or less resolve the issue by adding a retry on the ListSMB and FetchSMB processors:
Number of Attempts: 2
Retry Back Off Policy: Penalize
Retry maximum backoff period: 1 minute
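For reference, the delay these retry settings can add may be sketched as below. This assumes the back-off starts at the processor's Penalty Duration and doubles on each retry up to the maximum back-off period, which should be verified against the NiFi documentation for the version in use. The 30-second starting value is also an assumption, and the function name is made up.

```python
def backoff_schedule(initial_s, attempts, max_backoff_s):
    """Delay before each retry: start at initial_s, double each time, cap at max_backoff_s."""
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, max_backoff_s))
        delay *= 2
    return delays


# 2 attempts and the 1 minute cap come from the settings above; the 30 s
# starting penalty is an assumption (check Penalty Duration in the processor settings).
delays = backoff_schedule(initial_s=30, attempts=2, max_backoff_s=60)
print(delays, "total added delay (s):", sum(delays))
```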
To test this, we have scheduled the flow to run every 30 minutes.
However, we are observing that whenever a retry happens, the next run does not start at its scheduled time. I am not sure how the retry is affecting the scheduler.
Thanks!