Member since: 07-30-2019
Posts: 3470
Kudos Received: 1642
Solutions: 1018
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 301 | 05-06-2026 09:16 AM |
| | 491 | 05-04-2026 05:20 AM |
| | 355 | 05-01-2026 10:15 AM |
| | 524 | 03-23-2026 05:44 AM |
| | 394 | 02-18-2026 09:59 AM |
05-15-2026
09:15 AM
1 Kudo
@zzzz77 I am still not clear on the entire workflow you have going on between your Windows NiFi and your 3 node NiFi cluster.
1. "where the acknowledgement files get stuck between the cluster and the client machine" - it is not clear exactly where this is. Are they queued in some connection on the NiFi canvas, and if so, between which two components? Can you share screenshots of your dataflow setup showing where they get stuck? Is back pressure being applied to any of the connections?
2. "We also observed files would "disappear" when we had 2 downstream sites connected to the same output port" - please elaborate here. It is common to have multiple RPGs connected to remote ports (if the RPG is on a 3 node NiFi cluster, then you have 3 RPGs attempting to pull data from the target Remote Output Port). This is no different than if you had three non-clustered standalone NiFi instances all connected to the same Remote Output Port (all are constantly trying to connect and get FlowFiles from the port). I have not observed FlowFiles "disappear" with an RPG. Did you use NiFi's built-in Provenance to search for a "disappeared" FlowFile using its lineage?
I would recommend not building your flow around Remote Output Ports. There is no good load distribution happening with Remote Output Ports.
The "Remote Process Group" (RPG) is the client in all Site-to-Site transfers. When you add and configure an RPG, it will connect to the first Target URL configured and fetch the Site-to-Site (S2S) details for that target cluster. While you can configure multiple URLs in a comma separated list in the RPG, it only attempts the next configured URL if the first is not reachable. Those S2S details contain info from the target NiFi instance/cluster (including but not limited to: number of nodes in the cluster, supported protocol, S2S ports, individual node load, etc). These details allow the RPG to create a distribution plan.
Let's assume the target is a 3 node cluster where node 1 has 10,000 queued FlowFiles, node 2 has 5,000, and node 3 has 5,000. Since node 1 has a higher load, the RPG would try to make sure nodes 2 and 3 get more FlowFiles sent to them. So distribution might result in transfers done in an order like this: (node 1, node 2, node 3, node 2, node 3, repeat from beginning). You will notice that with each iteration it sends twice to nodes 2 and 3 and only once to node 1. The RPG has no round robin configuration. But even with the above, there is no guarantee of an even or close to even load distribution. Under continuous dataflow load you will see pretty good distribution, but the fewer the FlowFiles, the less evenly distributed they can become.
The article below covers the settings that can help improve the distribution of FlowFiles across nodes via an RPG. https://community.cloudera.com/t5/Community-Articles/How-to-achieve-better-load-balancing-using-NiFi-s-Site-To/ta-p/246279
From the above article you learn what configurations exist when "sending" FlowFiles to a Remote Input Port. But Output Ports are different. The RPG (client) still connects to the output port of your target NiFi instance or cluster. Let's say the RPG is on your 3 node cluster, which means that each node in the 3 node cluster has its own copy of the RPG executing. So each node polls the output port to fetch FlowFiles. There are no controls to limit how much data one node's RPG may pull. It simply connects and pulls whatever is currently available, subject to the output port settings configured on the RPG, and there is no distribution model since it is a pull. As soon as it finishes, it will attempt again. So you have less control over distribution when using Remote Output Ports.
Please help our community grow. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
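The load-weighted ordering in the 3 node example above can be illustrated with a small sketch. This is not NiFi's actual algorithm: the function name `build_distribution_plan`, the "slots proportional to busiest/queued" weighting, and the interleaving order are all assumptions chosen only to reproduce the ordering described in the reply.

```python
def build_distribution_plan(queued_by_node: dict[str, int]) -> list[str]:
    """Return one cycle of send targets, favoring nodes with fewer queued FlowFiles."""
    busiest = max(queued_by_node.values())
    # Assumed weighting (hypothetical): the busiest node gets 1 slot per cycle,
    # a node with half that load gets 2 slots, and so on. Every node gets >= 1.
    slots = {
        node: max(1, round(busiest / max(queued, 1)))
        for node, queued in queued_by_node.items()
    }
    plan: list[str] = []
    # Interleave the nodes pass by pass until every slot has been used.
    while any(slots.values()):
        for node in queued_by_node:
            if slots[node] > 0:
                plan.append(node)
                slots[node] -= 1
    return plan


if __name__ == "__main__":
    queued = {"node 1": 10_000, "node 2": 5_000, "node 3": 5_000}
    print(build_distribution_plan(queued))
```

With the queue counts from the example, one cycle comes out as node 1, node 2, node 3, node 2, node 3. As noted in the reply, real distribution is not guaranteed to be this even, especially with few FlowFiles.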
05-13-2026
05:03 AM
@nisaar I agree with @vafs that sharing the complete ERROR and stack trace is always going to be most helpful in your community questions. Those full stack traces will contain classes like "net.schmizz.sshj.transport" that you could try setting to DEBUG in the NiFi logback configuration to see what additional logging that class (not a NiFi library, but used by NiFi) may provide. Matt
05-13-2026
04:48 AM
@zzzz77 Sorry to hear you are having challenges with your dataflow. Can you clarify "site-to-site connection queue after the cluster", as this is not very clear? Are FlowFiles always stuck in the same connection queue? Can you share a screenshot? Is this a connection to a Remote Process Group or a Remote Output Port? Or a connection between a Remote Input Port and some other NiFi component (what is this component and how has it been configured)? What are the configuration settings of the connection with the "stuck/delayed" FlowFiles? Thank you, Matt
05-06-2026
09:16 AM
@AlokKumar You'll want to use the InvokeHTTP processor, which can be configured to use an OAuth2 access token provider: either JWTBearerOAuth2AccessTokenProvider or StandardOauth2AccessTokenProvider. Thank you, Matt
05-04-2026
05:41 AM
@oka Perhaps others in the community may have additional suggestions here, but since "-" is not a valid character in JMS property names, you would need to use an AMQP processor to support these headers. As mentioned before, there is a jira (https://issues.apache.org/jira/browse/NIFI-14670) for adding AMQP 1.0 support to the ConsumeAMQP processor, but it is still open and unassigned. That jira points to using the Qpid JMS Client in ConsumeJMS and, as you experienced, it works but still has limitations. Those limitations impact these specific properties with the "-" in the name. I would suggest adding your experience with trying to use Qpid AMQP to the above jira, along with the impact it has on the two headers you require, to maybe push the Apache community toward adding AMQP 1.0 support to the AMQP specific processors.
Additionally, there is this jira (https://issues.apache.org/jira/browse/QPID-4992), where an individual expressed some success preserving the content type header by using the ActiveMQ JMS API instead of the Qpid AMQP JMS API. So you may want to give this jira a read and maybe try this for yourself. Thank you, Matt
05-04-2026
05:20 AM
@nisaar I expect that the retry set on the Success relationship out of ListSMB is impacting your scheduling. I suspect that the retry is holding back "successful" attempts until both retries have been made, which aligns with the two skips you are seeing in scheduling. As I mentioned before, you should not be setting "retry" on any success relationship. This is an anti-pattern that will delay processing of any successful execution for the duration of the retries (each retry would occur at the processor's scheduled execution times).
Note: Scheduling of a NiFi component does not mean immediate execution. Execution depends on the availability of threads to service it. NiFi has a "max timer driven thread count" configuration that establishes the thread pool from which all scheduled component threads come. So things like the number of running components, the number of concurrent tasks set on a given component, CPU intensive components, etc can impact when a "scheduled" component is given a thread from the thread pool to execute. The thread pool would not impact scheduling here unless the processor was scheduled but never got a thread to execute within the 1.5 hours since it was initially scheduled (which I doubt in this case). The more logical cause, given the consistent behavior, is the "retry" you have set on Success from ListSMB. Thank you, Matt
05-01-2026
10:15 AM
@fnimi Yes, a write lock is still created when you use version control. So you'll still deal with read requests stacking up until all Jetty threads are in use, resulting in the 503. The large commit has to complete successfully before the status within the NiFi UI can reflect the current version control status on the Process Group. Version controlling a process group also parses the JSON to remove all sensitive properties. Version controlled dataflows can also contain the parameter contexts (if used) and controller services they utilize, making them even larger.
Having 1000+ components in a single version-controlled Process Group is considered an anti-pattern in NiFi. It makes version control, deployments, and UI responsiveness incredibly slow. Version controlled flows are meant for easy reuse or re-deployment to other NiFi clusters. Version controlling such large flows reduces their reusability.
Modularize your Flows (Nested Versioning)
Instead of versioning the top-level Process Group that contains everything, break your flow into smaller, logical, nested Process Groups (e.g., 50–100 components each). Avoid versioning a PG within a PG that is already version controlled. You can version control these smaller Process Groups individually. This drastically reduces the serialization time, lock duration, and payload size sent to the Registry. Thank you, Matt
05-01-2026
07:28 AM
@Former Member Do you observe the same issue if you run it more frequently, so it is constantly checking for new files to list rather than listing larger batches every 30 minutes? Matt
05-01-2026
07:24 AM
1 Kudo
@fnimi When you say "make a new version of", are you referring to committing a new version of a version controlled process group to the NiFi-Registry service or creating a new NiFi template? NiFi templates consume a lot of heap memory, which is the leading reason why they were deprecated in Apache NiFi 1.x and fully removed in the Apache NiFi 2.x major releases. Does each of these 4-5 copied big process groups contain 300 - 1000 processors, or is that 300 - 1000 in total?
When you copy components on the canvas you are creating a flow snippet that contains all the components, configurations, connections, etc of what you have selected. That snippet is held in heap memory, and when you paste it, it creates a request containing that large snippet that must be replicated to all the nodes in a NiFi cluster. Also, when you paste, NiFi must iterate through all the components in that snippet to calculate and set new x,y coordinates and component UUIDs, remap connections because of the UUID changes, and instantiate those components on the canvas. Keep in mind that even after all components are added by the paste operation, they are then passed along to the validator to check that each has a valid configuration and that referenced services are in a good state and enabled.
This operation creates an exclusive write lock until the above completes. Meanwhile, your browser (and the browsers of any other users that have this NiFi open) continues to fire status update requests every so many seconds. These read requests are blocked waiting for the write lock to release. The Jetty service has a thread pool where these read requests start stacking up until no more can be accepted, and thus a 503 can be encountered. Also, things like CPU threads, disk I/O, etc utilized during this large request may be under contention. Instantiating flows from NiFi-Registry or via flow definition imports is going to use more efficient methods than an in-browser copy and paste.
I would avoid copy and paste of large flows and focus on copying and pasting smaller snippets at a time. I also want to note that Apache NiFi 1.19 is more than 3.5 years old, and to keep up with critical CVEs and bug fixes you must plan regular upgrades to newer versions. The Apache NiFi 1.x major release branch is end of life and no longer receives any new updates, fixes, or security CVE changes in Apache. The new Apache NiFi 2 major release branch is what is being supported in Apache now. Hope this helps explain your observations. Thank you, Matt
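To make the "read requests stack up behind the write lock until Jetty has no free threads" reasoning in this reply concrete, here is a rough back-of-the-envelope model. It is only a sketch under stated assumptions: the function names, the pool size, the number of open browsers, the requests fired per refresh, and the refresh interval are all hypothetical values, and real NiFi/Jetty behavior (request timeouts, queuing, other traffic) is more involved.

```python
import math


def seconds_until_pool_full(pool_size: int, clients: int,
                            requests_per_refresh: int,
                            refresh_interval_s: float) -> float:
    """Estimate how long blocked read requests take to occupy every thread."""
    # Assumption: each refresh from each open browser parks this many threads,
    # and none of them is released while the write lock is held.
    blocked_per_refresh = clients * requests_per_refresh
    refreshes_needed = math.ceil(pool_size / blocked_per_refresh)
    return refreshes_needed * refresh_interval_s


def would_return_503(lock_duration_s: float, pool_size: int, clients: int,
                     requests_per_refresh: int,
                     refresh_interval_s: float) -> bool:
    """True if the write lock outlasts the time needed to fill the thread pool."""
    return lock_duration_s > seconds_until_pool_full(
        pool_size, clients, requests_per_refresh, refresh_interval_s
    )


if __name__ == "__main__":
    # Hypothetical numbers: 200 threads, 4 open browsers, 5 requests per refresh,
    # a refresh every 30 seconds, and a paste that holds the lock for 10 minutes.
    print(seconds_until_pool_full(200, 4, 5, 30.0))
    print(would_return_503(600.0, 200, 4, 5, 30.0))
```

The takeaway from the model matches the advice above: a shorter lock (smaller snippets, or Registry/flow definition imports) leaves less time for blocked requests to exhaust the pool.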
04-30-2026
05:16 AM
@nisaar Why are you posting the same response over and over again every few hours? It makes the thread unnecessarily noisy.
Your issue is a timeout within the SMBJClientProviderService, which is attempting to establish a connection with your target SMB share. This is taking longer than the configured timeout. The subsequent request is probably working because the connection is still established at the time of the second run. FetchSMB appears to have the same issue, as the connection is closed at the time it attempts to fetch the file content.
Have you tried increasing the Timeout setting in that controller service from the default 5 seconds to 20 seconds? Have you inspected the network latency between the NiFi host and the SMB server? Are you seeing any packet loss? Have you inspected the SMB server logs in Windows? Does the behavior change if you stop using the "run once" option and just start the processor? Your run schedule of every 30 seconds is allowing the connection to time out. If you also adjust that run schedule to 10 seconds on ListSMB, do you see a change in behavior?
Unfortunately I do not have an SMB share available to test this setup myself, so I am providing what guidance and suggestions I can to help with your community query. Thank you, Matt