Member since: 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 188 | 12-17-2025 05:55 AM |
| | 249 | 12-15-2025 01:29 PM |
| | 183 | 12-15-2025 06:50 AM |
| | 278 | 12-05-2025 08:25 AM |
| | 465 | 12-03-2025 10:21 AM |
04-08-2021
08:51 AM
@John_Wise @TimA Let me make sure I understand exactly what change you are making. I have Process Groups (PGs) that are version controlled in my NiFi Registry, and I have both NiFi 1.11.4 and NiFi 1.12.1 clusters set up. If I import a flow from the Registry and then modify the state (start, stop, disable, enable) of any processor, my PGs do not flag that local changes exist. The state of a processor is not tracked as a local change, so I suspect some other local change is being made in addition to the state change. If you right-click on the PG and, under "Version" in the displayed context menu, select "Show local changes", what tracked changes are being reported? Hope this helps, Matt
04-08-2021
08:41 AM
@AnkushKoul Since you only have 1 concurrent task configured, another thread cannot be started while that concurrent task's thread is in use. So even with a run schedule of 0 secs, another task cannot start until the thread tied to that concurrent task is released, making another execution possible. At a 30 sec run schedule, the processor will only be allowed to execute again 30 seconds later, and only if there is an available concurrent task on the processor that is not already in use. Setting 30 seconds can create an artificial delay in your dataflow when tasks take less than 30 seconds to complete. Note: while the processor is executing a task, you will see a small number displayed in the upper right corner of the processor.
04-08-2021
08:35 AM
1 Kudo
@ram_g @Magudeswaran Guaranteeing order in NiFi can be challenging. As far as the prioritizers on the connection go:

FirstInFirstOutPrioritizer: Given two FlowFiles, the one that reached the connection first will be processed first. This looks at the timestamp recorded when the FlowFile entered this connection. In your case, you have a custom processor that takes in 1 FlowFile and may output 1 or more FlowFiles. Typically such processors commit all output FlowFiles to the downstream connection at the same time, which makes using this prioritizer a challenge if that is the case. But generally, processors that produce multiple FlowFiles from a single FlowFile also set FlowFile attributes that identify the fragments. Take a look at the attributes written by the SplitRecord processor as an example.

OldestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is oldest in the dataflow will be processed first. This is the default scheme used if no prioritizers are selected. This looks at the FlowFile creation timestamp. In your case, your custom processor takes in 1 FlowFile and may output 1 or more FlowFiles: are all output FlowFiles created as new?

Now you may want to look at the following prioritizer:

PriorityAttributePrioritizer: Given two FlowFiles, an attribute called "priority" will be extracted, and the one with the lowest priority value will be processed first. Note that an UpdateAttribute processor should be used to add the "priority" attribute to the FlowFiles before they reach a connection that has this prioritizer set. If only one FlowFile has that attribute, it will go first. Values for the "priority" attribute can be alphanumeric, where "a" will come before "z" and "1" before "9". If "priority" cannot be parsed as a long, unicode string ordering will be used. For example, "99" and "100" will be ordered so the FlowFile with "99" comes first, but "A-99" and "A-100" will sort so the FlowFile with "A-100" comes first. Assuming your custom processor writes some unique attribute(s) to the FlowFiles it outputs, you may be able to use those attributes to enforce ordering downstream via the above prioritizer.

Also keep in mind that NiFi connection backpressure thresholds are "soft" limits. If you were to set the backpressure object threshold on the connection outbound from your custom processor to 1, and one execution of your processor produced 6 FlowFiles, they would all get committed to that connection. Only then does backpressure kick in and prevent your custom processor from being scheduled again until the queue drops below the backpressure threshold. This is a good way of making sure only one "batch" of FlowFiles lands in the downstream connection at a time, but it will not help enforce the order of the FlowFiles within that batch. Hope this helps, Matt
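The parse-as-long-else-unicode behavior described above can be sketched in Python. This is an illustration of the documented comparison rules, not NiFi's actual implementation:

```python
def priority_key(value: str):
    """Mimic the documented PriorityAttributePrioritizer comparison:
    values that parse as a long are compared numerically; anything
    else falls back to plain (unicode) string ordering. Numeric
    values are arbitrarily grouped before non-numeric ones here just
    to give the two groups a stable relative order."""
    try:
        return (0, int(value), "")
    except ValueError:
        return (1, 0, value)

# Numeric priorities compare as numbers: "99" sorts before "100".
print(sorted(["100", "99"], key=priority_key))      # ['99', '100']

# Non-numeric priorities compare as strings: "A-100" sorts before "A-99".
print(sorted(["A-99", "A-100"], key=priority_key))  # ['A-100', 'A-99']
```

The surprising `"A-100" < "A-99"` result falls out of plain character-by-character comparison, since `'1'` sorts before `'9'`.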
04-08-2021
08:02 AM
@AnkushKoul By only having 1 concurrent task configured, you are effectively forcing that task to complete before the next can execute. With your Run Schedule set to "30 sec", NiFi will only schedule this component to execute every 30 seconds. So if task 1 takes only 20 seconds to complete, task 2 would not get started until 10 seconds later. If you set the Run Schedule to the default 0 secs, that tells NiFi to schedule this component to execute as often as possible, so as soon as task 1 completes, task 2 will execute. You can think of concurrent tasks as a way to parallelize execution within a single component: instead of having two processors, you have one with 2 concurrent tasks. Each task gets scheduled independently (in parallel) of the other concurrent task(s), and each concurrent task will work on different FlowFile(s) from the inbound connection(s). Some components do not support multiple concurrent tasks (the component source code limits it to 1). So to me it sounds like you want tasks to kick off as fast as possible, one after another. In that case, leave the Run Schedule at 0 secs and concurrent tasks at 1. If you found this answer addressed your question, please take a moment to accept the answer. Hope this helps, Matt
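The timing described above can be modeled with a little arithmetic. This is a hypothetical simplification (a single concurrent task, fixed task durations), not how NiFi's scheduler is actually implemented:

```python
def task_start_times(task_durations, run_schedule_secs):
    """Rough model of a component with 1 concurrent task: the next task
    may start only when the previous one has finished AND the next
    run-schedule tick has arrived. With run_schedule_secs=0, tasks
    chain back to back."""
    starts, clock = [], 0.0
    for duration in task_durations:
        if run_schedule_secs:
            # next scheduler tick at or after `clock` (ceiling division)
            tick = -(-clock // run_schedule_secs) * run_schedule_secs
        else:
            tick = clock
        starts.append(tick)
        clock = tick + duration
    return starts

# Three 20-second tasks. With a 30 sec run schedule, task 2 sits idle
# for 10 extra seconds after task 1 finishes.
print(task_start_times([20, 20, 20], 30))  # [0.0, 30.0, 60.0]

# With the default 0 secs run schedule, each task starts immediately.
print(task_start_times([20, 20, 20], 0))   # [0.0, 20.0, 40.0]
```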
04-05-2021
05:16 AM
@Masi There have been many bug fixes to load-balanced (LB) connections in NiFi 1.9, 1.10, and 1.11. No particular bug comes to mind that explains what you are seeing. Is your NiFi secured? If so, do you have authorization or SSL exceptions in your NiFi logs that may explain an issue with one node sending FlowFiles to another node?
04-01-2021
02:42 PM
@nmargosian The swap file in question would contain FlowFiles that belong to a connection with the UUID 7cde3c5c-016b-1000-0000-00004c82c4b2. From your Flow Configuration History, found under the global menu icon in the upper right corner, can you search for that UUID to see if there is any history on it? Do you see it existing at some point in time? Do you see a "Remove" event on it? If you see it in history, but there is no "Remove" action and it is now gone, then the flow.xml.gz loaded on restart did not have this connection in it. If this connection no longer exists on the canvas, NiFi cannot swap these FlowFiles back in. Everything you see on the canvas resides in heap memory and is also written to disk within a flow.xml.gz file. When you stop and start or restart NiFi, NiFi loads the flow back into heap memory from the flow.xml.gz (each node has a copy of this flow.xml.gz, and all nodes must have matching flow.xml.gz files or nodes will not rejoin the cluster). Things I suggest you verify:

1. Make sure that NiFi can successfully write to the directory where the flow.xml.gz file is located. Make a change on the canvas and verify the existing flow.xml.gz was moved to the archive directory and a new flow.xml.gz was created. If this process fails, then when NiFi is restarted any changes you made would be lost. For example, the connection was created and data was queued on it, but NiFi failed to write a new flow.xml.gz because it could not archive the current flow.xml.gz (space issues, permissions/ownership issues, etc.). This would block NiFi from creating a new flow.xml.gz, but the flow in memory would still include your connection. All these directories and files should be owned and readable/writable by your NiFi service user.

2. Did your cluster nodes' flows mismatch at some point in history? For example, a change was made on the canvas of a node that was disconnected from the cluster at the time, and then that node's flow was copied to the other nodes to bring all nodes in sync.

3. Was an archived flow reloaded back into NiFi at some point? This requires manual user action to copy a flow.xml.gz out of the archive and use it to replace the existing flow.xml.gz.

NiFi restarts will not just remove connections from your dataflows. Some other condition occurred, and it may not have even been recent. If you have enough app.log history covering multiple restarts, do you see this same exact WARN log line with each of those restarts? Hope this helps, Matt
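One quick way to check whether a connection UUID survives in the flow NiFi will load on restart is to search the flow.xml.gz directly. A minimal sketch, demonstrated here against a throwaway file; in practice you would point it at your real conf/flow.xml.gz (the path depends on your nifi.properties):

```python
import gzip
import os
import tempfile

def flow_contains_uuid(flow_path, uuid):
    """Return True if the given UUID appears anywhere in a
    flow.xml.gz file. Crude (plain substring search, no XML parsing),
    but enough to tell whether the connection still exists on disk."""
    with gzip.open(flow_path, "rt", encoding="utf-8") as f:
        return uuid in f.read()

# Build a tiny demo flow.xml.gz containing one connection id.
demo = os.path.join(tempfile.mkdtemp(), "flow.xml.gz")
with gzip.open(demo, "wt", encoding="utf-8") as f:
    f.write("<connection><id>7cde3c5c-016b-1000-0000-00004c82c4b2</id></connection>")

print(flow_contains_uuid(demo, "7cde3c5c-016b-1000-0000-00004c82c4b2"))  # True
print(flow_contains_uuid(demo, "00000000-0000-0000-0000-000000000000"))  # False
```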
03-29-2021
11:35 AM
@Garyy You are correct. Since NiFi does not use sessions, as mentioned in my last response, the client must authenticate every action performed. When you "login" to NiFi, the result is a bearer token being issued to the user, which your browser stores and reuses in all subsequent requests to the NiFi endpoints. At the same time, a server-side token for your user is also stored on the specific NiFi node you logged in to. The configuration in your NiFi login provider dictates how long those bearer tokens are good for. With your setting of 1 hour, you would be forced to re-login every hour. Thanks, Matt
03-29-2021
11:30 AM
@vi The more details you provide, the more likely you are to get responses in the community. Since I know you are dealing with GetFTP and files consumed by that processor eating away at your limited network bandwidth, I can offer the following feedback: I assume the ~60 GB of files consumed by your GetFTP every hour is many files? The GetSFTP processor is deprecated in favor of the ListSFTP --> FetchSFTP processor design. The SFTP protocol is not a cluster-friendly protocol for a NiFi cluster (and you should always have a NiFi cluster for redundancy and load handling). Running GetSFTP or ListSFTP on all nodes in the cluster would result in every node competing for the same files, so these processors should always be scheduled for "primary node" only (the primary node option does not exist in a standalone NiFi setup). The ListSFTP processor does not return the content of the listed files from the SFTP server. It simply generates a list of files that need to be fetched from the target SFTP server, and each of those listed files becomes its own FlowFile in NiFi. ListSFTP is then connected to a FetchSFTP processor, which fetches the content for each of the FlowFiles produced by ListSFTP. The connection between the ListSFTP and FetchSFTP processors would be configured to load balance the FlowFiles to all nodes in your cluster, which spreads the workload of retrieving that content across all your cluster nodes. While there is no configuration option in the GetSFTP or FetchSFTP processor to limit bandwidth (feel free to open an Apache NiFi Jira in the community for such an improvement), the ListSFTP to FetchSFTP design does give you some control. You can configure the run schedule on the FetchSFTP processor to some value other than the default 0 secs (which means run as often as possible), which places a pause between each execution (between each FlowFile fetching its content). While each individual fetch will still happen as fast as allowed, this places a break between fetches, giving other operations time on your constrained network. Hope this helps, Matt
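The effect of that run-schedule pause on average bandwidth can be estimated with some back-of-the-envelope math. The numbers below are purely hypothetical, not measurements:

```python
def effective_throughput_mbps(file_size_mb, link_mbps, pause_secs):
    """Average bandwidth consumed when a pause is inserted between
    fetches: each file still transfers at full link speed, but the
    idle gap between fetches lowers the overall average."""
    fetch_secs = file_size_mb * 8 / link_mbps          # seconds per file
    return file_size_mb * 8 / (fetch_secs + pause_secs)

# Hypothetical: 100 MB files on a 100 Mbit/s link take 8 s each.
# An 8 s run-schedule pause cuts average utilisation in half.
print(effective_throughput_mbps(100, 100, 8))  # 50.0
print(effective_throughput_mbps(100, 100, 0))  # 100.0
```

This illustrates the trade-off: the pause does not slow any single fetch, it only limits how much of the hour is spent transferring.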
03-29-2021
11:24 AM
@nmargosian If you search your NiFi canvas for UUID 7cde3c5c-016b-1000-0000-00004c82c4b2, do you find that connection? This is the connection that this swap file would get swapped back in to. If this connection does not exist, then the swap file cannot be loaded back in to it. Any chance someone removed a connection from the canvas while this node was not connected to the cluster? Did you recently upgrade from an older NiFi version? Did you copy a flow.xml.gz from a different node in your cluster to this node because of a flow mismatch exception? I am just looking for a reason why this connection would be missing. Does the NiFi flow archive directory exist? Does the NiFi service user have proper permissions to read and write to that archive directory? Does NiFi have proper ownership and permissions to write to the flow.xml.gz file? When you make a change on the canvas, NiFi makes that change in the in-memory flow, archives the current flow.xml.gz, and then writes a new flow.xml.gz. I am wondering if perhaps the above connection was added to the canvas, but for some reason NiFi was unable to write out a new copy of the in-memory flow to a flow.xml.gz. On NiFi restart, the flow from the flow.xml.gz is what is loaded back into memory. Hope this helps, Matt
03-24-2021
06:35 AM
@Garyy The first place you may want to start is opening developer tools in your browser, then trying to connect to your NiFi UI and taking note of which calls take the longest to return and which one eventually times out. You may also want to enable garbage collection logging within your NiFi JVM (do this by adding GC logging java args in the NiFi bootstrap.conf file). If your JVM is encountering long and/or frequent GC (all GC events are stop-the-world events), this can result in timeouts in the UI. It is also not entirely clear to me what you mean by your UI access getting terminated, since NiFi does not use sessions. What are you observing? Can you provide more detail and screenshots? Matt
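As a concrete illustration of the GC-logging java args mentioned above, bootstrap.conf entries might look like the following. The `java.arg.N` index numbers and the log path are assumptions: pick index numbers not already used in your file, and use the flag style matching your JVM version:

```properties
# Java 8 style GC logging (verify flags against your JVM)
java.arg.20=-verbose:gc
java.arg.21=-XX:+PrintGCDetails
java.arg.22=-XX:+PrintGCDateStamps
java.arg.23=-Xloggc:/var/log/nifi/gc.log

# Java 9+ unified logging equivalent (use instead of the above):
# java.arg.20=-Xlog:gc*:file=/var/log/nifi/gc.log:time,uptime
```

After editing bootstrap.conf, a NiFi restart is required for the new JVM arguments to take effect.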