Member since: 07-30-2019
Posts: 3391
Kudos Received: 1618
Solutions: 1000
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 278 | 11-05-2025 11:01 AM |
| | 164 | 11-05-2025 08:01 AM |
| | 501 | 10-20-2025 06:29 AM |
| | 641 | 10-10-2025 08:03 AM |
| | 405 | 10-08-2025 10:52 AM |
07-05-2017
06:48 PM
@Bharadwaj Bhimavarapu General guidance here is that these values should be set to 2 times the number of available cores, and no more than 4 times the number of available cores, on a single instance of NiFi. If you are running a NiFi cluster, these values are enforced per node, so a setting of 16 in a 4-node cluster equates to a total of 64 threads across the cluster. Setting these values too high just results in many more threads in CPU wait and will not help performance at all. Beyond increasing these values, you need to be mindful of how many concurrent tasks you assign each of your processors. Some processors are more CPU intensive than others (meaning they take longer to complete a job, holding a thread much longer). You can look at the "Tasks/Time" stats on a processor to see if its threads are long- or short-running. For processors that have long-running threads, you want to be extra careful about how many concurrent tasks you assign them. Thanks, Matt
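As a back-of-the-envelope sketch of that sizing guidance (my own illustration, not an official formula; the core-count detection and 4-node example are assumptions):

```python
import multiprocessing

# Rule of thumb from above: 2x cores as a starting point,
# 4x cores as a ceiling, applied per node in a cluster.
cores = multiprocessing.cpu_count()      # cores on one NiFi node
starting_max_threads = 2 * cores         # conservative starting point
ceiling_max_threads = 4 * cores          # don't go beyond this

nodes = 4                                # hypothetical 4-node cluster
# "Maximum Timer Driven Thread Count" is enforced per node, so the
# cluster-wide thread budget is the per-node setting times node count.
print(f"Per node: start at {starting_max_threads}, cap at {ceiling_max_threads}")
print(f"Cluster-wide at the cap: {ceiling_max_threads * nodes} threads")
```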
07-05-2017
06:33 PM
1 Kudo
@M R Using a single partition in Kafka is similar to using a single node in NiFi to maintain order. The EnforceOrder processor is a great new addition for enforcing order of FlowFiles, but it will only enforce order on those FlowFiles that reside on the same node in a NiFi cluster. So if you are trying to enforce processing order of FlowFiles across numerous nodes, this processor will not do that. You would need to get all FlowFiles for which you want to enforce order onto the same node before using this processor. I don't fully understand your entire use case, but a couple of other processors you may want to look at are the Wait and Notify processors. These are also new in the latest HDF 3.0 and NiFi 1.2.0 releases. Thanks, Matt
07-05-2017
05:14 PM
1 Kudo
@Greg Keys The only destination processor component that would affect the emptying of a queue is the processor that connection is attached to. Which processor type is the connection you are trying to empty attached to? Does this processor show any active threads in the upper right corner? It may take some thread dump analysis to determine why the particular processor is not releasing its threads, if no configuration issue is obvious. If you restart NiFi, you are likely to get stuck in the same state again, because this downstream processor will likely run before you can get to the NiFi UI to stop it. There are a couple of things you can do to get around this:
1. Try setting the FlowFile expiration on the connection to "1 sec"; this lets the controller handle the deletion of FlowFiles from the queue for you. This of course assumes NiFi will allow you to edit the connection while the downstream component is still running.
2. The more likely successful option is to shut down NiFi and change the "autoresume state" configuration in the nifi.properties file from true to false. On restart, all components will come up stopped. This will allow you to right-click on the connection in question and empty it. It will also make sure the downstream processor is in a completely stopped state so its configuration can be changed. Don't forget to change the autoresume state back to true after making your changes, or every time NiFi is restarted everything will come up stopped.
Thanks, Matt
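For reference, the nifi.properties change from option 2 looks like this (property name as shipped in standard NiFi distributions; remember to flip it back to true once the queue is cleared):

```
# nifi.properties -- start all components in a stopped state after restart
nifi.flowcontroller.autoResumeState=false
```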
07-05-2017
03:30 PM
1 Kudo
@J. D. Bacolod I like the idea of adding an attribute, to FlowFiles routed to a failure relationship, that identifies which component routed the FlowFile. I suggest opening an Apache Jira for this enhancement.
For now, you can use NiFi's data provenance capability to get the lineage of a FlowFile that was processed by your dataflows. Lineage can be used to show all routing and processing done for a given FlowFile. It will not provide details on the reason the FlowFile was routed to failure, but once you have the timestamp of the failure event, you can look up the details in your nifi-app.log. Thanks, Matt
06-30-2017
06:21 PM
@Bharadwaj Bhimavarapu Along the menu bar at the top of the NiFi UI there is a field that shows the current number of active threads in your NiFi. Is this a standalone NiFi install or a multi-node NiFi cluster? Under "Controller Settings", found within the hamburger menu in the upper right corner of the UI, what do you have configured for "Maximum Timer Driven Thread Count" and "Maximum Event Driven Thread Count"? I am wondering if some other processors in your NiFi are holding all your available threads, so this processor cannot get one. Thanks, Matt
06-30-2017
06:09 PM
@Greg Keys
Which downstream processor component is it waiting for?
06-30-2017
03:43 PM
@Bharadwaj Bhimavarapu Can you also share a screenshot of your GenerateFlowFile "Scheduling" and "Settings" tabs?
Is this the only flow on your graph?
Anything odd in the nifi-app.log when you start the processor?
06-29-2017
06:59 PM
NiFi allows you to specify multiples of the following:
- Content repository directories
- Provenance repository directories
- NiFi lib directories
- Variable registry files

Having multiples of any of these does not mean any cloning of data is going on. Thanks, Matt
06-29-2017
06:54 PM
1 Kudo
@Kuldeep Kulkarni No, you cannot specify more than one local state directory. There is not much in the way of local state stored by NiFi, so I am not sure of the use case for needing more than one. Thanks, Matt
06-26-2017
04:43 PM
7 Kudos
The NiFi S2S protocol is used by NiFi's Remote Process Group (RPG) components to distribute FlowFiles from one NiFi instance to another. When the target NiFi is a NiFi cluster, load-balancing of the FlowFile delivery is done across all nodes in the target NiFi cluster.
The default way this works (and the only way it works in versions of NiFi previous to Apache NiFi 1.2.0 or HDF 3.0) is as follows:
The RPG regularly communicates with the target NiFi cluster to get load status information about each node in the cluster. This information includes the number of currently connected nodes in the target cluster, each node's hostname, port information, and the number of total queued FlowFiles on each target NiFi node.
The source NiFi uses this information to determine a data distribution strategy for the FlowFiles it has queued:
- Let's assume a 4-node target NiFi cluster where every node reports a queue count of zero. Each node will then be scheduled to receive 25% of the data, giving a distribution pattern of node1, node2, node3, and then node4.
- Now let's assume the same 4-node target cluster, except node 1 and node 2 report an existing queue of FlowFiles. Node 1 and node 2 would each get 16.67% of the data, while node 3 and node 4 each get 33.33%. This results in a distribution pattern of node1, node2, node3, node4, node3, and then node4, so nodes 3 and 4 get twice the opportunity to receive data compared with nodes 1 and 2.
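To make the weighting concrete, here is a toy sketch (my own illustration, not NiFi's actual code; the weights are hand-picked to reproduce the example above rather than derived from real queue counts):

```python
def build_pattern(weights):
    """Turn per-node weights into a send pattern and percentages.

    A node with weight 2 appears twice per round, so it gets twice
    the opportunity to receive data. Toy illustration only: NiFi
    derives its weighting from the queue counts each node reports,
    which is not reproduced here.
    """
    total = sum(weights.values())
    percentages = {node: round(100 * w / total, 2) for node, w in weights.items()}
    rounds = max(weights.values())
    # Cycle through all nodes first, then repeat the heavier-weighted ones.
    pattern = [node for rnd in range(rounds)
               for node, w in weights.items() if w > rnd]
    return pattern, percentages

# All four nodes report empty queues -> equal weights -> 25% each:
# pattern node1, node2, node3, node4.
print(build_pattern({"node1": 1, "node2": 1, "node3": 1, "node4": 1}))

# Nodes 1 and 2 report a backlog, so they carry half the weight of
# nodes 3 and 4 -> 16.67% vs 33.33%: pattern node1..node4, node3, node4.
print(build_pattern({"node1": 1, "node2": 1, "node3": 2, "node4": 2}))
```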
Once the distribution pattern is determined, the RPG connects to the first node and starts transferring data from the incoming queue to that node for 500 milliseconds or until the queue is empty. The next run of the RPG will start sending to the next node in the pattern, and so on.
As you can see from this default distribution model, the data may not always be distributed as desired. The transfer was implemented this way for performance reasons. However, when working with very small FlowFiles, when FlowFiles come in a wide range of sizes from small to large, when a better network connection exists to one target node than to another, or when data comes in bursts instead of a continuous flow, the load-balancing will be less than ideal.
With the introduction of HDF 3.0 (Apache NiFi 1.2.0), additional configuration options were added to the RPG to control the number of FlowFiles (count), amount of data (size), and/or length of transaction time (duration) per RPG port connection. This gives users the ability to fine-tune their RPG connection to achieve better load-balancing results when dealing with lighter-volume dataflows, network performance differences between nodes, etc.
These new configuration options are set per port in the RPG's remote port configuration; each input and output port will need to be configured individually.
Of course, setting count to a value of 1 sounds like a good way to achieve really good load-balancing, but it will cost you in performance: only one FlowFile will be sent per transaction, so extra overhead is introduced by the volume of new connections being opened and closed. You may find yourself playing around with these settings to achieve your desired load-balancing-to-performance ratio.
----------------
How do I get better load-balancing in an older version of NiFi?
The RPG will send data based on what is currently in the incoming queue per transaction. By limiting the size of that queue, you can control the maximum number of FlowFiles that will transfer per transaction. You can set the back pressure object threshold on those incoming queues to limit the number of FlowFiles queued at any given time; this will cause FlowFiles to queue on the next upstream connection in the dataflow. If a source processor is feeding the RPG directly, try putting an UpdateAttribute processor between that source processor and the RPG so you have two connections. As each RPG execution runs and transfers what is on the queue, the queue will be refilled for the next transaction.

Apache NiFi 1.13+ update: In newer releases of NiFi, redistributing FlowFiles within the cluster was made much more efficient and easier through the new load-balanced connection feature. This feature (stable in Apache NiFi 1.13+) is a simple configuration change made on a connection. It supports numerous strategies for redistribution of FlowFiles, but for load-balanced distribution it offers a true round-robin capability you can't get from an RPG.
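To see why true round robin handles bursty traffic more evenly than the RPG's batch-per-transaction model, here is a small simulation (the burst sizes are hypothetical and this is my own illustration, not either feature's actual code):

```python
from collections import Counter
from itertools import cycle

nodes = ["node1", "node2", "node3", "node4"]
# Bursty arrivals: FlowFiles show up in uneven clumps between RPG runs.
bursts = [7, 1, 0, 12, 3, 0, 9]  # hypothetical queue depth at each RPG run

# RPG model: each transaction drains the whole incoming queue to one
# node, then the next run moves to the next node in the pattern.
rpg = Counter()
for node, burst in zip(cycle(nodes), bursts):
    rpg[node] += burst

# Load-balanced connection (round robin): every FlowFile individually
# goes to the next node, regardless of how it arrived.
rr = Counter(node for node, _ in zip(cycle(nodes), range(sum(bursts))))

print("RPG batches :", dict(rpg))  # lumpy, burst-sized allocations
print("Round robin :", dict(rr))   # within one FlowFile of perfectly even
```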