Member since: 07-30-2019
Posts: 3427
Kudos Received: 1632
Solutions: 1011
06-30-2017
06:21 PM
@Bharadwaj Bhimavarapu Along the menu bar at the top of the NiFi UI there is a field that shows the current number of active threads in your NiFi. Is this a standalone NiFi install or a multi-node NiFi cluster? Under "Controller Settings", found within the hamburger menu in the upper right corner of the UI, what do you have configured for "Maximum Timer Driven Thread Count" and "Maximum Event Driven Thread Count"? I am wondering if some other processors in your NiFi are holding all your available threads so this processor cannot get one. Thanks, Matt
06-30-2017
06:09 PM
@Greg Keys
Which downstream processor component is it waiting for?
06-30-2017
03:43 PM
@Bharadwaj Bhimavarapu Can you also share a screenshot of your GenerateFlowFile "Scheduling" and "Settings" tabs?
Is this the only flow on your graph?
Anything odd in the nifi-app.log when you start the processor?
06-29-2017
06:59 PM
NiFi allows you to specify multiple of the following:
- Content repository directories
- Provenance repository directories
- NiFi lib directories
- Variable registry files

Having multiple of any of these does not mean any cloning of data is going on. Thanks, Matt
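As a rough sketch, these multiples are declared in nifi.properties roughly like the snippet below (the extra directory names and paths are made-up examples; check the Admin Guide for your release):

```properties
# Multiple content repository directories (one property per directory)
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.directory.repo2=/mnt/disk2/content_repository

# Multiple provenance repository directories
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.directory.prov2=/mnt/disk2/provenance_repository

# Additional NAR/lib directories
nifi.nar.library.directory=./lib
nifi.nar.library.directory.custom=/opt/nifi/custom-nars

# Multiple variable registry files (comma-separated list)
nifi.variable.registry.properties=./conf/vars-a.properties,./conf/vars-b.properties
```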
06-29-2017
06:54 PM
1 Kudo
@Kuldeep Kulkarni No, you cannot specify more than one local state directory. There is not much in the way of local state stored by NiFi, so I am not sure of the use case for needing more than one. Thanks, Matt
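For reference, the local state location is a single directory set on the local state provider in conf/state-management.xml, which nifi.properties points at. A minimal sketch using the default values (other provider properties omitted):

```xml
<!-- conf/state-management.xml: the local provider accepts exactly one Directory -->
<local-provider>
    <id>local-provider</id>
    <class>org.apache.nifi.controller.state.providers.local.WriteAheadLocalStateProvider</class>
    <property name="Directory">./state/local</property>
</local-provider>
```

```properties
# nifi.properties: which provider to use for local state
nifi.state.management.configuration.file=./conf/state-management.xml
nifi.state.management.provider.local=local-provider
```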
06-26-2017
04:43 PM
7 Kudos
The NiFi S2S protocol is used by NiFi's Remote Process Group (RPG) components to distribute FlowFiles from one NiFi instance to another. When the target NiFi is a NiFi cluster, load-balancing of the FlowFile delivery is done across all nodes in the target NiFi cluster.
The default way this works (and the only way it works in versions of NiFi prior to Apache NiFi 1.2.0 / HDF 3.0) is as follows:
The RPG regularly communicates with the target NiFi cluster to get load status information about each node in the cluster. This information includes the number of currently connected nodes in the target cluster, each node's hostname, port information, and the number of total queued FlowFiles on each target NiFi node.
The source NiFi uses this information to determine a data distribution strategy for the FlowFiles it has queued:
- Let's assume a 4-node target NiFi cluster where every node reports a zero queue count. Each node will then be scheduled to receive 25% of the data. This means a distribution pattern of node1, node2, node3, and then node4.
- Now let's assume the same 4-node target cluster, except that node 1 and node 2 report having a queue of FlowFiles. Node 1 and node 2 would then get 16.67% of the data while node 3 and node 4 get 33.33% of the data. This results in a distribution pattern of node1, node2, node3, node4, node3, and then node4, so nodes 3 and 4 get twice the opportunity to receive data over nodes 1 and 2 (see the sketch after this list for how such a weighted pattern can be built).
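To make the weighted pattern concrete, here is a small illustrative Python sketch of weighted round-robin slot assignment. This is not NiFi's actual S2S code; the per-node weights are simply the ones implied by the example above (lightly loaded nodes get two slots per cycle, loaded nodes get one):

```python
def distribution_pattern(weights):
    """Expand per-node weights into a repeating delivery pattern.

    weights: dict mapping node name -> number of slots per cycle.
    Returns a list such as ['node1', 'node2', 'node3', 'node4', 'node3', 'node4'].
    """
    pattern = []
    remaining = dict(weights)
    # One pass per "round": every node with slots left gets one slot per round.
    while any(left > 0 for left in remaining.values()):
        for node, left in remaining.items():
            if left > 0:
                pattern.append(node)
                remaining[node] -= 1
    return pattern

# Example from the article: nodes 1 and 2 report queued FlowFiles, nodes 3 and 4 do not.
weights = {"node1": 1, "node2": 1, "node3": 2, "node4": 2}
print(distribution_pattern(weights))
# -> ['node1', 'node2', 'node3', 'node4', 'node3', 'node4']
```

Running it reproduces the node1, node2, node3, node4, node3, node4 pattern described above, where nodes 3 and 4 each receive 33.33% of the data and nodes 1 and 2 each receive 16.67%.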
Once the distribution pattern is determined, the RPG connects to the first node and starts transferring data from the incoming queue to that node for 500 milliseconds or until the queue is empty. The next run of the RPG will start sending to the next node in the pattern, and so on.
As you can see from this default distribution model, the data may not always be distributed as desired. The transfer was implemented this way for performance reasons. However, when working with very small FlowFiles, when FlowFiles come in a wide range of sizes from small to large, when a better network connection exists to one target node than to another, or when data comes in bursts instead of a continuous flow, the load-balancing will be less than ideal.
With the introduction of HDF 3.0 (Apache NiFi 1.2.0), additional configuration options were added to the RPG to control the number of FlowFiles (count), amount of data (size), and/or length of transaction time (duration) per RPG port connection. This gives users the ability to fine-tune their RPG connections to achieve better load-balancing results when dealing with lighter-volume dataflows, network performance differences between nodes, etc.
These new configuration options (batch count, size, and duration) are set in the Remote Process Group's remote port configuration. Each input and output port will need to be configured individually.
Of course, setting count to a value of 1 sounds like a good way to achieve really good load-balancing, but it will cost you in performance since only one FlowFile will be sent in each transaction, and extra overhead will be introduced by the volume of new connections being opened and closed. You may find yourself experimenting with these settings to achieve your desired load-balancing-to-performance ratio.
----------------
How do I get better load-balancing in an older version of NiFi?
The RPG will send data based on what is currently in the incoming queue per transaction. By limiting the size of that queue, you can control the maximum number of FlowFiles that will transfer per transaction. You can set the back pressure object threshold on those incoming queues to limit the number of FlowFiles queued at any given time. This will cause FlowFiles to queue on the next upstream connection in the dataflow. If a source processor is feeding the RPG directly, try putting an UpdateAttribute processor between that source processor and the RPG so you have two connections. As each RPG execution runs and transfers what is on the queue, the queue will be refilled for the next transaction.

Apache NiFi 1.13+ update: In newer releases of NiFi, redistributing FlowFiles within the cluster was made much more efficient and easier through the new load-balanced connection feature. This feature (stable in Apache NiFi 1.13+ releases) is a simple configuration change on a connection. It supports several strategies for redistributing FlowFiles, but for load-balanced distribution it offers a true round-robin capability you can't get from an RPG.
06-15-2017
01:07 PM
@Prakash Ravi Nodes in a NiFi cluster have no idea about the existence of other nodes in the cluster. Nodes simply send health and status heartbeat messages to the currently elected cluster coordinator. As such, each node runs its own copy of the flow.xml.gz file and works on its own set of FlowFiles. So if you have 9 NiFi nodes, each node will be running its own copy of the ConsumeKafka processor. With 1 concurrent task set on the processor, each node will establish one consumer connection to the Kafka topic, so you would have 9 consumers for 10 partitions. In order to consume from all partitions you will need to configure 2 concurrent tasks. This will give you 18 consumers for 10 partitions. Kafka will assign partition connections within this pool of 18 consumers; ideally you would see 1 consumer assigned a partition on 8 of your nodes and 2 on one. The data coming into your NiFi cluster will not be evenly balanced because of the imbalance in the number of consumers versus partitions (see the sketch below for the arithmetic).

As far as your Kafka broker rebalance goes, Kafka will trigger a rebalance if a consumer disconnects and another consumer connects. Things that can cause a consumer to disconnect include:
1. Shutting down one or more of your NiFi nodes.
2. A connection timeout between a consumer and a Kafka broker:
- Triggered by network issues between a NiFi node and a Kafka broker.
- Triggered by scheduling the ConsumeKafka run schedule for longer than the configured timeout, for example a 60 second run schedule with a 30 second timeout.
- Triggered by backpressure being applied on the connection leading off the ConsumeKafka, causing ConsumeKafka to not run until the backpressure is gone. (This trigger was fixed in NiFi 1.2, but I don't know what version you are running.)

If you feel I have addressed your original question, please mark this answer as accepted to close out this thread. Thank you, Matt
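As a quick back-of-the-envelope check of that consumer math (plain Python, nothing NiFi-specific):

```python
import math

def concurrent_tasks_needed(partitions: int, nodes: int) -> int:
    """Minimum concurrent tasks per node so total consumers >= partitions."""
    return math.ceil(partitions / nodes)

nodes = 9
partitions = 10
tasks = concurrent_tasks_needed(partitions, nodes)   # -> 2
total_consumers = tasks * nodes                       # -> 18

print(f"{tasks} concurrent task(s) per node -> {total_consumers} consumers "
      f"for {partitions} partitions ({total_consumers - partitions} consumers idle)")
```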
06-15-2017
12:17 PM
@Johny Travolta I don't understand how not having a shared token necessarily means you do not have a cluster. You will get better traction from the community if you move this to a new question. I am not a NiFi developer myself, so I could not comment on the complexity of implementing a shared LDAP token response across all nodes in a NiFi cluster. But I am sure that if you open a new question around this topic, you will get a response from someone who can answer it for you. Thanks, Matt
06-14-2017
01:59 PM
1 Kudo
@Thierry Vernhet With number 3, I am assuming that every file has a unique filename from which to determine whether the same filename has ever been listed more than once. If that is not the case, then you would need to use DetectDuplicate after fetching the actual data (less desirable, since you will have wasted the resources to potentially fetch the same files twice before deleting the duplicate).

Let's assume every file has a unique filename. In that case, the detect duplicate flow would place a DetectDuplicate processor after the listing processor, configured to check the FlowFile's "filename" attribute. You will also need to add two controller services to your NiFi:
- DistributedMapCacheServer
- DistributedMapCacheClientService

The value associated with the "filename" attribute on the FlowFile is checked against entries in the DistributedMapCacheServer. If the filename does not exist, it is added. If it already exists, the FlowFile is routed to the duplicate relationship.

In scenario 2, where filenames may be reused, we need to detect whether the content after the fetch is a duplicate or not. In this case, after fetching the content of a FlowFile, the HashContent processor is used to create a hash of the content and write it to a FlowFile attribute (default is hash.value). The DetectDuplicate processor is then configured to look for FlowFiles with the same hash.value to determine whether they are duplicates. FlowFiles whose content hash already exists in the DistributedMapCacheServer are routed to the duplicate relationship, where you can delete them if you like. (A rough sketch of the hash-and-check idea is shown below.)

If you found this answer addressed your original question, please mark it as accepted. Thanks, Matt
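To illustrate the hash-and-check idea behind HashContent plus DetectDuplicate, here is a conceptual Python sketch. It is not NiFi code, and the in-memory set simply stands in for the DistributedMapCacheServer:

```python
import hashlib

seen_hashes = set()   # stand-in for the DistributedMapCacheServer

def is_duplicate(content: bytes) -> bool:
    """Hash the content (like HashContent) and check/record it (like DetectDuplicate)."""
    hash_value = hashlib.sha256(content).hexdigest()
    if hash_value in seen_hashes:
        return True          # would route to the 'duplicate' relationship
    seen_hashes.add(hash_value)
    return False             # would route to 'non-duplicate'

print(is_duplicate(b"file A contents"))   # False - first time this content is seen
print(is_duplicate(b"file A contents"))   # True  - same content fetched again
```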
06-14-2017
12:39 PM
1 Kudo
@Narasimma varman Try configuring the "Database Driver Jar Url" property with the absolute path to your "postgresql-42.1.1.jre7.jar" file. For example: c:/post/postgresql-42.1.1.jre7.jar (Windows) or /post/postgresql-42.1.1.jre7.jar (Linux).
Also check the nifi-app.log for a full stack trace that may follow the above ERROR; it may give more detail on why the file cannot be loaded. Thanks, Matt