Member since: 07-30-2019
Posts: 3406
Kudos Received: 1621
Solutions: 1008

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 95 | 12-17-2025 05:55 AM |
| | 156 | 12-15-2025 01:29 PM |
| | 104 | 12-15-2025 06:50 AM |
| | 226 | 12-05-2025 08:25 AM |
| | 381 | 12-03-2025 10:21 AM |
05-07-2018
04:35 PM
@John T NiFi is a very difficult thing to make a one-size-fits-all sizing recommendation for. NiFi does not typically scale linearly, which is why you see hardware specs increase disproportionately as throughput increases. Typical NiFi deployments grow in size and complexity as throughput volume increases: more and more workflows are added.

- Different NiFi processors in different workflows contribute differently to server resource usage. That resource usage varies based on processor configuration and FlowFile volume, so even two workflows using the same processors may have different sizing needs.
- How well a NiFi node is going to perform has a lot to do with the workflows the user has built. After all, it is this user-designed dataflow that will be using the majority of the resources on each node.
- The best answer, to be honest, is to build your workflows and stress test them in a modeling and simulation setup. Learn the boundaries your workflows put on your hardware: at what data volume does CPU utilization, network bandwidth, memory load, or disk I/O become the bottleneck for your specific workflow(s)? Tweak your workflows and component configurations, then scale out by adding more nodes, allowing some headroom, since it is very unlikely every node will be processing the exact same number of FlowFiles all the time.
- There are numerous ways to handle load balancing. It really depends on your dataflow design choices and how you intend to get data into your NiFi. Keep in mind that each NiFi node in a cluster runs its own copy of the dataflows you build, has its own set of repositories, and thus works on its own set of FlowFiles.
- While NiFi's listener-type processors benefit from an external load balancer to direct incoming data across all nodes, processors like ConsumeKafka can run on all nodes consuming from the same topic (assuming a balanced number of Kafka partitions).
- Other protocols, like SFTP, are not cluster friendly, so in dataflows like that you can only have something like a ListSFTP processor running on one node at any given time. To achieve load balancing there, a flow typically looks like: ListSFTP (configured to run on the primary node only) ---> Remote Process Group (used to redistribute/load-balance the zero-byte FlowFiles across the rest of the nodes) --> input port --> FetchSFTP (pulls the content for each FlowFile).
- One thing you do not want to do in most cases is load-balance the NiFi UI. You can do this, but you need to make sure you use sticky sessions in your load balancer. The tokens issued for user authentication (LDAP or Kerberos) are only good for the node that issued them, so subsequent requests must go to the same node.

Hope this gives you some direction.

Thanks, Matt
05-07-2018
04:22 PM
@Raffaele S NiFi is a very difficult thing to make a one-size-fits-all sizing recommendation for. NiFi does not typically scale linearly, which is why you see hardware specs increase disproportionately as throughput increases. Typical NiFi deployments grow in size and complexity as throughput volume increases: more and more workflows are added.

- Different NiFi processors in different workflows contribute differently to server resource usage. That resource usage varies based on processor configuration and FlowFile volume, so even two workflows using the same processors may have different sizing needs.
- How well a NiFi node is going to perform has a lot to do with the workflows the user has built. After all, it is this user-designed dataflow that will be using the majority of the resources on each node.
- The best answer, to be honest, is to build your workflows and stress test them in a modeling and simulation setup. Learn the boundaries your workflows put on your hardware: at what data volume does CPU utilization, network bandwidth, memory load, or disk I/O become the bottleneck for your specific workflow(s)? Tweak your workflows and component configurations, then scale out by adding more nodes, allowing some headroom, since it is very unlikely every node will be processing the exact same number of FlowFiles all the time.

Thanks, Matt
05-07-2018
04:02 PM
@Mahmoud Shash Are you still seeing the issue now that you have restarted with the "WriteAheadProvenance" repo? Are you seeing any log output in the nifi-app.log from your InvokeHTTP processor? How many active threads do you see on the status bar above the canvas in the NiFi UI? If it is sitting at or close to 20, what does your CPU load look like on your host? If CPU utilization is low, try pushing your Max Timer Driven Thread Count setting higher, then check whether that change caused the active thread count above the canvas to jump higher.
05-07-2018
03:49 PM
@srinivas p Are we talking about the same 8-node cluster here? (Am I wrong in assuming that you work with Adda?)

Try searching your nifi-app.logs for "HTTP requests" or "Request Counts Per URI".

Depending on the number of Remote Process Groups used to redistribute FlowFiles and the now increased size of your cluster (4 to 8 nodes), you may be encountering too many outstanding HTTP requests, which then causes these timeouts.

https://issues.apache.org/jira/browse/NIFI-4153 <-- HDF 3.0.1+
https://issues.apache.org/jira/browse/NIFI-4598 <-- HDF 3.1.0+

You can fix this issue by upgrading to HDF 3.0.1 and adding a new property to your nifi.properties file (the new property is part of the NIFI-4153 fix): nifi.cluster.node.max.concurrent.requests=400 (default is 100).

Other things you can try now without upgrading:
1. Make sure all Remote Process Groups (RPGs) are using the "RAW" transport protocol instead of the default "HTTP" transport protocol. This reduces the number of HTTP connections made by RPGs to transfer FlowFiles, since FlowFiles are transferred over their own dedicated TCP socket instead.
2. Increase "nifi.cluster.node.protocol.threads=50" from the default of 10. This helps with the larger number of nodes in your cluster now.
3. Increase "nifi.web.jetty.threads=400" from the default of 200.
4. Any processors that are invalid or stopped on your canvas should be disabled. This improves the responsiveness of your UI, since NiFi does not validate disabled processors. NiFi re-validates any stopped processors to determine whether they are stopped/valid or stopped/invalid, and this occurs any time a user logs in and every time they navigate around the canvas/flows.

Thanks, Matt
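Taken together, the nifi.properties changes suggested in this post might look like the fragment below (the values are the suggestions from the post, not universal defaults; tune them for your own cluster, and note that nifi.properties changes require a NiFi restart):

```properties
# Requires HDF 3.0.1+ (fix from NIFI-4153); raised from the default of 100
nifi.cluster.node.max.concurrent.requests=400

# Cluster protocol threads, raised from the default of 10
nifi.cluster.node.protocol.threads=50

# Jetty web threads, raised from the default of 200
nifi.web.jetty.threads=400
```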
05-07-2018
01:42 PM
@Mahmoud Shash Having a lot of "WAITING" threads in a NiFi thread dump is very normal and does not indicate an issue. I only see a couple of "BLOCKED" threads and do not see a single "InvokeHTTP" thread. When there is no obvious issue in a thread dump, it becomes necessary to take several thread dumps (5 minutes or more between each one) and note which threads persist across all of them. Again, some of these may be expected while others may not; for example, web and socket listener threads are expected to be waiting all the time.

Did you run through my suggestion? Was the queued data actually on the node from which you provided the thread dump? How many nodes are in your cluster?

I do see WAITING threads on provenance, which can contribute to a slowdown of your entire flow. There is a much faster provenance implementation available in your HDF release versus the one you are using now (based on the thread dump):
- Currently you are using the default "org.apache.nifi.provenance.PersistentProvenanceRepository".
- Switch to using "org.apache.nifi.provenance.WriteAheadProvenanceRepository".
- You can switch from persistent to write-ahead without needing to delete your existing provenance repository; NiFi will handle the transition. However, you will not be able to switch back without deleting the provenance repository.
- You will need to restart NiFi after making this configuration change.

Another thing to consider is the number configured for your "Max Timer Driven Thread Count" (found in the global menu under "Controller Settings"). This setting controls the maximum number of CPU threads that can be used to service all of the processors' concurrent task requests.

Thanks, Matt
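In nifi.properties, the provenance switch described above is a one-line change (the property name below is the one found in a standard nifi.properties; verify it against your own file before editing):

```properties
# Before (the slower default):
# nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository

# After (faster write-ahead implementation; restart NiFi for this to take effect):
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```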
05-07-2018
12:50 PM
@Mahmoud Shash Please do not forget to log in and click "Accept" at the bottom of whichever answer was able to address your original question. This helps users of this forum focus in on the working solutions. Thank you, Matt
05-07-2018
12:17 PM
1 Kudo
@Mahmoud Shash I noticed that you have this processor configured to run on the "Primary node" only. You should never run processors that operate on incoming FlowFiles from a source connection with "Primary node" only. In a NiFi cluster, the ZooKeeper-elected primary node can change at any time, so it becomes possible that the data queued in the connection feeding your InvokeHTTP processor is sitting on what was previously the primary node. Since your InvokeHTTP processor is configured for "Primary node" only, it will no longer be running on the old primary node to work on those queued FlowFiles.

Suggestion:
1. From the global menu in the upper right corner of the NiFi UI, click on "Cluster". In the UI that appears, take note of which node is currently the primary node in your cluster, then exit out of that UI.
2. From the global menu in the upper right corner of the NiFi UI, click on "Summary". With the "CONNECTIONS" tab selected across the top, locate the connection with the queued data (you can click on the "Queue" column name to sort the rows). To the far right of that row you will see a three-stacked-boxes icon; clicking it opens a new UI where you can see exactly which node(s) have these queued FlowFiles. If it is not the current primary node, then the InvokeHTTP processor is not going to be running there to consume them.

Only processors that ingest new FlowFiles, that do not take an incoming connection from another processor, and that use non-cluster-friendly protocols should be scheduled for "Primary node" only. All other processors should be scheduled for all nodes.

If the queued data is actually on the primary node, you will want to get a thread dump to determine what the active InvokeHTTP processor thread is waiting on ( ./nifi.sh dump > <name of your dump file> ). In this case it is likely waiting on some response from your HTTP endpoint.

Thank you, Matt
05-04-2018
06:50 PM
@Veerendra Nath Jasthi The DN there is coming from the keystore being used by your NiFi nodes. I have no idea why the certs created for your servers all have nifiadmin in them... but just like your user DN, the node identities must match exactly what is in those server certs in the keystore:

<property name="Node Identity 1">CN=nifiadmin, OU=NIFIrsdevhdf1.medassurant.local, OU=NIFI</property>
<property name="Node Identity 2">CN=nifiadmin, OU=NIFIrsdevhdf2.medassurant.local, OU=NIFI</property>
<property name="Node Identity 3">CN=nifiadmin, OU=NIFIrsdevhdf3.medassurant.local, OU=NIFI</property>

So you will need to edit your node identities so they match the above, then once again stop NiFi, remove your users.xml and authorizations.xml files, and start NiFi again via Ambari.

Thank you, Matt
05-04-2018
04:16 PM
@Veerendra Nath Jasthi You are so very, very close. Remember how I commented above that the DN string must match exactly? Your admin DN is:

CN=nifiadmin, OU=NIFI

But you entered the following as your Initial Admin Identity:

CN=nifiadmin,OU=NIFI

Note that you are missing the space between "CN=nifiadmin," and "OU=NIFI".

Thanks, Matt
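As a quick illustration of why this fails: NiFi matches the configured identity against the certificate DN as a literal string, so even one missing space makes them unequal. A minimal shell sketch (not NiFi's actual code, just the same exact-string comparison):

```shell
# Literal string comparison, as NiFi's identity matching effectively performs:
[ "CN=nifiadmin, OU=NIFI" = "CN=nifiadmin,OU=NIFI" ] && echo "match" || echo "mismatch"
# prints "mismatch"
```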
05-04-2018
03:57 PM
@Prakhar Agrawal @Felix Albani is correct. There is no way to automatically have a node delete its flow.xml.gz in favor of the cluster's flow. If we allowed that, it could lead to unexpected data loss. Let's assume a node was taken out of the cluster to perform some side work and the user tries to rejoin it to the cluster: if it simply took the cluster's flow, any data queued in a connection that does not exist in the cluster's flow would be lost. It would be impossible for NiFi to know whether joining this node to this cluster was a mistake or intended, so NiFi simply informs you there is a mismatch and expects you to resolve the issue.

I also noticed you mentioned "NCM" (NiFi Cluster Manager). NiFi moved away from having an NCM starting with the Apache NiFi 1.x versions. Newer versions have a zero-master cluster where any connected node can be elected as the cluster's coordinator.

Thanks, Matt