Member since: 07-30-2019
Posts: 3431
Kudos Received: 1632
Solutions: 1012

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 101 | 01-27-2026 12:46 PM |
| | 508 | 01-13-2026 11:14 AM |
| | 1112 | 01-09-2026 06:58 AM |
| | 944 | 12-17-2025 05:55 AM |
| | 449 | 12-17-2025 05:34 AM |
05-09-2018
04:00 PM
1 Kudo
@Prakhar Agrawal Shu's answer deals purely with the responsiveness of the NiFi UI and has nothing to do with the performance of running dataflows. Disabling a processor does not remove it from the canvas. A processor component in a disabled state is simply not validated by the NiFi controller. You can enable and run processors anytime you like, and then stop and disable them when done. At a minimum you should disable all your invalid processors, since they are not capable of running anyway.

So the real question is what you mean by "responding very slow": specifically the canvas, or do you notice flows processing FlowFiles more slowly? How did you come to that conclusion?

This may come down to tuning your dataflow(s) themselves. What is "Max Timer Driven Thread Count" set to in "Controller Settings" (found in the global menu)? The default is only 10, which means all processors must share only 10 CPU threads. For the processors that appear to be processing slowly, how many concurrent tasks have they been configured to use, what are their run schedules, etc.?

Thanks, Matt
05-09-2018
12:12 AM
@Hans Feldmann You are hitting a known bug in MiNiFi 0.4.0: https://issues.apache.org/jira/browse/MINIFI-435

The fix is part of MiNiFi 0.5.0, or you can roll back to MiNiFi 0.3.0, which I understand works as well.

Thank you, Matt
05-07-2018
05:46 PM
1 Kudo
@Davide Vergari @srinivas p Let me explain what is going on here, so you can understand why the configuration change helps in some cases and not others.

When you add a component to the canvas of a NiFi cluster, the following steps are performed:
1. The request is forwarded to the currently elected cluster coordinator on behalf of the user who added the component.
2. The cluster coordinator then replicates that request to all the nodes connected in the cluster. (The nifi.cluster.node.protocol.threads=10 setting dictates how many concurrent requests can be made, so larger clusters will need this value increased.)
3. Each node must then make the change and respond back to the cluster coordinator.

While this process is consistent for every such replicated request, not all changes are equal. Selecting a bunch of components on the canvas and copying and pasting them, or instantiating a large template onto the canvas, also constitutes a single replication request instead of many requests. These component bundles are referred to as snippets in NiFi. This action is not asynchronous, which means that each node must add every component (processors, connections, controller services, etc.) from the snippet before it responds to the cluster coordinator. Depending on the size of the snippet and the load on the server, this request may very likely exceed the configured timeout. Nodes that time out are then disconnected from the cluster.

Increasing the timeouts allows more time for these snippets to be instantiated and a response to be received. Because there is no way to know how large these snippets are, a timeout setting that works for one user may not work for others.

Two things to keep in mind here:
1. How many node requests can be made concurrently (as shown above, the default is 10). Using a 16 node NiFi cluster as an example, 10 nodes must respond before the other 6 even get the request, so increase this value.
2. The nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout properties can be set to much higher values. Even setting these timeouts to 2 - 5 minutes does not mean every request will take that long. It simply means that you allow that much time before the cluster coordinator makes the decision to disconnect a node due to timeout.

There is work on the roadmap to eventually redesign these types of replication requests as asynchronous requests. Once that happens, users will not need such high timeouts configured.

Thank you, Matt
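Taken together, the relevant entries in nifi.properties would look something like the sketch below. The values shown are illustrative starting points only; size them to your own cluster:

```properties
# Threads the cluster coordinator can use to replicate requests to
# nodes concurrently (default is 10; raise for larger clusters)
nifi.cluster.node.protocol.threads=50

# How long to wait when establishing a connection to a node
nifi.cluster.node.connection.timeout=2 min

# How long to wait for a node's response before the coordinator
# disconnects it; a high ceiling here does not slow normal requests
nifi.cluster.node.read.timeout=5 min
```

A restart of NiFi is required for nifi.properties changes to take effect.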
05-07-2018
05:02 PM
@John T The ListenHTTP processor works just like any of our other listen based processors. This processor should be configured to run on every node, so that every node can receive data. The listen based processors are configured to listen on a specific port, so the endpoint for a ListenHTTP would be something like:

http(s)://<hostname>:<listenerport>/<base path>

You could have an external load-balancer that is configured to receive all your inbound traffic and load-balance it across all the nodes in the NiFi cluster.

You could also install NiFi or MiNiFi at each of your data sources and use NiFi's Site-To-Site (S2S) protocol to load-balance the delivery of FlowFiles to this target cluster.

Listen based processors are not ideal for the Listen (primary node) --> RPG (S2S) --> input port (all nodes) --> rest of dataflow model. That is because the listen based processor receives the entire payload. This means your primary node has to handle a lot of writes to the content repo (all data) before then sending that data across the network to other nodes (redistribution), which can be an expensive waste of resources. That is why load-balancing with this type of processor is better done in front of NiFi.

Thanks, Matt
05-07-2018
04:48 PM
@srinivas p *** Forum tip: Avoid responding to existing answers with a new answer. Instead, use comments to correspond within a single answer.

That being said, your environment is very different from the environment in this original question: far fewer nodes. Are you running the same version of HDF/NiFi?

I actually recommend starting a new question with your environment specific details. You'll get more traction answer-wise that way.

Thanks, Matt
05-07-2018
04:35 PM
@John T NiFi is a very difficult thing to make a one-size-fits-all sizing recommendation for. NiFi does not typically scale linearly, which is why you see hardware specs increase sharply as throughput increases. Typical NiFi deployments grow in size and complexity along with the volume of throughput; more and more workflows get added over time.

Different NiFi processors in different workflows contribute to different server resource usage. That resource usage varies based on processor configuration and FlowFile volume, so even two workflows using the same processors may have different sizing needs.

How well a NiFi is going to perform has a lot to do with the workflow the user has built. After all, it is this user designed workflow that is going to be using the majority of the resources on each node.

The best answer, to be honest, is to build your workflows and stress test them; a kind of modeling and simulation setup. Learn the boundaries your workflows put on your hardware: at what data volume does CPU utilization, network bandwidth, memory load, or disk I/O become the bottleneck for your specific workflow(s)? Tweak your workflows and component configurations, then scale out by adding more nodes, allowing some headroom, since it is very unlikely every node will be processing the exact same number of FlowFiles all the time.

There are numerous ways to handle load-balancing. It really depends on your dataflow design choices and how you intend to get data into your NiFi. Keep in mind that each NiFi node in a cluster runs its own copy of the dataflows you build, has its own set of repositories, and thus works on its own set of FlowFiles.

While NiFi's listener type processors benefit from an external load-balancer to direct incoming data across all nodes, processors like ConsumeKafka can run on all nodes consuming from the same topic (assuming a balanced number of Kafka partitions).

Other protocols, like SFTP, are not cluster friendly. In dataflows like that you can only have something like a ListSFTP processor running on one node at any given time. To achieve load-balancing there, a flow typically looks like: ListSFTP (configured to run on the primary node only) --> Remote Process Group (used to re-distribute/load-balance the 0 byte FlowFiles to the rest of the nodes) --> input port --> FetchSFTP (pulls the content for each FlowFile).

One thing you do not want to do in most cases is load-balance the NiFi UI. You can do this, but you need to make sure you use sticky sessions in your load-balancer. The tokens issued for user authentication (LDAP or Kerberos) are only good for the node that issued them, so subsequent requests must go to the same node.

Hope this gives you some direction.

Thanks, Matt
05-07-2018
04:22 PM
@Raffaele S NiFi is a very difficult thing to make a one-size-fits-all sizing recommendation for. NiFi does not typically scale linearly, which is why you see hardware specs increase sharply as throughput increases. Typical NiFi deployments grow in size and complexity along with the volume of throughput; more and more workflows get added over time.

Different NiFi processors in different workflows contribute to different server resource usage. That resource usage varies based on processor configuration and FlowFile volume, so even two workflows using the same processors may have different sizing needs.

How well a NiFi is going to perform has a lot to do with the workflow the user has built. After all, it is this user designed workflow that is going to be using the majority of the resources on each node.

The best answer, to be honest, is to build your workflows and stress test them; a kind of modeling and simulation setup. Learn the boundaries your workflows put on your hardware: at what data volume does CPU utilization, network bandwidth, memory load, or disk I/O become the bottleneck for your specific workflow(s)? Tweak your workflows and component configurations, then scale out by adding more nodes, allowing some headroom, since it is very unlikely every node will be processing the exact same number of FlowFiles all the time.

Thanks, Matt
05-07-2018
04:02 PM
@Mahmoud Shash Are you still seeing the issue now that you have restarted with the "WriteAheadProvenance" repo? Are you seeing any log output in nifi-app.log from your InvokeHTTP processor? How many active threads do you see on the status bar above the canvas in the NiFi UI? If it is sitting at or close to 20, what does the CPU load look like on your host? If CPU utilization is low, try pushing your Max Timer Driven Thread Count setting higher, then check whether that change caused the active thread count above the canvas to jump higher.
05-07-2018
03:49 PM
@srinivas p Are we talking about the same 8 node cluster here? (Am I assuming wrong that you work with Adda?)

Try searching your nifi-app.logs for "HTTP requests" or "Request Counts Per URI".

Depending on the number of Remote Process Groups used to redistribute FlowFiles and the now increased size of your cluster (4 to 8 nodes), you may be encountering too many outstanding HTTP requests, which then causes these timeouts. https://issues.apache.org/jira/browse/NIFI-4153 <-- HDF 3.0.1+ https://issues.apache.org/jira/browse/NIFI-4598 <-- HDF 3.1.0+

You can fix this issue by upgrading to HDF 3.0.1 and adding a new property to your nifi.properties file (the new property is part of the fix for NIFI-4153): nifi.cluster.node.max.concurrent.requests=400 (default is 100)

Other things you can try now without upgrading:
1. Make sure all Remote Process Groups (RPGs) are using the "RAW" transport protocol instead of the default "HTTP" transport protocol. This will reduce the number of HTTP connections being made to transfer FlowFiles by RPGs, since FlowFiles will be transferred over their own dedicated TCP socket instead.
2. Increase "nifi.cluster.node.protocol.threads=50" from the default of 10. This will help with the larger number of nodes in your cluster now.
3. Increase "nifi.web.jetty.threads=400" from the default of 200.
4. Any processors that are invalid or stopped on your canvas should be "disabled". This will improve the responsiveness of your UI, since NiFi will not validate disabled processors. NiFi is always validating any stopped processors to determine whether they are stopped/valid or stopped/invalid. This occurs any time a user logs in and every time they navigate around the canvas/flows.

Thanks, Matt
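Collected into nifi.properties form, the tuning described in this answer would look something like the following. Treat the values as starting points from this thread, not guarantees:

```properties
# Maximum outstanding replicated HTTP requests
# (property introduced by NIFI-4153, available in HDF 3.0.1+; default 100)
nifi.cluster.node.max.concurrent.requests=400

# Threads for cluster protocol communication (default 10)
nifi.cluster.node.protocol.threads=50

# Jetty web server thread pool size (default 200)
nifi.web.jetty.threads=400
```

As with any nifi.properties change, NiFi must be restarted for these to take effect.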
05-07-2018
01:42 PM
@Mahmoud Shash Having a lot of "WAITING" threads in a NiFi thread dump is very normal and does not indicate an issue. I only see a couple of "BLOCKED" threads and do not see a single "InvokeHTTP" thread. When there is no obvious issue in a thread dump, it becomes necessary to take several thread dumps (5 minutes or more between each one) and note which threads persist across all of them. Again, some may be expected, but others may not. For example, web and socket listener threads would be expected to be waiting all the time.

Did you run through my suggestion? Was the queued data actually on the node from which you provided the thread dump? How many nodes are in your cluster?

I do see WAITING threads on provenance, which can contribute to a slowdown of your entire flow. Based on the thread dump, there is a much faster provenance implementation available in your HDF release versus the one you are using now:
- Currently you are using the default "org.apache.nifi.provenance.PersistentProvenanceRepository"
- Switch to using "org.apache.nifi.provenance.WriteAheadProvenanceRepository"

You can switch from persistent to write-ahead without needing to delete your existing provenance repository; NiFi will handle the transition. However, you will not be able to switch back without deleting the provenance repository. You will need to restart NiFi after making this configuration change.

Another thing to consider is the value configured for your "Max Timer Driven Thread Count" (found in the global menu under "Controller Settings"). This setting controls the maximum number of CPU threads that can be used to service all the processors' concurrent task requests.

Thanks, Matt
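The provenance repository switch described above is a one-line change in nifi.properties, followed by a NiFi restart:

```properties
# Replace the default PersistentProvenanceRepository with the faster
# write-ahead implementation. Existing provenance data is migrated forward,
# but you cannot switch back without deleting the provenance repository.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```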