Scaling of Apache Nifi with Large Number of Templates

New Contributor
How does it scale? If I have 100k templates, how will the flow XML load into memory? How does it manage individual processors in memory?
1 ACCEPTED SOLUTION

Master Guru

@Rupesh_Raghani 

 

Since templates reside in NiFi heap, they should only be uploaded to NiFi for the purpose of instantiating that template onto the canvas.  Once instantiated onto the canvas, the template should be deleted from NiFi so it is no longer holding that memory space.
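
As a rough illustration of that upload, instantiate, delete lifecycle, here is a minimal sketch that only builds the REST calls involved and sends nothing. The endpoint paths are the ones I understand the NiFi 1.x REST API to expose, so verify them against the API docs for your version. The host name, process group id, and template id are made-up placeholders, and in practice the template id is only known from the upload response.

```python
from urllib.parse import quote


def template_lifecycle_requests(base_url, process_group_id, template_id):
    """Return (method, url) pairs for upload -> instantiate -> delete.

    Nothing is sent here; this only shows the order of the calls.
    All three arguments are caller-supplied (hypothetical values below).
    """
    base = base_url.rstrip("/")
    pg = quote(process_group_id, safe="")
    tid = quote(template_id, safe="")
    return [
        # 1. Upload the template XML (multipart form upload).
        ("POST", f"{base}/process-groups/{pg}/templates/upload"),
        # 2. Instantiate the uploaded template onto the canvas.
        ("POST", f"{base}/process-groups/{pg}/template-instance"),
        # 3. Delete the template so it stops occupying heap.
        ("DELETE", f"{base}/templates/{tid}"),
    ]


# Hypothetical host and ids, for illustration only.
steps = template_lifecycle_requests(
    "https://nifi.example.com:8443/nifi-api", "root", "1234-abcd"
)
for method, url in steps:
    print(method, url)
```

The point of the third call is simply that the template should not outlive its use: once the flow is on the canvas, the template copy is redundant heap.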

Uploaded templates are not the only consumers of heap: everything built on the canvas (including controller services and reporting tasks) also resides in heap, as do the metrics for each component.  All queued FlowFiles (except those in large queues that have been swapped out to swap files) also reside in the NiFi JVM heap.  How much heap each FlowFile consumes is driven by the number and size of its attributes.  FlowFile content does not reside in heap unless a processor needs to load it to perform its task; not all processors touch the content at all, and others can read it without holding it in heap, for example when streaming it somewhere else.  The impact on heap varies based on which components are being used and how many.  If your flow grows extremely large, it may make sense to break it up into flows managed by multiple NiFi clusters.
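
To get a feel for how attribute count and size drive heap usage, here is a back-of-the-envelope sketch. It is not how NiFi accounts for memory internally: the `overhead_per_entry` value is an arbitrary placeholder for per-attribute JVM object overhead, the sample attributes are invented, and swapping of large queues is ignored.

```python
def estimate_attribute_heap_bytes(queued_flowfiles, attributes, overhead_per_entry=100):
    """Very rough heap estimate for the attributes of queued FlowFiles.

    overhead_per_entry is an ASSUMED placeholder for JVM object overhead
    per attribute; measure your own JVM before relying on any number.
    """
    per_flowfile = sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8")) + overhead_per_entry
        for key, value in attributes.items()
    )
    return queued_flowfiles * per_flowfile


# Invented sample attributes, for illustration only.
sample = {"filename": "data-0001.csv", "path": "./", "uuid": "0" * 36}

small = estimate_attribute_heap_bytes(10_000, sample)
# Same queue depth, but each FlowFile also carries a 4 KB attribute.
large = estimate_attribute_heap_bytes(10_000, {**sample, "payload.copy": "x" * 4096})

print(f"small attributes: {small / 1e6:.1f} MB")
print(f"with 4 KB attribute: {large / 1e6:.1f} MB")
```

The takeaway is the ratio rather than the absolute numbers: copying content into a large attribute multiplies the heap cost of every queued FlowFile.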

NiFi flow templates will become a deprecated capability in favor of NiFi-Registry.  You can version control your flows in NiFi-Registry.  All NiFi instances connected to that NiFi-Registry can then load flows from it onto the canvas (one or more times).

I am not sure what you are looking for with regard to "How does it manage individual processors in memory?"  All processors on the canvas and within templates reside in the JVM heap memory space.

If you find this helps with your query, please take a moment to log in and click "ACCEPT" on this solution.
Thank you,

Matt

New Contributor

Hi Matt,

 

Thanks for your detailed explanation of templates. I am looking at how templates will be loaded in a clustered deployment. So if I have templates on every node in cluster then will the nifi load templates on each node as per templates available on their node or a parent node will load all templates from other cluster?

Master Guru

@Rupesh_Raghani 

I just want to make sure that when we are both talking about NiFi "Templates" we are talking about the same thing.
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates

When you upload a NiFi template (XML file) to NiFi via the UI (it does not matter which node in the NiFi cluster you are accessing), that template is uploaded and replicated to all nodes in the cluster.  So every node will hold that template in its JVM heap and have it written to its flow.xml.gz on disk.

This statement is not clear to me:
"So if I have templates on every node in cluster then will the nifi load templates on each node as per templates available on their node or a parent node will load all templates from other cluster".

What do you mean by "if I have templates on every node"?  In a NiFi cluster, every node must have the same flow.xml.gz.  If the flow loaded into heap memory does not match between nodes, the nodes not matching the elected cluster flow will be disconnected from the cluster.  Each node, while it has its own local copy of the flow, runs the exact same flow.
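
If you want to sanity-check that copies of flow.xml.gz collected from your nodes really are the same, a simple approach is to hash the decompressed contents. This is only an approximation of what the cluster does: NiFi's own comparison of flows is more nuanced than a byte-for-byte hash, and the node names, file names, and sample XML below are invented for the demo.

```python
import gzip
import hashlib
import os
import tempfile


def flow_digest(path):
    """SHA-256 of the decompressed flow file at `path`.

    Hashing the decompressed bytes avoids false mismatches caused by
    gzip header differences between otherwise identical files.
    """
    with gzip.open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def nodes_out_of_sync(flow_paths, reference_node):
    """Return sorted node names whose flow differs from the reference node's."""
    reference = flow_digest(flow_paths[reference_node])
    return sorted(
        node for node, path in flow_paths.items()
        if flow_digest(path) != reference
    )


# Demo with throwaway files standing in for copies fetched from three nodes.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "node1": b"<flowController/>",
        "node2": b"<flowController/>",
        "node3": b"<flowController><extra/></flowController>",
    }
    paths = {}
    for node, xml in contents.items():
        paths[node] = os.path.join(tmp, f"{node}-flow.xml.gz")
        with gzip.open(paths[node], "wb") as fh:
            fh.write(xml)
    mismatched = nodes_out_of_sync(paths, "node1")

print("nodes differing from node1:", mismatched)
```

In a healthy cluster the list should be empty, since every node runs the exact same flow.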

Hope this helps,

Matt