Master Guru

Short Description:

This article covers how to improve the performance of the NiFi UI.

Article:

Over time, NiFi users have been building very large dataflows consisting of many thousands of components (processors, reporting tasks, controller services, etc.). While NiFi places no limit on the number of components that can be added to the canvas, the more components a user adds, the less responsive the UI becomes. This component explosion not only affects the responsiveness of the UI, but can also lead to unexpected node disconnections.

--- What are the various states a component can have?

NiFi components can be in one of several states: stopped, started, enabled, or disabled. In addition to its state, each component has one of two validation statuses:

Valid: Component configuration was successfully validated. This means that all required properties have been configured and in the case of processors all required connections have been accounted for (connected to another component or terminated) and any referenced controller services have been enabled.

Invalid: Component configuration is not valid. This means that one or more required properties have not been configured, and/or in the case of processors one or more connections have not been accounted for (connected to another component or terminated), and/or one or more referenced controller services have not been enabled.

--- Why does a processor's state affect UI performance?

All processor components when added to the canvas are added in the "stopped" state. A user can then either start or disable that component manually.

All Controller Services and Reporting tasks added by a user are by default disabled. The user can then enable these components as needed.

NiFi regularly must validate these components to see if they are valid or invalid. While the validation of a few hundred to a thousand components adds up to very little time, the same does not hold true for NiFi instances consisting of thousands upon thousands of components.

Users may have noticed a spinning progress icon (69390-screen-shot-2018-04-09-at-93910-am.png) on the right-hand side of the NiFi status bar that never seems to go away. In a NiFi cluster, NiFi must retrieve the flow status from every node. It is possible for a component to be valid on one node but not on another (for example, a processor depends on a local file that does not exist on all nodes). If this validation takes too long, a node may be disconnected because the request took too long. In addition, the UI does not update until these validations have completed.

--- What has NiFi done to make improvements here?

The bad news:

Prior to NiFi 1.1.0, there is nothing that can be done to improve performance here other than reducing the number of components you are using. This is because in versions of NiFi prior to 1.1.0, components were validated in all four states.

The good news:

In NiFi 1.1.0 a change was made so that this validation only occurs on components that are in the "stopped" state and controller services or reporting tasks that are disabled. It is safe to assume that if a processor is running, it must be valid. It is also safe to assume that a Controller Service or a Reporting Task must be valid if it is enabled. Now that these "started" processors and "enabled" controller services or reporting tasks are no longer being validated, the UI performance will be much better.

https://issues.apache.org/jira/browse/NIFI-2996

--- What is important to understand here?

It has also been observed that users add lots of components to the canvas that are never started or are only started for short periods of time. If the number of "stopped" processors is very high, validation is still going to take a considerable amount of time, even in NiFi 1.1.0 or newer versions.

A quick look at the NiFi status bar above your canvas will show how many stopped components you have on your canvas:

69392-screen-shot-2018-04-09-at-100636-am.png
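
The same counts can also be read programmatically. The following is a minimal sketch, assuming an unsecured NiFi instance and assuming that the REST API's flow status response carries a "controllerStatus" object with runningCount, stoppedCount, invalidCount, and disabledCount fields. The helper names and the base URL are hypothetical and not part of the original article; check the REST API documentation for your NiFi version before relying on them.

```python
import json
import urllib.request


def summarize_counts(flow_status: dict) -> dict:
    """Pull the component counts out of a flow status response.

    Assumes the response shape {"controllerStatus": {...}}; missing
    fields are reported as 0 rather than raising.
    """
    status = flow_status.get("controllerStatus", {})
    keys = ("runningCount", "stoppedCount", "invalidCount", "disabledCount")
    return {key: status.get(key, 0) for key in keys}


def fetch_flow_status(base_url: str) -> dict:
    """Fetch the flow status from NiFi (assumed endpoint, no authentication)."""
    with urllib.request.urlopen(f"{base_url}/nifi-api/flow/status") as resp:
        return json.load(resp)


# Hypothetical usage against a local, unsecured instance:
# print(summarize_counts(fetch_flow_status("http://localhost:8080")))
```

A large stoppedCount or invalidCount is the signal that disabling unused processors is worth the effort.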

To make sure the UI performance remains solid, it is important that users disable processors that are not in use on the canvas. You can use the "NiFi Summary" UI to find stopped/invalid processors and disable them. Select the "PROCESSORS" tab and sort on the "Run Status" column. Clicking on the "go to" icon (69393-screen-shot-2018-04-09-at-100950-am.png) on the right-hand side of a row will take you directly to that processor.

Once a component is selected on the canvas, it can be disabled or enabled via the "Operate" panel or by right-clicking on the processor and selecting Disable or Enable from the displayed context menu.

69394-screen-shot-2018-04-09-at-101306-am.png

Comments
New Contributor

Hi Matt,

Thanks a lot for a wonderful article, I have been looking for this for a long time.

I have a question: how can we disable or enable multiple processors or a complete process group all at once? I have many process groups on my canvas which contain too many processors, and enabling or disabling them one at a time would be very cumbersome.

Please suggest!!

Thanks,

Sri

Master Guru

@sri chaturvedi

Thank you for your feedback.

Unfortunately, the "Disable" and "Enable" buttons are not available when multiple components are selected.

I filed a Jira for such an improvement (https://issues.apache.org/jira/browse/NIFI-5066).

For now, when dealing with a flow with such a large number of stopped components, it may be easier to simply manually edit the NiFi flow.xml.gz file.

What you want to look for are all entries containing the following string:

<scheduledState>STOPPED</scheduledState>

and replace that with:

<scheduledState>DISABLED</scheduledState>

My suggestion would be to make a copy of the flow.xml.gz file. Edit the copy as described above. Stop your NiFi instance/cluster. Then switch out the original flow.xml.gz with the new modified copy of the flow.xml.gz on all NiFi instances. Make sure file ownership is correct and restart NiFi.

Thank you,

Matt
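
The manual edit Matt describes can be sketched as a small script. This is only an illustration under stated assumptions: the function names and file paths are hypothetical, the script works on a copy of flow.xml.gz, and it performs the same blanket string replacement suggested above, so every element that carries the STOPPED scheduledState tag is changed, not only processors.

```python
import gzip
from typing import Tuple

STOPPED = b"<scheduledState>STOPPED</scheduledState>"
DISABLED = b"<scheduledState>DISABLED</scheduledState>"


def disable_stopped(xml_bytes: bytes) -> Tuple[bytes, int]:
    """Replace every STOPPED scheduledState entry with DISABLED.

    Returns the updated XML and the number of entries that were changed.
    """
    count = xml_bytes.count(STOPPED)
    return xml_bytes.replace(STOPPED, DISABLED), count


def rewrite_flow_copy(src_path: str, dst_path: str) -> int:
    """Read a gzipped flow file, apply the replacement, write a new gzipped file."""
    with gzip.open(src_path, "rb") as src:
        original = src.read()
    updated, count = disable_stopped(original)
    with gzip.open(dst_path, "wb") as dst:
        dst.write(updated)
    return count


# Hypothetical usage on a copy, never on the live file:
# changed = rewrite_flow_copy("flow-copy.xml.gz", "flow-disabled.xml.gz")
# print(f"{changed} entries switched from STOPPED to DISABLED")
```

The resulting file still has to be swapped in on every node while NiFi is stopped, with file ownership checked, exactly as described above.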

Expert Contributor

@Matt Clarke, @sri chaturvedi,

Matt, this actually requires server restart (and if there is a cluster - removing or editing the file on the rest of the nodes).

I think it can be done by creating a flow in NIFI :))))

1. Read flow.xml.gz

2. Parse XML to find stopped processors and their IDs (as per above)

3. Use the NiFi REST API to change the state to disabled.
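
Outside of a NiFi flow, the three steps above could also be sketched as a standalone script. This is a rough sketch, not a tested procedure: it assumes processors appear in flow.xml as "processor" elements with "id" and "scheduledState" children, that the instance is unsecured, and that a processor's state can be changed by sending its current revision plus the new state in a PUT to /nifi-api/processors/{id}. The function names and base URL are hypothetical, and the request shape should be verified against the REST API documentation for your NiFi version.

```python
import gzip
import json
import urllib.request
import xml.etree.ElementTree as ET


def read_flow(path: str) -> bytes:
    """Step 1: read and decompress flow.xml.gz."""
    with gzip.open(path, "rb") as f:
        return f.read()


def find_stopped_processor_ids(flow_xml: bytes) -> list:
    """Step 2: collect the IDs of processors whose scheduledState is STOPPED."""
    root = ET.fromstring(flow_xml)
    return [
        proc.findtext("id")
        for proc in root.iter("processor")
        if proc.findtext("scheduledState") == "STOPPED"
    ]


def disable_processor(base_url: str, processor_id: str) -> dict:
    """Step 3: fetch the current revision, then request the DISABLED state.

    The endpoint and payload shape are assumptions; no authentication is sent.
    """
    url = f"{base_url}/nifi-api/processors/{processor_id}"
    with urllib.request.urlopen(url) as resp:
        entity = json.load(resp)
    body = json.dumps({
        "revision": entity["revision"],
        "component": {"id": processor_id, "state": "DISABLED"},
    }).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Hypothetical usage:
# for pid in find_stopped_processor_ids(read_flow("flow.xml.gz")):
#     disable_processor("http://localhost:8080", pid)
```

Unlike editing flow.xml.gz directly, this approach would not require a restart, which is the point raised above.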

New Contributor

Thanks for the solution, but since I am not familiar with the REST API, the solution by Matt looks easier to me. I will surely try yours too.

Version history
Revision #:
2 of 2
Last update:
‎08-17-2019 07:44 AM