Member since
07-30-2019
3400
Posts
1621
Kudos Received
1003
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 138 | 12-05-2025 08:25 AM |
| | 274 | 12-03-2025 10:21 AM |
| | 551 | 11-05-2025 11:01 AM |
| | 418 | 11-05-2025 08:01 AM |
| | 798 | 11-04-2025 10:16 AM |
10-20-2017
04:34 PM
2 Kudos
@Bilel Boubakri The same concept applies to sending from NiFi to MiNiFi. The RPG can be used to push FlowFiles (as shown in the above screenshots), but it can also be used to pull FlowFiles from a remote output port. Thanks, Matt
10-20-2017
12:10 PM
@Gerd Koenig I edited my response to be more clear. While Ranger is supported, the use of Ranger Groups is not. Thanks, Matt
10-19-2017
09:58 PM
@dhieru singh FlowFiles generated by ListenUDP are placed on its outbound connection. One of the easiest ways to see the sizes of those FlowFiles is to right-click on that connection (while it has queued data) and select "List queue" from the context menu. This opens a dialog listing every FlowFile queued on that connection along with its details, including size. The same listing is also available outside the UI; see the sketch below. Matt
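If you need that listing programmatically, NiFi's REST API exposes it as an asynchronous listing request. A rough sketch, with a placeholder host, connection id, and request id:

```
# Start a listing request for the connection's queue (connection id is a placeholder)
curl -X POST http://nifi-host:8080/nifi-api/flowfile-queues/<connection-id>/listing-requests

# Poll the returned request id for results; each FlowFile entry includes its size
curl http://nifi-host:8080/nifi-api/flowfile-queues/<connection-id>/listing-requests/<request-id>
```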
10-19-2017
06:21 PM
@dhieru singh I am not clear on what you mean by "stopping a dataflow resulted in data loss." NiFi does not delete any data when a dataflow is stopped. Data remains queued between the stopped components until the dataflow is restarted or a user manually purges the data from those queues.

There is no notion of a "lock" in NiFi that can be set on a component or set of components. In addition, requiring a double confirmation every time a user wants to stop a component to make an edit may be more annoying than beneficial. That being said, it might be an interesting idea to add the ability to "lock" the current running state of a process group, essentially putting all components in that process group into read-only mode until the lock is removed. It might be worth creating an Apache NiFi Jira for such a feature.

If your NiFi is secured, you can prevent such issues by taking away users' "modify" access to the components. Without the modify access policy, users can only view the components; they cannot change the active state (start, stop, enable, or disable) or the configuration. But this would also require that you re-add "modify" any time a change is desired. Thank you, Matt
10-19-2017
02:19 PM
@dhieru singh I am assuming you are not having any issues with your ListenUDP processor and that it is successfully keeping up with your 10,000 messages per second? Is the real problem how quickly the MergeContent processor is merging the FlowFiles queued between ListenUDP and MergeContent?

I can tell you that trying to merge 127,000 FlowFiles at a time via the MergeContent processor is going to put a lot of pressure on your NiFi heap; I would not be surprised if you encountered Out-Of-Memory (OOM) errors. That pressure is caused by FlowFile attributes: the attributes of every FlowFile being merged by MergeContent are held in heap. To reduce that heap pressure, I suggest using two MergeContent processors in series. Have the first merge FlowFiles based on a minimum of 10,000 and a maximum of 15,000 entries, then feed its merged output to a second MergeContent configured to merge again based on minimum and maximum bin size (see the sketch below). The end result is better performance and less pressure on heap.

You also now have the ability to set a higher number of concurrent tasks on your MergeContent processors. This allows the processor to execute several times simultaneously (if sufficient work exists), with each concurrent task merging a different "bin" at the same time. The rule of thumb is that the number of bins should always be at least one greater than the number of concurrent tasks. For example, if MergeContent is configured for 7 bins, there should not be more than 6 concurrent tasks assigned to the processor. Once you make these changes, you will still need to keep an eye out for OOM errors; more concurrent tasks also means more heap usage by the MergeContent processors. You may find that you need to allocate more memory to your NiFi JVM to support your dataflow design.

Also make sure you have optimized the overall number of threads your NiFi instance is allowed to use. This is found under "Controller Settings" in the hamburger menu. The default Max Timer Driven Thread Count is only 10 (don't worry about the Event Driven Thread Count), which means all components on your canvas must share those 10 threads. The Max Timer Driven Thread Count should be set to 2 - 4 times the number of cores available on your NiFi server. Once you make changes to concurrent tasks and the max thread count, keep an eye on your server's CPU usage to make sure you have not over-allocated, resulting in 100% CPU usage all the time.

Finally, on the ListenUDP processor you could increase the "Max Batch Size" property so that more data is written to each FlowFile output by that processor. Hope this helps. Thank you, Matt
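A rough sketch of the two-stage layout described above. The property names come from the MergeContent processor; the counts and sizes are illustrative values to tune for your own flow, not recommendations:

```
MergeContent #1 -- bundle by count
  Merge Strategy            : Bin-Packing Algorithm
  Minimum Number of Entries : 10000
  Maximum Number of Entries : 15000
  Maximum number of Bins    : 7
  Concurrent Tasks          : 6      <- at least one fewer than the number of bins

        | "merged" relationship
        v

MergeContent #2 -- bundle by size
  Merge Strategy            : Bin-Packing Algorithm
  Minimum Group Size        : 64 MB   <- illustrative
  Maximum Group Size        : 128 MB  <- illustrative
```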
10-19-2017
01:50 PM
@Bilel Boubakri If I am understanding correctly, you have many servers that will have MiNiFi installed on them, and you wish to have each of those MiNiFi instances transmit FlowFiles to a single NiFi instance/cluster. Correct? If so, this is very doable using NiFi S2S. The dataflow you build for your MiNiFi instances needs a "Remote Process Group" (RPG) configured to send data to a remote input port located on your NiFi instance/cluster. So in your MiNiFi dataflow you will send your FlowFiles to an RPG as follows: On your NiFi instance/cluster, you will have a remote input port that accepts FlowFiles from the RPGs of all your MiNiFi instances as follows: A sketch of the MiNiFi side of this configuration follows below. Thank you, Matt
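On the MiNiFi side the RPG is declared in config.yml rather than drawn on a canvas. A minimal sketch, assuming a plaintext (non-TLS) NiFi; the URL, port name, and id are placeholders you would replace with the values of your own remote input port:

```yaml
# config.yml (MiNiFi) -- illustrative fragment only
Remote Processing Groups:     # older schema versions use this exact key name
  - name: Push to central NiFi
    url: http://nifi-host.example.com:8080/nifi    # placeholder NiFi URL
    timeout: 30 secs
    yield period: 10 sec
    Input Ports:
      - id: 0a1b2c3d-0016-1000-0000-000000000000   # id of the remote input port on NiFi
        name: From MiNiFi                          # placeholder port name
        max concurrent tasks: 1
        use compression: false
```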
10-19-2017
01:32 PM
1 Kudo
@Alvin Jin NiFi Remote Input and Output Ports can only be added at the root canvas level. Based on the screenshot you provided above, you have created your "sftp" input port in a sub-process group. Input and output ports exist to move FlowFiles between a process group and the process group ONE LEVEL UP only: input ports accept FlowFiles coming from one level up, and output ports send FlowFiles one level up. You can only move FlowFiles up or down one level at a time.

At the top level of your canvas (the root process group level), adding input or output ports gives that NiFi the ability to receive FlowFiles from another NiFi instance (input port) or to let another NiFi pull FlowFiles from it (output port). We refer to input and output ports added at the top level as remote input and output ports; a sketch of the hierarchy follows below. While the same input and output icons in the UI are used to add both remote and local ports, you will notice that they are rendered differently once added to the canvas. A remote input port (added at the root canvas level) will appear as follows: While a local input port (added within a process group) will appear as follows: Thank you, Matt
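A rough sketch of the one-level-at-a-time movement described above (group and port names are illustrative):

```
Root canvas                              <- remote input/output ports live here
├── Input Port "from-other-nifi"         <- receives FlowFiles over S2S
└── Process Group "Ingest"
    ├── Input Port  "in"                 <- reachable only from the root canvas
    └── Process Group "Parse"
        └── Input Port "in"              <- reachable only from inside "Ingest"
```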
10-19-2017
12:53 PM
@dhieru singh Have you seen this article: https://community.hortonworks.com/articles/30424/optimizing-performance-of-apache-nifis-network-lis.html Thanks, Matt
10-19-2017
12:50 PM
1 Kudo
@Bilel Boubakri NiFi's Site-To-Site (S2S) protocol is most typically used to send data between NiFi instances (this includes between NiFi and MiNiFi). The links below provide more detail on S2S and how to configure it within NiFi, and an illustrative nifi.properties fragment follows them.

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Remote_Group_Transmission
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#site_to_site_properties

If you found that this answer addressed your question, please take a moment to click "Accept" below. Thank you, Matt
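For reference, the site-to-site properties from the admin guide above live in nifi.properties on the receiving NiFi. A minimal sketch for RAW socket S2S; the host and port are placeholders:

```
# nifi.properties -- illustrative S2S settings (values are placeholders)
nifi.remote.input.host=nifi-host.example.com   # hostname remote peers connect to
nifi.remote.input.socket.port=10000            # port used for RAW socket transfers
nifi.remote.input.secure=false                 # set to true for TLS (requires keystore/truststore)
```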
10-19-2017
12:45 PM
@Ben Morris NiFi has no explicitly defined maximum for the number of nodes that can be added to a single NiFi cluster. Just keep in mind that the more nodes you add, the more request replication must occur between nodes. For example, if a user is connected to node 1 of a 100-node cluster and makes a change, that change must be replicated to the other 99 nodes. NiFi is configured with a fixed number of node protocol threads (default 10), so NiFi can only replicate that change to 10 nodes at a time. This value should be increased to accommodate larger clusters; failing to adjust it may result in nodes disconnecting because they did not receive the change request fast enough. In addition, you may need to be more tolerant with your connection and heartbeat timeouts (see the illustrative settings below).

As far as max data per second, that is a hard number to lay out. It is highly dependent on a number of factors, mostly your particular dataflow implementation. Since NiFi is just a blank canvas on which you build your dataflow, your dataflow design defines your performance/throughput in most cases; it comes down to which processors you use and how they are configured. Even with a well designed and optimized dataflow, throughput will still be affected by the use of some processors. CompressContent, for example, can be CPU intensive over longer periods when compressing large files, so it can become a bottleneck. If you found that this answer addressed your question, please take a moment to click "Accept". Thank you, Matt
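For reference, the protocol threads and timeouts mentioned above are configured in nifi.properties on each node; the values below are illustrative, not recommendations:

```
# nifi.properties -- illustrative tuning for a larger cluster
nifi.cluster.node.protocol.threads=30          # default is 10; raise for larger clusters
nifi.cluster.node.connection.timeout=30 secs   # be more tolerant as node count grows
nifi.cluster.node.read.timeout=30 secs
nifi.cluster.protocol.heartbeat.interval=5 secs
```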