Member since
07-30-2019
3472
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 237 | 06-03-2026 06:06 PM |
| | 518 | 05-06-2026 09:16 AM |
| | 986 | 05-04-2026 05:20 AM |
| | 573 | 05-01-2026 10:15 AM |
| | 683 | 03-23-2026 05:44 AM |
03-04-2020
01:05 PM
@varun_rathinam Can you please elaborate on "processor drop the file and join with new files"? And also on "content_repository backup i limit"? Are you referring to the "nifi.content.repository.archive.max.retention.period" and "nifi.content.repository.archive.max.usage.percentage" configuration settings in the nifi.properties file? Sharing a screenshot of your current MergeContent processor's configuration, along with more details about your use case, would also help. What result are you seeing now, and what is the desired result?

The MergeContent processor takes multiple FlowFiles from the same NiFi node and merges the content of those FlowFiles, based on the processor's configuration, into one or more new FlowFiles per node. The processor cannot merge FlowFiles residing on different NiFi nodes in a NiFi cluster into one FlowFile.

FlowFiles from the inbound connection queue are allocated to bins based on the following configuration properties:
Correlation Attribute Name <-- (Optional) When used, only FlowFiles with the same value in the configured FlowFile attribute will be placed in the same bin.
Maximum Number of Entries <-- Maximum number of FlowFiles that can be allocated to a single bin before a new bin is used.
Maximum Group Size <-- (Optional) Maximum cumulative size of the content that can be allocated to a bin.

When a bin becomes eligible to be merged is controlled by these configuration properties:
Minimum Number of Entries <-- If, at the end of thread execution (after all FlowFiles from the inbound connection have been allocated to one or more bins), the number of FlowFiles allocated to a bin meets this minimum and the bin also meets the configured Minimum Group Size, the FlowFiles in that bin will be merged.
Minimum Group Size <-- Same as above.
Max Bin Age <-- A bin that has not reached or exceeded both of the above minimum values will merge once the bin has had FlowFiles in it for this amount of time.
Maximum number of Bins <-- If FlowFiles have been allocated to every bin and another bin is needed, the oldest bin will be forced to merge to free a bin.

It is possible that one or both minimum values are never reached because a maximum setting is reached first. Once a maximum is reached, no additional FlowFiles can be allocated to that bin, and the only things that will force that bin to merge are "Max Bin Age" or running out of free bins.

As far as the bin maximum values go, NiFi really does not care about content size, because it streams the merged FlowFiles' content into a new FlowFile and does not hold that content in memory. NiFi can, however, experience Out Of Memory (OOM) conditions if Maximum Number of Entries is set too high, since all the attributes for every FlowFile currently allocated to bin(s) are held in heap memory. NiFi's allocated heap memory is set in the bootstrap.conf configuration file (the -Xms and -Xmx Java arguments). So Maximum Number of Entries should be limited to 10000 (but this varies based on memory availability and the number and size of attributes on your FlowFiles).

You can use multiple MergeContent processors in series (one after another) to merge multiple merged FlowFiles into even larger merged FlowFiles if desired.

Hope this helps with understanding the MergeContent processor, Matt
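To make the binning rules above more concrete, here is a minimal Python sketch of the allocation and eligibility checks. It is a simplified model for illustration only, not NiFi's actual implementation: the `FlowFile`, `Bin`, `allocate`, and `is_mergeable` names are hypothetical, and Max Bin Age and Maximum number of Bins are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowFile:
    size: int  # content size in bytes
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Bin:
    key: Optional[str]  # correlation attribute value, or None when unused
    flowfiles: List[FlowFile] = field(default_factory=list)

    @property
    def total_size(self) -> int:
        return sum(f.size for f in self.flowfiles)


def allocate(flowfiles, correlation_attr=None, max_entries=1000, max_group_size=None):
    """Place each FlowFile in the first matching bin that still has room."""
    bins: List[Bin] = []
    for ff in flowfiles:
        key = ff.attributes.get(correlation_attr) if correlation_attr else None
        target = None
        for b in bins:
            if b.key != key or len(b.flowfiles) >= max_entries:
                continue  # wrong correlation value, or bin hit Maximum Number of Entries
            if max_group_size is not None and b.total_size + ff.size > max_group_size:
                continue  # adding this FlowFile would exceed Maximum Group Size
            target = b
            break
        if target is None:
            target = Bin(key)  # no usable bin, so a new one is opened
            bins.append(target)
        target.flowfiles.append(ff)
    return bins


def is_mergeable(b, min_entries=1, min_group_size=0):
    """A bin is eligible only when BOTH minimum values are met."""
    return len(b.flowfiles) >= min_entries and b.total_size >= min_group_size
```

In this model, a bin that fills up to its maximum before reaching both minimums stays ineligible, which is the situation where only Max Bin Age or running out of bins would force the merge.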
03-04-2020
10:55 AM
@Umakanth The API is exposed out of the box; it is not something you need to enable. Every action you perform in the UI makes a call to the NiFi rest-api.

When learning how to use the rest-api calls, you may find the developer tools in your browser helpful. Open the developer tools while you are accessing your NiFi UI, then perform some action, and you will see the requests your browser makes to NiFi. For example (using the Chrome browser developer tools), when I opened NiFi's Summary UI from the global menu in the upper right corner of the UI, several requests were made. I can right-click on any one of those requests and select "Copy as cURL" to copy the full request to the system clipboard. I can then paste the request into a terminal window and see what the rest-api call returns. You will notice that the copied curl command has numerous additional headers (-H) that are not always necessary, depending on the rest-api endpoint being used.

Example: curl 'http://<nifi-hostname>:<nifi-port>/nifi-api/flow/process-groups/root/status?recursive=true'

Of course, in many cases you will need to parse the rest-api responses yourself to extract the specific details/stats you want to monitor.

Hope this helps, Matt
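For the parsing step, a rough Python sketch of calling the same status endpoint and pulling a value out of the JSON might look like the following. It assumes an unsecured (HTTP) NiFi; the helper names are hypothetical, and the JSON key names in the usage comment are assumptions that should be checked against what your own NiFi actually returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def status_url(host: str, port: int, group_id: str = "root", recursive: bool = True) -> str:
    """Build the process group status URL shown in the curl example."""
    query = urlencode({"recursive": str(recursive).lower()})
    return f"http://{host}:{port}/nifi-api/flow/process-groups/{group_id}/status?{query}"


def fetch_status(url: str) -> dict:
    """GET the URL and decode the JSON body (no authentication is handled here)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def pick(data, *path, default=None):
    """Walk a nested dict by key path, returning default if any key is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data


# Usage (key names below are assumptions; inspect the real response first):
# status = fetch_status(status_url("<nifi-hostname>", 8080))
# queued = pick(status, "processGroupStatus", "aggregateSnapshot", "flowFilesQueued")
```

Using a tolerant lookup like `pick` means a missing or renamed key yields `None` instead of an exception, which tends to be friendlier in a monitoring script.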
03-03-2020
02:19 PM
@Gubbi Bottom line is that NiFi processors in a dataflow do not execute sequentially. They each execute based on their configured run schedule. Each processor that is given a thread to execute can potentially utilize a CPU until that thread completes. Generally speaking, most threads are very short lived, resulting in very minimal impact on your system's CPU. In your dataflow, I would expect that FetchFile (actually retrieving the content of your 3 FlowFiles) and PutS3 (reading and sending the content of your 3 FlowFiles) would hold threads the longest. While both are executing at the same time, NiFi could be using 200% (2 CPUs). Also keep in mind that the NiFi core is using threads as well, so seeing NiFi use over 100% is pretty much what I would expect anytime it is not sitting idle. Hope the information provided helps you, Matt
03-03-2020
02:02 PM
@Alexandros Securing NiFi and NiFi Registry will always require TLS certificates. There are then numerous options for authenticating to those secured services. Both NiFi and NiFi Registry offer:
1. User based certificate authentication. You would need to create a user certificate for each user who will access NiFi or NiFi Registry.
2. SPNEGO. This requires that you have a KDC and that your users have SPNEGO enabled in their browser.
3. LDAP/AD user authentication. You would need to have your own LDAP/AD setup which you can use to authenticate your users.
4. Kerberos login provider. This would require you to set up your own KDC as well.

NiFi also supports OpenID Connect compatible service based authentication; however, the same is not offered in NiFi Registry. The jira for adding OpenID Connect capability to NiFi Registry is still open here: https://issues.apache.org/jira/browse/NIFIREG-313

So based on the options above, and depending on the number of users you want to give access to, your best options are either issuing each of your users a user/client certificate or setting up a simple LDAP server or KDC server.

Hope this helps, Matt
03-03-2020
01:48 PM
@domR Option 2: The good news is that as of Apache NiFi 1.10 you can create remote input and output ports at any process group level (https://issues.apache.org/jira/browse/NIFI-2933). This is probably your best option, as it handles distribution across all available nodes in your cluster. If a node goes down, the Remote Process Group (RPG) will distribute FlowFiles only to the remaining nodes that are available.

Option 3: Here you would use a PostHTTP processor and a ListenHTTP processor. The downside to this option compared to option 2 is that the PostHTTP processor can only be configured to send to one URL endpoint. So if the target NiFi node is down, it will fail to send. Of course you could connect failure from one PostHTTP to another and so on, adding as many PostHTTP processors as you have nodes, but this does not scale well.

Hope this helps, Matt
03-03-2020
01:14 PM
@TVGanesh The following statement is not accurate: "I read that typically the number of concurrent tasks is roughly equal to 2 or 4 times the cores". The general recommendation is that the "Max Timer Driven Thread Count" is set to 2 to 4 times the number of cores. This setting is all relative to the other processes running on your server (or your Mac in this case).

The "Max Timer Driven Thread Count" setting establishes the maximum number of threads that can be handed out to requesting components that want to execute. (This is a soft limit; there are some scenarios where a thread can be obtained even when the number of active threads has reached this configured max count.) The "Max Timer Driven Thread Count" is configured under the NiFi Global Menu --> Controller Settings --> General (tab). When you adjust this value, monitor your CPU usage and adjust accordingly.

Keep in mind that adding additional concurrent tasks to your processor will not improve the processing of a single FlowFile. The concurrency allows the processor to work on different FlowFiles pulled from the inbound connection queue concurrently. In the case of the ExecuteStreamCommand processor, the ability to execute the same command concurrently also depends on the command you are executing. A small number will be displayed in the upper right corner of the processor, showing the number of active threads in use by that processor at the time of the last browser refresh (the NiFi browser auto refresh default is every 30 seconds).

Hope this helps, Matt
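As a quick illustration of the 2 to 4 times rule, the small Python sketch below computes a starting range for "Max Timer Driven Thread Count" from the core count. The function name is hypothetical, and the result is only a starting point to tune while watching CPU usage, not a value NiFi computes for you.

```python
import os


def suggested_thread_range(cores: int) -> tuple:
    """Return (low, high) as 2x and 4x the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return (2 * cores, 4 * cores)


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs and can return None on some platforms.
    cores = os.cpu_count() or 1
    low, high = suggested_thread_range(cores)
    print(f"{cores} cores: try a Max Timer Driven Thread Count between {low} and {high}")
```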
03-02-2020
06:53 AM
@Gubbi What that stat is telling you is that the processor executed a total of 297 threads in the past 5 minutes. You can see that the cumulative time for all 297 threads that executed was only 0.044 seconds, so these 297 threads consumed very little total time on the CPU. What this also tells us is that the processor executed about once every 1 sec. This is either because your configured run schedule is every 1 sec (0 sec is the default) or because no new files were found on each execution. To prevent a processor from consuming excessive CPU when the run schedule is set to 0 sec, NiFi will yield a processor after a thread runs that produces no results. What you have shown looks to be expected behavior. Thanks, Matt
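The arithmetic behind that reading of the tasks/time stat can be sketched in a few lines of Python. The function name and dictionary keys are just for illustration; the inputs are the 297 tasks, 0.044 seconds, and 5 minute window mentioned above.

```python
def summarize_tasks(tasks: int, total_seconds: float, window_seconds: float = 300.0) -> dict:
    """Turn a tasks/time stat into per-task and per-window figures."""
    return {
        # Average gap between executions over the stats window
        "seconds_between_runs": window_seconds / tasks,
        # Average thread time per execution, in milliseconds
        "avg_ms_per_task": total_seconds / tasks * 1000.0,
        # Fraction of the window the processor actually held a thread
        "busy_fraction": total_seconds / window_seconds,
    }


stats = summarize_tasks(tasks=297, total_seconds=0.044)
print(f"{stats['seconds_between_runs']:.2f} s between runs")
print(f"{stats['avg_ms_per_task']:.3f} ms per task")
print(f"{stats['busy_fraction']:.4%} of the window busy")
```

So 297 executions in 300 seconds works out to roughly one execution per second, each taking well under a millisecond on average.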
02-28-2020
12:45 PM
@Gubbi Not sure what you mean by spikes up to 300. A screenshot (or several) may be helpful to understand what you are observing.
02-28-2020
12:44 PM
1 Kudo
@maryem Any action you can do through the NiFi UI, you can also do by interacting directly with the NiFi rest-api. This will not animate the action of actually dragging and dropping a processor on the canvas, but you can make a rest-api call that adds a new processor of type ABC at coordinates x,y on the canvas. NiFi's rest-api documentation can be found here: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html Some users find it easier to learn the rest-api calls through examples. If you open the developer tools in your browser, you can perform the action via the UI and see the rest-api call that was made. Most browser developer tools even let you save the rest-api call as a curl command that you can then execute yourself from the command line. Matt
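As a hedged sketch of what such a call could look like from Python, the snippet below only builds the request for adding a processor at given coordinates; it does not send it. The endpoint path and JSON body shape reflect my understanding of the rest-api and should be confirmed against the documentation linked above, or by copying the request your browser makes when you drop a processor on the canvas. The function name and the placeholder values in the usage comment are hypothetical.

```python
import json
from urllib.request import Request


def build_create_processor_request(base_url: str, group_id: str,
                                   processor_type: str, x: float, y: float) -> Request:
    """Prepare (but do not send) a POST that adds a processor to a process group."""
    payload = {
        # Assumption: new components are created with revision version 0
        "revision": {"version": 0},
        "component": {
            "type": processor_type,          # fully qualified processor class name
            "position": {"x": x, "y": y},    # canvas coordinates
        },
    }
    return Request(
        url=f"{base_url}/nifi-api/process-groups/{group_id}/processors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (placeholders; a secured NiFi would also need a token or client certificate):
# from urllib.request import urlopen
# req = build_create_processor_request("http://<nifi-hostname>:<nifi-port>",
#                                      "<process-group-id>", "<processor-type>", 100.0, 200.0)
# with urlopen(req, timeout=10) as resp:
#     print(resp.status)
```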
02-26-2020
12:41 PM
@Gubbi It is impossible to say what is going on here without more detail. From where are you determining that NiFi is using greater than 100% CPU? If you are looking at top, with 8 CPUs you would have 800% CPU available (100% for each of the 8 CPUs). So 100%+ may be normal and expected for the NiFi Java process.

You have the NiFi core process running, plus each processor can execute concurrently. For example, FetchFile may be fetching the content of FlowFile 2 while PutS3 is putting FlowFile 1 to S3 at the same time. Since NiFi is designed to be multi-threaded, it is possible that multiple threads are being executed concurrently, with each of those threads being handled by a different CPU. By default, the configured "Max Timer Driven Thread Count" in NiFi is set to 10. This means that across all processors 10 threads can be requested concurrently. This is a soft limit, so there are scenarios where the number of active threads can extend beyond the max configured thread count. In addition, core level threads are used that do not come from this thread pool, which is used only by NiFi components added to the canvas.

What other processes are consuming CPU on your system? How is your NiFi configured, what are all the processors on your canvas, how are they configured, how large are the 3 files you are processing, what do the tasks/time stats on your processors show when this dataflow is executing against your 3 FlowFiles, etc.?

Hope this helps give you some direction to investigate. Matt
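To put a top reading in context, this small Python sketch converts a per-process %CPU value into a share of the whole machine, following the 100% per CPU convention described above. The function name is hypothetical.

```python
def share_of_machine(top_cpu_percent: float, cores: int) -> float:
    """Fraction of total CPU capacity, where top reports 100% per core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return top_cpu_percent / (cores * 100.0)


# A Java process showing 200% in top on an 8 CPU host is using a quarter of the machine.
print(f"{share_of_machine(200.0, 8):.0%}")
```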