Member since: 07-30-2019
Posts: 3436
Kudos Received: 1633
Solutions: 1012
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 158 | 01-27-2026 12:46 PM |
|  | 575 | 01-13-2026 11:14 AM |
|  | 1273 | 01-09-2026 06:58 AM |
|  | 1040 | 12-17-2025 05:55 AM |
|  | 505 | 12-17-2025 05:34 AM |
03-05-2020
10:47 AM
@varun_rathinam Observations from your configuration:

1. You are using the "Defragment" merge strategy, which tells me that somewhere upstream in your dataflow you are splitting a FlowFile into fragments and then using this processor to merge those fragments back into the original FlowFile. Correct? When using Defragment you cannot use multiple MergeContent processors in series, as I mentioned earlier, because the Defragment strategy expects to find all fragments from the fragment count before merging them.

2. When using the Defragment strategy, it is the fragment.count attribute on the FlowFiles that dictates when a bin should be merged, not the minimum number of entries (see the configuration sketch after this post).

3. Each FlowFile with a unique value in fragment.identifier is allocated to a different bin. Setting the number of bins to "1" will never work, no matter which merge strategy you choose. When the MergeContent processor executes, it first checks whether a free bin is available (if not, it merges the oldest bin, or in the case of Defragment routes the oldest bin's FlowFiles to failure, to free up a bin). It then looks at the FlowFiles in the inbound connection at that exact moment in time and starts allocating them to existing bins or new bins. So at a minimum you should always have at least "2" bins; the default is "5". Having multiple bins does not mean that all of those available bins will be used.

4. I see you changed Maximum Number of Entries from the default 1000 to 100000. Is this because you know each FlowFile you split will produce up to 100,000 FlowFiles? As I mentioned, ALL FlowFiles allocated to bins have their attributes held in heap memory. Adding to that, if you have multiple bins being filled because unique fragment.identifiers are being defragmented, you could have even more than 100,000 FlowFiles' worth of attributes in heap memory. So with your NiFi JVM heap set at only 2 GB, such a dataflow design may lead to Out Of Memory (OOM) conditions.

Also, wherever you do the original splitting of your FlowFile in your dataflow will also have an impact on heap memory, because the FlowFile attributes for every FlowFile produced during the split process are held in heap memory until every new split FlowFile is committed to a downstream connection. NiFi connections between processors have swapping enabled by default to help reduce heap usage when queues get large, but the same does not apply within the internals of a processor's execution. As I mentioned before, MergeContent does not load FlowFile content into heap memory, so the size of your FlowFiles does not impact heap here.

So you really want to step back, look at your use case again, and ask yourself: "Do I really need to split my source FlowFile and merge it back into the original FlowFile to satisfy my use case?" NiFi has numerous record-based processors for working with records, avoiding the need to split them in many use cases.

Hope this helps,
Matt
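For reference, a minimal sketch of what a Defragment configuration might look like; the values here are illustrative assumptions, not recommendations, and the heap caveats above still apply:

```
Merge Strategy            = Defragment
Maximum number of Bins    = 5        # default; one bin per distinct fragment.identifier
Maximum Number of Entries = 100000   # with Defragment, fragment.count drives the merge, not this
Max Bin Age               = 10 min   # bins still missing fragments at this age route to failure
```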
03-05-2020
10:21 AM
@domR I see no issues with using a publish and consume design for triggering follow-on flows. That provides a slightly more robust setup than the postHTTP to ListenHTTP example I provided, since any NiFi node can consume the trigger file.

When using Kafka you will want to make sure you have the same number of partitions as you have consumers in the same consumer group. When you add a consumeKafka processor to the NiFi canvas it is really being added to every node in your cluster and is configured the same on every node. Let's assume you have a 3 node NiFi cluster; that means you have at a minimum 3 consumers in that consumer group, so your Kafka topic should have 3 partitions; otherwise, you may see a lot of rebalancing happen.

To improve performance even more you can increase the concurrent tasks on your consumeKafka processor. With a 3 node NiFi cluster and a consumeKafka configured with 2 concurrent tasks, you now have 6 total consumers in the consumer group; therefore, your Kafka topic should have 6 partitions, as in the sketch below. If a NiFi node goes down, Kafka will assign multiple partitions to the same consumer, so no need to worry about messages not being consumed.

Hope this information was useful in finding a solution,
Matt
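As an illustration only, a topic sized for that 3-node, 2-concurrent-task example could be created with the standard Kafka CLI; the broker address, topic name, and replication factor below are placeholders, not recommendations:

```bash
# 3 NiFi nodes x 2 concurrent tasks = 6 consumers, so create 6 partitions
kafka-topics.sh --create \
  --bootstrap-server broker1:9092 \
  --topic nifi-trigger \
  --partitions 6 \
  --replication-factor 3
```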
03-04-2020
01:39 PM
@NickH The FetchSFTP processor has multiple different relationships. For your use case, where the file really is not there when FetchSFTP tries to fetch the content, the expected outcome is that the FlowFile is routed to the "not.found" relationship, which you should auto-terminate. If you encountered some sort of communications failure (a network issue during the fetch), the FlowFile should have been routed to the "comms.failure" relationship, which should be looped back on the processor to try again. FetchSFTP also has a "permission.denied" relationship which you can handle via dataflow design as well, perhaps by sending an email alert. Hope this helps, Matt
03-04-2020
10:55 AM
@Umakanth The API is exposed out of the box; it is not something you need to enable. Every action you take while working in the UI makes a call to the NiFi rest-api. When learning how to use the rest-api calls, you may find the developer tools in your browser helpful. Open the developer tools while you are accessing your NiFi UI, then perform some action and you will see those requests being made by your browser to NiFi. As an example, using the Chrome browser developer tools, I opened NiFi's summary UI from the global menu in the upper right corner of the UI and saw that several requests were made. I can right-click on any one of those requests and select "Copy as cURL" to copy the full request to the system clipboard. I can then paste the request in a terminal window and see what the rest-api call returns. You will notice that the copied curl command has numerous additional headers (-H) that are not always necessary, depending on the rest-api endpoint being used. Example:

curl 'http://<nifi-hostname>:<nifi-port>/nifi-api/flow/process-groups/root/status?recursive=true'

Of course, you will need to parse the rest-api responses yourself to extract, in many cases, the specific details/stats you want to monitor. Hope this helps, Matt
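As one hedged example of that parsing step, you could pipe the JSON through a tool like jq; the field path below is an assumption about this endpoint's response shape, so inspect the actual JSON your NiFi returns first:

```bash
# Pull the root process group's queued count (field path assumed; verify against your output)
curl -s 'http://<nifi-hostname>:<nifi-port>/nifi-api/flow/process-groups/root/status?recursive=true' \
  | jq '.processGroupStatus.aggregateSnapshot.queuedCount'
```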
03-04-2020
09:57 AM
@MahipalRathore The bulletin is not going to include anything more than what would be found in the nifi-app.log. A bulletin produced as a result of some failure while processing a FlowFile will have details about the FlowFile's assigned UUID and filename, as well as its size and the content claim in which the FlowFile's content can be found. You could then use NiFi data provenance to obtain details on the FlowFile, including all its FlowFile attributes. Some bulletins are for exceptions that occur unrelated to a FlowFile directly and thus will contain no FlowFile info. But if a FlowFile is routed to failure as a result of, for example, an exception thrown by the putHDFS processor, details on that FlowFile record should be included in the bulletin. Note: If the bulletin is produced by a processor that creates FlowFiles and the exception occurred before any FlowFile was created, there is no FlowFile from which to get FlowFile details. Hope this helps, Matt
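If you want to collect bulletins programmatically rather than watching the UI, the rest-api also exposes a bulletin board endpoint; a minimal sketch, assuming an unsecured NiFi and a placeholder hostname:

```bash
# Fetch up to 10 recent bulletins; FlowFile details, when present, appear in the bulletin message
curl -s 'http://<nifi-hostname>:<nifi-port>/nifi-api/flow/bulletin-board?limit=10'
```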
03-04-2020
06:11 AM
Mattwho, Thanks for your comments. After reading your mail, I spent a lot of time thinking. I saw that 8 threads were created and there was no performance improvement, because all the threads were doing the same thing and executing the script on all files that were unpacked. After your comment that the threads actually operate on the FlowFiles, I changed the code so that it accepts the FlowFiles as one of the inputs and processes them using multiple threads. This improved the performance by 30%: the time taken dropped from 13 mins to 3-4 mins. So many thanks for your comments; I now understand how to use concurrent tasks.
03-03-2020
01:25 PM
1 Kudo
@asfou NiFi does not contain any processors that support Hive version 2.x. The latest versions of Apache NiFi offer Hive 1.x and Hive 3.x client based processor components. To support a Hive 2.x version, you would need to build your own custom processors using the Hive 2.x client. Matt
02-28-2020
12:44 PM
1 Kudo
@maryem Any action you can do through the NiFi UI, you can also do by interacting directly with the NiFi rest-api. This will not animate the action of actually dragging and dropping a processor onto the canvas, but you can make a rest-api call that adds a new processor of type ABC at coordinates x,y on the canvas. NiFi's rest-api documentation can be found here: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html Some users find it easier to learn the rest-api calls through examples. If you open the developer tools in your browser, you can perform the action via the UI and see the rest-api call that was made. Most browser developer tools even let you save the rest-api call as a curl command that you could then execute yourself via the command line. Matt
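As a concrete sketch of such a call, the request below would add a GenerateFlowFile processor at canvas position (100, 200); the process group id and hostname are placeholders, an unsecured NiFi is assumed, and the revision block with version 0 is what NiFi expects for a brand-new component:

```bash
curl -X POST 'http://<nifi-hostname>:<nifi-port>/nifi-api/process-groups/<process-group-id>/processors' \
  -H 'Content-Type: application/json' \
  -d '{
        "revision": { "version": 0 },
        "component": {
          "type": "org.apache.nifi.processors.standard.GenerateFlowFile",
          "position": { "x": 100.0, "y": 200.0 }
        }
      }'
```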
02-25-2020
11:20 AM
@nishank_paras You can use the invokeHTTP processor to fetch your file. As an example, an invokeHTTP processor configured with the "GET" method can fetch the Apache nifi-toolkit-1.11.3-bin.tar.gz file. You can then construct a dataflow using other processors to manipulate the content as you want, or simply connect to another invokeHTTP processor that uses "PUT" instead of "GET" to put your file "nn.csv" at the new http endpoint. Hope this helps you, Matt
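For intuition only, the pattern those two invokeHTTP processors perform is roughly the following GET-then-PUT pipeline; the URLs below are placeholders for illustration, not the exact endpoints from the example:

```bash
# GET the file, then PUT the same bytes to the new endpoint
curl -sL 'https://<source-host>/nifi-toolkit-1.11.3-bin.tar.gz' \
  | curl -X PUT --data-binary @- 'http://<destination-host>/upload/nn.csv'
```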
02-25-2020
11:05 AM
@saivenkatg55 You need to literally use ./keystore.p12 in your command instead of just keystore.p12:

curl --cert-type P12 --cert ./keystore.p12:password --cacert nifi-cert.pem -v 'https://w0lxqhdp04:9091/nifi-api/flow/search-results?q='

Hope this helps, Matt