Member since: 07-30-2019
Posts: 3427
Kudos Received: 1632
Solutions: 1011
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 84 | 01-27-2026 12:46 PM |
|  | 486 | 01-13-2026 11:14 AM |
|  | 1022 | 01-09-2026 06:58 AM |
|  | 915 | 12-17-2025 05:55 AM |
|  | 976 | 12-15-2025 01:29 PM |
07-07-2017
01:16 PM
@Mark Heydenrych You may be able to use the ReplaceText processor to remove those blank lines from your input FlowFile's content before the SplitText processor. I did a little test that worked for me using a ReplaceText configuration like the one sketched below: it evaluates your FlowFile line by line and replaces the line return (\n) with nothing on any line that starts with a line return. That effectively removes the blank line. After that, my SplitText reported the correct fragment.count when I split the file. Thanks, Matt
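The screenshot of that test configuration did not carry over, so as a minimal sketch of the ReplaceText settings described above (the ^\n pattern is a reconstruction of "any line that starts with a line return"):

```
ReplaceText
  Replacement Strategy: Regex Replace
  Evaluation Mode:      Line-by-Line
  Search Value:         ^\n
  Replacement Value:    (empty string)
```

A line that begins with a line return matches the pattern and is replaced with nothing, so the blank line is gone before SplitText runs.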
07-07-2017
12:41 PM
@Bertrand Goubot MiNiFi will work the same way with regard to accessing locally mounted disks. NiFi and MiNiFi have no issues working with large files as long as there is sufficient space in the content repository to store and do any processing needed on those large files. Thanks, Matt
TIP: We try to keep the discussion going under one answer rather than creating a new answer every time we respond back and forth in this forum, unless someone is offering up a new solution/answer to the question.
07-06-2017
05:11 PM
@Bertrand Goubot A mounted file system is treated like any other local directory NiFi interacts with. NFS mounts are not going to be as performant as local disks, of course. When NiFi ingests files from your NFS mount, the content of those files is placed in NiFi's content repository. Any processors that then work on that ingested content use what is in the content repository. Thanks, Matt
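As a sketch of the point above (the mount path here is hypothetical), a GetFile processor reading from an NFS mount simply points its Input Directory at the mounted path, exactly as it would for any local directory:

```
GetFile
  Input Directory:        /mnt/nfs/incoming    (hypothetical NFS mount point)
  Recurse Subdirectories: true
  Keep Source File:       false
```

Once picked up, the file content lives in the content repository, and downstream processors read it from there rather than from the mount.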
07-06-2017
01:28 PM
@Mark Heydenrych I generated an Apache Jira requesting a change to this behavior: https://issues.apache.org/jira/browse/NIFI-4156 If you found this answer addressed your question, please mark the answer as accepted. Thank you,
Matt
07-06-2017
12:37 PM
2 Kudos
@Mark Heydenrych The default configuration of the SplitText processor is to not emit FlowFiles whose content is just a blank line. This behavior is controlled by the "Remove Trailing Newlines" property (sketched below). The fragment.count attribute is set based on the total number of fragments in the original FlowFile's content, and fragment.index is a one-up number assigned to each FlowFile emitted. So in your case, I suspect that your original FlowFile's content contained 66,443 lines, 13 of which were blank lines that were not emitted. If you change "Remove Trailing Newlines" to "false", your emitted count will match your fragment.count. Thanks, Matt
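For reference, a minimal sketch of the SplitText configuration being discussed (a Line Split Count of 1 is assumed; "Remove Trailing Newlines" is the only property that needs to change):

```
SplitText
  Line Split Count:         1
  Header Line Count:        0
  Remove Trailing Newlines: false
```

With the property set to false, blank lines are emitted as fragments too, so the number of FlowFiles emitted matches fragment.count.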
07-05-2017
10:31 PM
1 Kudo
@Adda Fuentes Try adjusting the cluster connection timeout settings in your nifi.properties file by raising nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout to 30 sec, as in the snippet below. This will give nodes a little longer to respond to requests before being disconnected by the cluster coordinator. Thanks, Matt
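For clarity, the relevant nifi.properties entries with the suggested 30 second values from above:

```
# nifi.properties - cluster node timeouts
nifi.cluster.node.connection.timeout=30 sec
nifi.cluster.node.read.timeout=30 sec
```

A restart of each node is needed for nifi.properties changes to take effect.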
07-05-2017
06:48 PM
@Bharadwaj Bhimavarapu General guidance is that these values should be set to 2 times the number of available cores, and no more than 4 times the number of available cores, on a single instance of NiFi. If you are running a NiFi cluster, these values are enforced per node, so a setting of 16 in a 4 node cluster equates to a total of 64 threads across the cluster. Setting the values too high just results in many more threads in CPU wait and will not help performance at all. Beyond increasing these values, you need to be mindful of how many concurrent tasks you assign each of your processors. Some processors are more CPU intensive than others (meaning they take longer to complete a job, holding the thread much longer). You can look at the "Tasks/Time" stats on a processor to see whether its threads are long or short running. For processors that have long running threads, be extra careful about how many concurrent tasks you assign them. Thanks, Matt
07-05-2017
06:33 PM
1 Kudo
@M R Using a single partition in Kafka is similar to using a single node in NiFi to maintain order. The EnforceOrder processor is a great new addition for enforcing the order of FlowFiles, but it will only enforce order on FlowFiles that reside on the same node in a NiFi cluster. So if you are trying to enforce processing order of FlowFiles across numerous nodes, this processor will not do that; you would need to get all FlowFiles for which you want to enforce order onto the same node before using it. I don't fully understand your entire use case, but a couple of other processors you may want to look at are the Wait and Notify processors. These are also new in the latest HDF 3.0 and NiFi 1.2.0 releases. Thanks, Matt
07-05-2017
05:14 PM
1 Kudo
@Greg Keys The only destination processor component that would affect the emptying of a queue is the processor that the connection is attached to. Which processor type is the connection you are trying to empty attached to? Does that processor show any active threads in the upper right corner? It may take some thread dump analysis to determine why the particular processor is not releasing its threads, if a configuration issue is not obvious. If you restart NiFi, you are likely to get stuck in the same state again, because the downstream processor will probably run before you can get into the NiFi UI to stop it. There are a couple of things you can do to get around this:
1. Try setting the FlowFile expiration on the connection to "1 sec"; this lets the controller handle the deletion of FlowFiles from the queue for you. This of course assumes NiFi will allow you to edit the connection while the downstream component is still running.
2. The more likely successful approach is to shut down NiFi and change the "autoresume state" configuration in the nifi.properties file from true to false (see the sketch below). On restart, all components will come up stopped. This will allow you to right-click on the connection in question and empty it, and it also ensures the downstream processor is in a completely stopped state so its configuration can be changed. Don't forget to change the autoresume state back to true after making your changes, or every time NiFi is restarted everything will come up stopped.
Thanks, Matt
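The "autoresume state" setting referenced in option 2 lives in nifi.properties; as a sketch, the property controlling it is:

```
# nifi.properties - start all components in a stopped state on restart
nifi.flowcontroller.autoResumeState=false
```

Set it back to true once the stuck connection has been emptied and the processor reconfigured, otherwise every restart will leave the whole flow stopped.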
07-05-2017
03:30 PM
1 Kudo
@J. D. Bacolod I like the idea of adding an attribute to FlowFiles routed to a failure relationship that identifies which component routed that FlowFile. I suggest opening an Apache Jira for this enhancement.
For now, you can use NiFi's data provenance capability to get the lineage of a FlowFile that was processed by your dataflows. Lineage shows all routing and processing done for a given FlowFile, though it will not provide details on why the FlowFile was routed to failure. Once you have the timestamp of the failure event, you can look up the details in your nifi-app.log. Thanks, Matt