Member since: 07-30-2019
Posts: 3427
Kudos Received: 1632
Solutions: 1011
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 84 | 01-27-2026 12:46 PM |
|  | 486 | 01-13-2026 11:14 AM |
|  | 1022 | 01-09-2026 06:58 AM |
|  | 915 | 12-17-2025 05:55 AM |
|  | 976 | 12-15-2025 01:29 PM |
07-07-2017
01:16 PM
@Mark Heydenrych You may be able to use the ReplaceText processor to remove those blank lines from your input FlowFile's content before the SplitText processor. I did a little test that worked for me using a ReplaceText configuration like the one sketched below: it evaluates your FlowFile line by line and replaces the line return (\n) with nothing on any line that starts with a line return. That effectively removes the blank line. After that, my SplitText reported the correct fragment.count when I split the file. Thanks, Matt
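The screenshot of that test configuration did not carry over, so as a minimal sketch of the ReplaceText settings described above (the ^\n pattern is a reconstruction of "any line that starts with a line return"):

```
ReplaceText
  Replacement Strategy: Regex Replace
  Evaluation Mode:      Line-by-Line
  Search Value:         ^\n
  Replacement Value:    (empty string)
```

A line that begins with a line return matches the pattern and is replaced with nothing, so the blank line is gone before SplitText runs.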
07-07-2017
12:41 PM
@Bertrand Goubot MiNiFi will work the same way with regard to accessing locally mounted disks. NiFi and MiNiFi have no issues working with large files as long as there is sufficient space in the content repository to store and do any processing needed on those large files. Thanks, Matt
TIP: We try to keep the discussion going under one answer rather than creating a new answer every time we respond back and forth in this forum, unless someone is offering up a new solution/answer to the question.
07-06-2017
05:11 PM
@Bertrand Goubot A mounted file system is treated like any other local directory NiFi interacts with. NFS mounts are not going to be as performant as local disks, of course. When NiFi ingests files from your NFS mount, the content of those files is placed in NiFi's content repository. Any processors that then work on that ingested content use what is in the content repository. Thanks, Matt
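As a sketch of the point above (the mount path here is hypothetical), a GetFile processor reading from an NFS mount simply points its Input Directory at the mounted path, exactly as it would for any local directory:

```
GetFile
  Input Directory:        /mnt/nfs/incoming    (hypothetical NFS mount point)
  Recurse Subdirectories: true
  Keep Source File:       false
```

Once picked up, the file content lives in the content repository, and downstream processors read it from there rather than from the mount.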
07-06-2017
01:28 PM
@Mark Heydenrych I generated an Apache Jira requesting a change to this behavior: https://issues.apache.org/jira/browse/NIFI-4156 If you found this answer addressed your question, please mark the answer as accepted. Thank you,
Matt
07-06-2017
12:37 PM
2 Kudos
@Mark Heydenrych The default configuration of the SplitText processor is to not emit FlowFiles whose content is just a blank line. This behavior is controlled by the "Remove Trailing Newlines" property (sketched below). The fragment.count attribute is set based on the total number of fragments in the original FlowFile's content, and fragment.index is a one-up number assigned to each FlowFile emitted. So in your case, I suspect that your original FlowFile's content contained 66,443 lines, 13 of which were blank lines that were not emitted. If you change "Remove Trailing Newlines" to "false", your emitted count will match your fragment.count. Thanks, Matt
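For reference, a minimal sketch of the SplitText configuration being discussed (a Line Split Count of 1 is assumed; "Remove Trailing Newlines" is the only property that needs to change):

```
SplitText
  Line Split Count:         1
  Header Line Count:        0
  Remove Trailing Newlines: false
```

With the property set to false, blank lines are emitted as fragments too, so the number of FlowFiles emitted matches fragment.count.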
07-05-2017
10:31 PM
1 Kudo
@Adda Fuentes Try adjusting the cluster connection timeout settings in your nifi.properties file by raising nifi.cluster.node.connection.timeout and nifi.cluster.node.read.timeout to 30 sec, as in the snippet below. This will give nodes a little longer to respond to requests before being disconnected by the cluster coordinator. Thanks, Matt
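For clarity, the relevant nifi.properties entries with the suggested 30 second values from above:

```
# nifi.properties - cluster node timeouts
nifi.cluster.node.connection.timeout=30 sec
nifi.cluster.node.read.timeout=30 sec
```

A restart of each node is needed for nifi.properties changes to take effect.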
07-05-2017
06:48 PM
@Bharadwaj Bhimavarapu General guidance is that these values should be set to 2 times the number of available cores, and no more than 4 times the number of available cores, on a single instance of NiFi. If you are running a NiFi cluster, these values are enforced per node, so a setting of 16 in a 4 node cluster equates to a total of 64 threads across the cluster. Setting the values too high just results in many more threads in CPU wait and will not help performance at all. Beyond increasing these values, you need to be mindful of how many concurrent tasks you assign each of your processors. Some processors are more CPU intensive than others (meaning they take longer to complete a job, holding the thread much longer). You can look at the "Tasks/Time" stats on a processor to see whether its threads are long or short running. For processors that have long running threads, be extra careful about how many concurrent tasks you assign them. Thanks, Matt
07-05-2017
06:33 PM
1 Kudo
@M R Using a single partition in Kafka is similar to using a single node in NiFi to maintain order. The EnforceOrder processor is a great new addition for enforcing the order of FlowFiles, but it will only enforce order on FlowFiles that reside on the same node in a NiFi cluster. So if you are trying to enforce processing order of FlowFiles across numerous nodes, this processor will not do that; you would need to get all FlowFiles for which you want to enforce order onto the same node before using it. I don't fully understand your entire use case, but a couple of other processors you may want to look at are the Wait and Notify processors. These are also new in the latest HDF 3.0 and NiFi 1.2.0 releases. Thanks, Matt
07-05-2017
05:14 PM
1 Kudo
@Greg Keys The only destination processor component that would affect the emptying of a queue is the processor that the connection is attached to. Which processor type is the connection you are trying to empty attached to? Does that processor show any active threads in the upper right corner? It may take some thread dump analysis to determine why the particular processor is not releasing its threads, if a configuration issue is not obvious. If you restart NiFi, you are likely to get stuck in the same state again, because the downstream processor will probably run before you can get into the NiFi UI to stop it. There are a couple of things you can do to get around this:
1. Try setting the FlowFile expiration on the connection to "1 sec"; this lets the controller handle the deletion of FlowFiles from the queue for you. This of course assumes NiFi will allow you to edit the connection while the downstream component is still running.
2. The more likely successful approach is to shut down NiFi and change the "autoresume state" configuration in the nifi.properties file from true to false (see the sketch below). On restart, all components will come up stopped. This will allow you to right-click on the connection in question and empty it, and it also ensures the downstream processor is in a completely stopped state so its configuration can be changed. Don't forget to change the autoresume state back to true after making your changes, or every time NiFi is restarted everything will come up stopped.
Thanks, Matt
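The "autoresume state" setting referenced in option 2 lives in nifi.properties; as a sketch, the property controlling it is:

```
# nifi.properties - start all components in a stopped state on restart
nifi.flowcontroller.autoResumeState=false
```

Set it back to true once the stuck connection has been emptied and the processor reconfigured, otherwise every restart will leave the whole flow stopped.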
07-05-2017
03:30 PM
1 Kudo
@J. D. Bacolod I like the idea of adding an attribute to FlowFiles routed to a failure relationship that identifies which component routed that FlowFile. I suggest opening an Apache Jira for this enhancement.
For now, you can use NiFi's data provenance capability to get the lineage of a FlowFile that was processed by your dataflows. Lineage shows all routing and processing done for a given FlowFile, though it will not provide details on why the FlowFile was routed to failure. Once you have the timestamp of the failure event, you can look up the details in your nifi-app.log. Thanks, Matt