About MattWho

MattWho · ‎05-11-2017

@Gaurav Jain NiFi does not redistribute FlowFiles between nodes behind the scenes at this time. Any redistribution of FlowFiles between nodes in a cluster has to be done programmatically through your dataflow design, using components (processors such as PostHTTP to ListenHTTP, or an RPG) to push FlowFiles to other nodes. Thanks, Matt

MattWho · ‎05-11-2017

@Gaurav Jain When you find an answer in Hortonworks Community Connections (HCC) that addresses your question, please accept that answer so that other HCC users know what worked for you. Thank you kindly, Matt

MattWho · ‎05-11-2017

@Gaurav Jain Here is an article I wrote a while ago that explains the differences between using the GetSFTP processor and using the ListSFTP and FetchSFTP processors: https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html Thanks, Matt

MattWho · ‎05-11-2017

@Gaurav Jain This is the exact use case for which GetSFTP was deprecated in favor of the ListSFTP and FetchSFTP processors. The ListSFTP processor would run on the primary node only. It produces one 0 byte FlowFile for every file in the listing. All of these 0 byte FlowFiles are then sent to an RPG for distribution across the cluster. The distributed FlowFiles are then fed to a FetchSFTP processor, which retrieves the content from the SFTP server and inserts it into the FlowFile at that time. This model eliminates the overhead on the primary node, since it does not need to write the content, and it reduces network overhead between nodes, since there is no content being sent in the FlowFiles via the RPG. The only issue you are going to run into is: https://issues.apache.org/jira/browse/NIFI-1202 This issue is addressed in Apache NiFi 1.2.0, which was just released this week. It will also be addressed in HDF 3.0, which will be released soon. You can work around the issue in older versions by setting a small object back pressure threshold on the connection feeding your RPG. Since this back pressure is a soft limit, you need to put a processor between your ListSFTP processor and the RPG that only processes FlowFiles one at a time. I recommend RouteOnAttribute (no configuration is needed on the processor; simply route the one existing "unmatched" relationship to the RPG and set back pressure on that connection). Thanks, Matt
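
The list/distribute/fetch pattern described above can be sketched as a toy model in Python. This is not NiFi code: the `FlowFile` class, the function names, the node names, and the round-robin distribution are all assumptions made for illustration (a real RPG balances load in its own way). The point it shows is that only attribute-bearing, 0 byte FlowFiles cross between nodes, and content is pulled on the node that ends up owning the FlowFile.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class FlowFile:
    # Attributes only; content stays empty until the fetch step.
    attributes: dict
    content: bytes = b""


def list_remote(remote_files):
    """Listing step (primary node only): one 0 byte FlowFile per remote file."""
    return [FlowFile({"filename": name}) for name in sorted(remote_files)]


def distribute(flowfiles, nodes):
    """Stand-in for the RPG: hand the 0 byte FlowFiles out round-robin."""
    assignment = {node: [] for node in nodes}
    for node, flowfile in zip(cycle(nodes), flowfiles):
        assignment[node].append(flowfile)
    return assignment


def fetch(flowfile, remote_files):
    """Fetch step (every node): pull the content for one listed file."""
    flowfile.content = remote_files[flowfile.attributes["filename"]]
    return flowfile


# Hypothetical remote directory: filename -> content.
remote = {"a.csv": b"1,2", "b.csv": b"3,4", "c.csv": b"5,6"}

listed = list_remote(remote)                      # runs on the primary node
per_node = distribute(listed, ["node1", "node2"])  # no content moves here
fetched = {
    node: [fetch(ff, remote) for ff in flowfiles]
    for node, flowfiles in per_node.items()
}
```

In this sketch the primary node never touches file content; each node fetches only the files whose listing entries it received.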

MattWho · ‎05-09-2017

I literally hit the "tab" key on my keyboard.

MattWho · ‎05-09-2017

@Prabir Guha You can use the ReplaceText processor to replace tabs with commas in a text/plain input file. Let's assume my input file's content has the following value: I could then configure my ReplaceText processor to do the following: The Search Value is set to a tab. The Replacement Value is set to a comma. The resulting content is: Thanks, Matt
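
For readers who want to check the expected result outside NiFi, the same search-and-replace can be sketched in Python. This only mirrors the transformation (search value: a tab, replacement value: a comma); the function name and the sample data are made up for illustration and are not taken from the original post.

```python
import re


def tabs_to_commas(text: str) -> str:
    # Replace every tab character with a comma, leaving line breaks alone.
    return re.sub(r"\t", ",", text)


# Hypothetical tab-separated sample input.
sample = "id\tname\tcity\n1\tAda\tLondon\n"
converted = tabs_to_commas(sample)
print(converted)
```

Note that a plain replacement like this does not quote fields that already contain commas, so it is only safe when the input values are comma-free.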

MattWho · ‎05-09-2017

@Sertac Kaya Glad you were able to get the performance improvement you were looking for by allowing your NiFi instance access to additional system threads. If this answer helped you get to your solution, please mark it as accepted. Thank you, Matt

MattWho · ‎05-08-2017

@Gaurav Jain Each node in a cluster is responsible for working on its own FlowFiles. Each node is unaware of what FlowFiles other nodes are working on. If a NiFi processor component is working on a FlowFile at the time the node goes down, the transformation work will start over once the node is running again. A node disconnecting will not cause processing of FlowFiles to stop on the disconnected node. Processors that transform FlowFile content produce a new FlowFile once the transformation is complete. So if a failure occurs mid-processing, the original remains on the incoming queue to the processor and the intermediate work is lost. This is how NiFi ensures no data loss occurs in unexpected failures. That being said, data plane High Availability (HA) is one of NiFi's roadmap items. Thanks, Matt
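
The "original stays queued until the new FlowFile is complete" behaviour described above can be illustrated with a small Python sketch. This is a toy model, not NiFi's session API: `process_one`, the two queues, and the simulated failure are all hypothetical names introduced here.

```python
from collections import deque


def process_one(incoming: deque, outgoing: deque, transform) -> bool:
    """Run `transform` on the head of `incoming`; commit only on success."""
    if not incoming:
        return False
    original = incoming[0]  # peek; the original is not removed yet
    try:
        result = transform(original)
    except Exception:
        # Failure mid-processing: the intermediate work is lost, but the
        # original is still at the head of the incoming queue.
        return False
    incoming.popleft()       # commit: drop the original...
    outgoing.append(result)  # ...and emit the newly produced item
    return True


attempts = {"count": 0}


def flaky_upper(data: bytes) -> bytes:
    # Fails on the first call to simulate a node going down mid-transform.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated node failure")
    return data.upper()


inq, outq = deque([b"hello"]), deque()
first = process_one(inq, outq, flaky_upper)   # fails; original stays queued
second = process_one(inq, outq, flaky_upper)  # work starts over and succeeds
```

The first attempt leaves both queues unchanged, so the retry starts from the untouched original.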

MattWho · ‎05-08-2017

@Gaurav Jain The URL provided when adding the Remote Process Group (RPG) to your canvas only needs to be reachable when the RPG is initially added. Once a successful connection is established, the target instance will return a list of currently connected cluster nodes. The source instance with the RPG will record those hosts in peer files. From that point forward, the RPG constantly updates the list of available nodes and will not only load-balance to those nodes but will also use any one of them to get an updated status. Let's assume your source instance of NiFi has trouble getting a status update from any of the nodes: it will still attempt to load-balance, with failover, the delivery of data to the last known set of nodes until communication is successful in getting an updated list. In addition, NiFi will also allow you to specify multiple URLs in the RPG when you create it. Simply provide a comma-separated list of URLs for the nodes in the same target cluster. This does not change how the RPG works. It will still constantly retrieve a new listing of available nodes. This allows the target cluster to scale up or down without affecting your Site-To-Site (S2S) functionality. Thanks, Matt
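
The peer-tracking behaviour described above can be sketched roughly as follows. This is a simplified model, not the actual Site-To-Site client: `PeerTracker`, `query_peers`, the fake cluster, and the URLs are hypothetical, and falling back to the configured URLs after the known peers is an assumption of this sketch.

```python
class PeerTracker:
    def __init__(self, configured_urls, query_peers):
        self.configured_urls = list(configured_urls)
        # query_peers: callable taking a URL and returning the current list
        # of node URLs, or raising ConnectionError if the node is unreachable.
        self.query_peers = query_peers
        self.peers = []  # last known set of nodes

    def refresh(self) -> bool:
        # Ask any known peer first, then fall back to the configured URLs.
        for url in self.peers + self.configured_urls:
            try:
                latest = self.query_peers(url)
            except ConnectionError:
                continue
            if latest:
                self.peers = list(latest)
                return True
        return False  # nothing reachable: keep the last known peers


# Hypothetical target cluster used to exercise the tracker.
cluster = {"up": True, "nodes": ["http://n1:8080/nifi", "http://n2:8080/nifi"]}


def fake_query(url):
    if not cluster["up"] or url not in cluster["nodes"]:
        raise ConnectionError(url)
    return list(cluster["nodes"])


tracker = PeerTracker(["http://n1:8080/nifi"], fake_query)
tracker.refresh()                                # learns n1 and n2
cluster["nodes"].append("http://n3:8080/nifi")   # target cluster scales up
tracker.refresh()                                # now also knows n3
cluster["up"] = False                            # status updates start failing
still_ok = tracker.refresh()                     # peers are left unchanged
```

After the final call the tracker still holds all three nodes, which is the "last known set" that delivery would keep using until a refresh succeeds again.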

MattWho · ‎05-04-2017

@Prabir Guha You would certainly use the UpdateAttribute processor to do this, with a NiFi Expression Language statement as follows: ${filename:substringAfterLast('/')} Thanks, Matt
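
To see what that expression evaluates to, here is a rough Python equivalent. It is only an illustration of the string logic, and the sample paths are made up. Returning the whole subject when the delimiter is absent matches my understanding of the Expression Language guide, so treat that detail as an assumption.

```python
def substring_after_last(subject: str, delimiter: str) -> str:
    # rpartition splits on the LAST occurrence of the delimiter; if the
    # delimiter is absent, the third element is the whole subject.
    return subject.rpartition(delimiter)[2]


# Hypothetical filename attribute values.
print(substring_after_last("/data/incoming/report.csv", "/"))
print(substring_after_last("report.csv", "/"))
```

Both calls print `report.csv`: the first strips the leading path, the second has no `/` and comes back unchanged.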

Last Visited	‎07-08-2026 05:13 PM

Member Since	‎07-30-2019 10:41 AM
Posts	3,472
Kudos received	1638
