Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 141 | 06-03-2026 06:06 PM |
| | 458 | 05-06-2026 09:16 AM |
| | 821 | 05-04-2026 05:20 AM |
| | 493 | 05-01-2026 10:15 AM |
| | 619 | 03-23-2026 05:44 AM |
05-11-2017
03:15 PM
@Gaurav Jain When you find an answer in Hortonworks Community Connections (HCC) that addresses your question, please accept that answer so that other HCC users know what worked for you. Thank you kindly, Matt
05-11-2017
02:16 PM
@Gaurav Jain Here is an article I wrote a while ago that explains the differences between using the GetSFTP processor and using the ListSFTP and FetchSFTP processors:
https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html Thanks, Matt
05-11-2017
11:49 AM
1 Kudo
@Gaurav Jain This is the exact use case for which GetSFTP was deprecated in favor of the ListSFTP and FetchSFTP processors. The ListSFTP processor runs on the primary node only. It produces one 0-byte FlowFile for every file in the listing. All of these 0-byte FlowFiles are then sent to an RPG for distribution across the cluster. The distributed FlowFiles are then fed to a FetchSFTP processor, which retrieves the content from the SFTP server and inserts it into the FlowFile at that time. This model eliminates the overhead on the primary node, since it does not need to write the content, and it reduces network overhead between nodes, since there is no content being sent in FlowFiles via the RPG. The only issue you are going to run into is: https://issues.apache.org/jira/browse/NIFI-1202 This issue is addressed in Apache NiFi 1.2.0, which was just released this week. It will also be addressed in HDF 3.0, which will be released soon. You can work around the issue in older versions by setting a small object backpressure threshold on the connection feeding your RPG. Since this backpressure is a soft limit, you need to put a processor between your ListSFTP processor and the RPG that only processes FlowFiles one at a time. I recommend RouteOnAttribute (no configuration is needed on the processor; simply route the one existing "unmatched" relationship to the RPG and set backpressure on that connection). Thanks, Matt
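To make the list-then-fetch model above more concrete, here is a rough Python simulation. It is only a sketch and not NiFi code: the function names, the node names, the in-memory `remote` dictionary standing in for the SFTP server, and the round-robin distribution are all assumptions made for illustration (a real RPG chooses target nodes on its own terms).

```python
from itertools import cycle


def list_remote(remote_files):
    """Primary node only: emit one zero-byte record per remote file (no content yet)."""
    return [{"filename": name, "content": b""} for name in remote_files]


def distribute(flowfiles, nodes):
    """Stand-in for the RPG: spread the zero-byte records across nodes round-robin."""
    assignment = {node: [] for node in nodes}
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return assignment


def fetch(flowfile, remote_files):
    """Each node pulls content only for the records it was handed."""
    return {**flowfile, "content": remote_files[flowfile["filename"]]}


# Hypothetical remote directory: filename -> content
remote = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}

listed = list_remote(remote)                      # cheap: metadata only
per_node = distribute(listed, ["node1", "node2"])  # only tiny records cross the network
fetched = {
    node: [fetch(f, remote) for f in files]
    for node, files in per_node.items()
}
```

The point of the sketch is that content is read exactly once, by the node that ends up owning the record, so neither the listing step nor the distribution step ever moves file content.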
05-09-2017
04:05 PM
I literally hit the "tab" key on my keyboard.
05-09-2017
03:53 PM
1 Kudo
@Prabir Guha You can use the ReplaceText processor to replace tabs with commas in a text/plain input file. Let's assume my input file's content has the following value:
I could then configure my ReplaceText processor to do the following: The Search Value is set to a tab. The Replacement Value is set to a comma. The resulting content is: Thanks, Matt
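For anyone who wants to check the expected result outside NiFi, the same tab-to-comma substitution can be sketched in Python. The sample content below is hypothetical, and the function name is my own; it only mirrors the Search Value and Replacement Value settings described above.

```python
import re


def tabs_to_commas(text: str) -> str:
    """Replace every tab character with a comma."""
    return re.sub(r"\t", ",", text)


# Hypothetical tab-delimited input
sample = "id\tname\tcity\n1\tAnn\tOslo\n"
converted = tabs_to_commas(sample)
print(converted)
```

Note that this is a plain character swap, so a field that already contains a comma would not be quoted or escaped.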
05-09-2017
12:45 PM
@Sertac Kaya Glad you were able to get the performance improvement you were looking for by allowing your NiFi instance access to additional system threads. If this answer helped you get to your solution, please mark it as accepted. Thank you, Matt
05-08-2017
03:08 PM
1 Kudo
@Gaurav Jain Each node in a cluster is responsible for working on its own FlowFiles. Each node is unaware of what FlowFiles other nodes are working on. If a NiFi processor component is working on a FlowFile at the time the node goes down, the transformation work will start over once the node is running again. A node disconnecting will not cause processing of FlowFiles to stop on the disconnected node. Processors that transform FlowFile content produce a new FlowFile once the transformation is complete. So if a failure occurs mid-processing, the original remains on the incoming queue to the processor and the intermediate work is lost. This is how NiFi ensures no data loss occurs in unexpected failures. That being said, data plane High Availability (HA) is one of NiFi's roadmap items. Thanks, Matt
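The "original stays on the queue until the transformation completes" behavior described above can be illustrated with a small Python sketch. This is not how NiFi is implemented internally; `process_queue`, `flaky_upper`, and the simulated failure are hypothetical names used only to show the idea.

```python
from collections import deque


def process_queue(queue, transform):
    """Transform the head of the queue; remove the original only after success."""
    output = []
    while queue:
        original = queue[0]  # peek, do not remove yet
        try:
            result = transform(original)
        except Exception:
            break  # simulated failure: original stays queued, partial work is lost
        queue.popleft()  # commit: only now does the original leave the queue
        output.append(result)
    return output


attempts = {"count": 0}


def flaky_upper(data):
    """Hypothetical transform that fails on its very first call."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("node went down mid-transform")
    return data.upper()


incoming = deque(["abc", "def"])
first_run = process_queue(incoming, flaky_upper)   # fails on the first item
second_run = process_queue(incoming, flaky_upper)  # node is back: work starts over
```

After the first run nothing has been removed from `incoming`, so the second run reprocesses "abc" from the beginning rather than resuming partway through.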
05-08-2017
12:22 PM
2 Kudos
@Gaurav Jain The URL provided when adding the Remote Process Group (RPG) to your canvas only needs to be reachable when the RPG is initially added. Once a successful connection is established, the target instance will return a list of currently connected cluster nodes. The source instance with the RPG will record those hosts in peer files. From that point forward the RPG constantly updates the list of available nodes and will not only load-balance to those nodes but will also use any one of them to get an updated status. Let's assume your source instance of NiFi has trouble getting a status update from any of the nodes: it will still attempt to load-balance with failover delivery of data to the last known set of nodes until communication is successful in getting an updated list. In addition, NiFi will also allow you to specify multiple URLs in the RPG when you create it. Simply provide a comma-separated list of URLs for the nodes in the same target cluster. This does not change how the RPG works. It will still constantly retrieve a new listing of available nodes. This allows the target cluster to scale up or down without affecting your Site-To-Site (S2S) functionality. Thanks, Matt
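The peer refresh and fallback behavior described above can be sketched roughly as follows. This is an illustration only, not the S2S client code: `refresh_peers`, `fake_fetch`, and the node names are hypothetical, and the sketch ignores load-balancing weights entirely.

```python
def refresh_peers(known_peers, fetch_peer_list):
    """Ask each known peer for the current node list; keep the old list if all fail."""
    for peer in known_peers:
        try:
            return fetch_peer_list(peer)
        except ConnectionError:
            continue  # try the next known peer
    return known_peers  # nobody answered: keep using the last known set


def fake_fetch(peer):
    """Hypothetical target cluster in which only node2 is currently reachable."""
    if peer != "node2":
        raise ConnectionError(f"{peer} unreachable")
    return ["node2", "node3", "node4"]


updated = refresh_peers(["node1", "node2"], fake_fetch)
unchanged = refresh_peers(["node1"], fake_fetch)
```

In the first call node1 fails but node2 answers with the new membership; in the second call no known peer answers, so the last known list is kept.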
05-04-2017
04:29 PM
2 Kudos
@Prabir Guha You would certainly use the UpdateAttribute processor to do this, with a NiFi Expression Language statement as follows: ${filename:substringAfterLast('/')} Thanks, Matt
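If it helps to see what that expression evaluates to, here is a small Python equivalent. It is only an approximation of the Expression Language function, written on the assumption that the whole value is returned when the delimiter is absent; the sample paths are hypothetical.

```python
def substring_after_last(value: str, delimiter: str) -> str:
    """Return the part of value after the last occurrence of delimiter.

    If delimiter does not occur, the whole value is returned.
    """
    return value.rsplit(delimiter, 1)[-1]


# Hypothetical filename attribute values
with_path = substring_after_last("/data/incoming/report.csv", "/")
without_path = substring_after_last("report.csv", "/")
print(with_path, without_path)
```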
05-03-2017
12:44 PM
1 Kudo
@Sertac Kaya FlowFiles are transferred in batches between process groups, but that transfer amounts to updating FlowFile records. This transfer should take fractions of a millisecond to complete, so many threads should execute per second. This raises the question of whether your flow is thread starved, concurrent tasks have been over-allocated across your processors, your NiFi max timer driven thread count is too low, or your disk I/O is very high. I would start by looking at your "Max Timer Driven Thread Count" setting. The default is only 10. By default every component you add to the NiFi canvas uses timer driven threads. The above count restricts how many system threads can be allocated to components at any one time. I set up a simple 4-CPU VM running a default configuration. The number of FlowFiles passed through the connection between process group 1 and process group 2 ranged from 7,084/second to 12,200/second. Thanks, Matt