Member since 07-30-2019
3471 Posts
1642 Kudos Received
1020 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 150 | 06-03-2026 06:06 PM |
|  | 460 | 05-06-2026 09:16 AM |
|  | 827 | 05-04-2026 05:20 AM |
|  | 496 | 05-01-2026 10:15 AM |
|  | 622 | 03-23-2026 05:44 AM |
05-22-2018
04:02 PM
1 Kudo
@Dilip Namdev
The fact that every "archive" sub-directory is empty leads me to believe that archiving is in fact working correctly. NiFi stores FlowFile content in claims within the content repository, and one claim may contain one or many FlowFiles. All it takes is one FlowFile still active in one of your dataflows (queued in some NiFi connection) to hold up an entire content claim. A content claim cannot be moved to archive until all active FlowFiles referencing that claim are complete (meaning they have reached a point of termination in your dataflow).

The following article explains this in more detail: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

Aside from the above, NiFi opens a lot of file handles. Having insufficient file handles can cause issues with the creation of new files, which may affect proper cleanup of both the FlowFile and content repositories. I suggest making sure the user that owns your NiFi process has a high number of open file handles available to it.

Thanks,
Matt

If you found this answer addressed your question, please take a moment to log in and click "accept" below the answer.
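To make the "one active FlowFile holds up the whole claim" point concrete, here is a deliberately simplified model in Python. It only tracks which FlowFiles still reference a claim; the `ContentClaim` class and its methods are hypothetical names for illustration, not NiFi's actual (Java) implementation, and it ignores other details of how NiFi manages claims.

```python
from dataclasses import dataclass, field


@dataclass
class ContentClaim:
    """Simplified model: one claim holding the content of many FlowFiles."""
    claim_id: str
    active_flowfiles: set = field(default_factory=set)
    archivable: bool = False

    def add(self, flowfile_id: str) -> None:
        # A FlowFile whose content lives in this claim now references it.
        self.active_flowfiles.add(flowfile_id)

    def complete(self, flowfile_id: str) -> None:
        # The FlowFile reached a point of termination in the dataflow.
        self.active_flowfiles.discard(flowfile_id)
        # Only when no active FlowFile references the claim can it move to archive.
        if not self.active_flowfiles:
            self.archivable = True


claim = ContentClaim("claim-1")
for ff in ("ff-1", "ff-2", "ff-3"):
    claim.add(ff)

claim.complete("ff-1")
claim.complete("ff-2")
print(claim.archivable)  # False: ff-3 is still queued in some connection
claim.complete("ff-3")
print(claim.archivable)  # True
```

The design point is that eligibility for archive is decided per claim, not per FlowFile, so a single long-queued FlowFile keeps the content of all its neighbors in the claim on disk.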
05-18-2018
12:04 PM
@Sharoon Babu NiFi processors like these execute against FlowFiles on inbound connections to the processor. A FlowFile is only removed from the inbound connection when that code execution results in the FlowFile being transferred to an outbound connection.

There are two types of scenarios here:

1. NiFi is shut down or dies in the middle of a processor's execution. This means the FlowFile was never transferred to an outbound connection. When NiFi is restarted, it reloads FlowFiles into the last connection they were recorded as belonging to, which in this case is the inbound connection. The processor consuming that connection will then be scheduled to run again. Processors do not record any intermediate phase of processing and thus will begin executing against the entire FlowFile again.

2. Some network failure prevents execution from completing. NiFi processors should acknowledge failures in such cases, which results in the FlowFile(s) being moved from the inbound connection to an outbound connection (such as a "failure" relationship). It is the responsibility of the dataflow designer to account for such unexpected failures and route those outbound failure relationships accordingly. Oftentimes failure relationships are simply looped back to the same processor for retry. Wherever the FlowFile is routed (even in a loop), execution will again be against the FlowFile's entire content.

The target systems should handle such scenarios and not accept unconfirmed file transfers.

For example, PutFile writes the file using a "dot" rename strategy. The FlowFile's content is first written as ".<filename>", and then, upon successful completion of writing the data, the file is renamed from ".<filename>" to just "<filename>". Since dot files are in most cases considered hidden and are ignored by the systems picking up files, an incomplete transfer would be ignored by the destination system.
Upon recovery and reattempt (depending on processor configuration), NiFi will repeat this process.

There are some unavoidable scenarios that can at times lead to some data duplication. Given NiFi's design architecture, NiFi has always favored data duplication over data loss in such rare scenarios.

Thank you,
Matt
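The "dot" rename strategy described for PutFile can be sketched in Python as follows. This is an illustration of the general technique under the behavior described above, not NiFi's actual PutFile code; `dot_rename_write` and `visible_files` are hypothetical helpers, and `visible_files` stands in for a destination system that ignores hidden files.

```python
import os
import tempfile


def dot_rename_write(directory: str, filename: str, data: bytes) -> str:
    """Write data as ".<filename>", then rename it to "<filename>"."""
    hidden_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)
    # Step 1: write the full content under the hidden name ("wb" truncates
    # any partial file left behind by an earlier interrupted attempt).
    with open(hidden_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Step 2: only after the write succeeded, expose the file under its real name.
    os.replace(hidden_path, final_path)
    return final_path


def visible_files(directory: str) -> list:
    # A consumer that ignores dot (hidden) files.
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


with tempfile.TemporaryDirectory() as d:
    # Simulate an interrupted transfer: only the hidden partial file exists.
    with open(os.path.join(d, ".data.csv"), "wb") as f:
        f.write(b"partial")
    print(visible_files(d))  # []

    # On retry, the entire content is written again and then renamed.
    dot_rename_write(d, "data.csv", b"a,b\n1,2\n")
    print(visible_files(d))  # ['data.csv']
```

Because the retry rewrites the whole content rather than resuming, the consumer never sees a half-written file, at the cost of possibly writing the same data twice.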
05-07-2018
06:14 PM
@John T I recently built out an HDF environment for a Fortune 1 retail company that handles 1-2k connections per node and moves an average of 1-1.5 TB a day. We used the HandleHTTP processors, as MiNiFi was not an option at project conception. If you are using the HandleHTTPRequest/HandleHTTPResponse processors, note that there is a bug that prevents objects from being released correctly, causing heap utilization to climb linearly. Our workaround was to use the REST API to stop and start the HandleHTTPRequest processor when the heap reached 70%. This bug was corrected in the NiFi 1.6 release but, as of the last time I checked, had not been rolled into an HDF release. So handling that kind of volume would cause the same problem in your situation. If you can use ListenHTTP (or MiNiFi, as Matt suggested), you should be fine. We used external load balancers because we were running three clusters in separate data centers. The plan for the next phase is to start using MiNiFi in the edge environments and point the different systems feeding data into HDF at those MiNiFi HTTP listeners. If you are running a single cluster, as Matt mentioned, that would handle load balancing for you.
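The stop/start workaround can be sketched as a small watchdog script. This is a hypothetical reconstruction, not the script used in that project. The endpoints (`/nifi-api/system-diagnostics`, `/nifi-api/processors/{id}`, `/nifi-api/processors/{id}/run-status`) and JSON field names (`heapUtilization`, `revision`, `state`) are assumptions to verify against the REST API documentation for your NiFi version; older releases may not have the `run-status` endpoint. Authentication and TLS are omitted.

```python
import json
import urllib.request
from typing import Optional

# Threshold from the post: bounce HandleHTTPRequest when heap reaches 70%.
HEAP_THRESHOLD_PCT = 70.0


def parse_heap_utilization(diagnostics: dict) -> float:
    # Assumed payload shape:
    # {"systemDiagnostics": {"aggregateSnapshot": {"heapUtilization": "72.0%"}}}
    value = diagnostics["systemDiagnostics"]["aggregateSnapshot"]["heapUtilization"]
    return float(value.strip().rstrip("%"))


def should_bounce(diagnostics: dict, threshold: float = HEAP_THRESHOLD_PCT) -> bool:
    return parse_heap_utilization(diagnostics) >= threshold


def _call(method: str, url: str, body: Optional[dict] = None) -> dict:
    # Minimal JSON-over-HTTP helper (no auth, no TLS configuration).
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bounce_processor(base_url: str, processor_id: str) -> None:
    # Each update must carry the processor's current revision.
    proc = _call("GET", f"{base_url}/processors/{processor_id}")
    stopped = _call("PUT", f"{base_url}/processors/{processor_id}/run-status",
                    {"revision": proc["revision"], "state": "STOPPED"})
    # A production script may need to wait for active threads to finish
    # before restarting; that wait is omitted here.
    _call("PUT", f"{base_url}/processors/{processor_id}/run-status",
          {"revision": stopped["revision"], "state": "RUNNING"})


def check_and_bounce(base_url: str, processor_id: str) -> bool:
    diagnostics = _call("GET", f"{base_url}/system-diagnostics")
    if should_bounce(diagnostics):
        bounce_processor(base_url, processor_id)
        return True
    return False
```

It could be run periodically (for example from cron) as `check_and_bounce("http://localhost:8080/nifi-api", "<processor-id>")`, where the host, port, and processor ID are placeholders for your own environment.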
02-05-2019
06:20 PM
@mattclarke
I'm having the same problem with the "ScrollElasticsearchHttp" processor. The processor state shows one or more cluster nodes, depending on the situation, even though I've configured it to run on "Primary Node only". FlowFiles are duplicated on each added node. How can I solve this problem?
04-27-2018
06:11 AM
@Matt Clarke Thanks, Matt, for the information and for helping out. It worked.
04-17-2018
01:34 PM
@Matt Clarke, this seems like quite a refined approach. Happy to see your response.
11-21-2018
03:59 PM
Thanks for your answer. I wanted to have only "one" queue where all FlowFiles would be waiting. I know now that it was a bad idea, so I reduced the size of the queue and now use backpressure. That corrected the priority problem. Thanks again!
04-11-2018
05:46 AM
Thanks for the solution, but since I am not familiar with the REST API, Matt's solution looks easier to me. I will surely try yours too.
03-14-2018
05:14 PM
OK, changed it to 2 seconds. I will keep an eye on it for any further data duplication. Going to accept this as the answer for now. Thank you.