About MattWho

raja11 · ‎06-05-2018

Yes, I have to select "OldestFlowFileFirstPrioritizer" to make it work. Thank you @Matt Clarke and @Shu

ganesh_sa20 · ‎06-08-2018

Thanks @Matt Burgess

MattWho · ‎05-22-2018

@Dilip Namdev The fact that every "archive" sub-directory is empty leads me to believe that archiving is in fact working correctly. NiFi stores FlowFile content in claims within the content repository. One claim may contain one to many FlowFiles. All it takes is one FlowFile still active in one of your dataflows (queued in some NiFi connection) to hold up an entire content claim. A content claim cannot be moved to archive until every active FlowFile referencing that claim is complete (meaning it has reached a point of termination in your dataflow).

The following article explains this in more detail: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

Aside from the above, NiFi opens a lot of file handles. Having insufficient file handles can cause issues with the creation of new files. This may affect proper cleanup of both the FlowFile and content repositories. I suggest making sure the user that owns your NiFi process has a high number of open file handles available to it.

Thanks,
Matt

If you found this answer addressed your question, please take a moment to log in and click "accept" below the answer.
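The claim behavior described above can be pictured with a small reference-tracking model. This is only an illustrative sketch, not NiFi's actual implementation: the `ContentClaim` class and its method names are hypothetical, and the only rule it encodes is the one stated in the answer, namely that a claim is eligible for archive once no active FlowFile references it.

```python
from dataclasses import dataclass, field


@dataclass
class ContentClaim:
    """Hypothetical stand-in for one claim in the content repository."""
    claim_id: str
    # IDs of FlowFiles that still reference this claim (still active in a dataflow).
    active_flowfiles: set = field(default_factory=set)

    def add_flowfile(self, flowfile_id: str) -> None:
        self.active_flowfiles.add(flowfile_id)

    def complete_flowfile(self, flowfile_id: str) -> None:
        # Called when a FlowFile reaches a point of termination.
        self.active_flowfiles.discard(flowfile_id)

    def can_archive(self) -> bool:
        # Eligible only when no active FlowFile references the claim.
        return not self.active_flowfiles


claim = ContentClaim("claim-1")
for ff in ("ff-a", "ff-b", "ff-c"):
    claim.add_flowfile(ff)

claim.complete_flowfile("ff-a")
claim.complete_flowfile("ff-b")
print(claim.can_archive())  # False: ff-c is still queued somewhere

claim.complete_flowfile("ff-c")
print(claim.can_archive())  # True
```

In this model a single FlowFile left in a queue keeps the whole claim out of archive, which is the situation the answer describes.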

MattWho · ‎05-18-2018

@Sharoon Babu NiFi processors like these execute against FlowFiles on inbound connections to the processor. The FlowFile is only removed from the inbound connection when that code execution results in the FlowFile being transferred to an outbound connection.

There are two types of scenarios here:

1. NiFi is shut down or dies in the middle of a processor's execution. This means the FlowFile was never transferred to an outbound connection. When NiFi is restarted, it will reload FlowFiles into the last connection they were recorded as belonging to. In this case that would be an inbound connection. The consuming processor of that connection will then be scheduled to execute again. Processors do not record an intermediate phase of processing and thus will begin executing against the entire FlowFile again.

2. Some network failure results in execution not being able to complete. NiFi processors should acknowledge failures in such a case, which would result in the FlowFile(s) being moved from the inbound connection to an outbound connection (like a "failure" relationship). It is the responsibility of the dataflow designer to account for such unexpected failures and route those outbound failure relationships accordingly. Oftentimes failure relationships are simply looped back on the same processor for retry. Wherever this FlowFile is routed (even if in a loop), execution will again be against the entire FlowFile's content.

The target systems should handle such scenarios and not accept unconfirmed file transfers.

For example: PutFile will write the file using a "dot" rename strategy. The FlowFile's content is originally written as ".<filename>" and then, upon successful completion of writing the data, the file is renamed from ".<filename>" to just "<filename>". Since dot files are in most cases treated as hidden files and ignored by consuming systems, an incomplete transfer would be ignored by the destination system. Upon recovery and re-attempt (depending on processor configuration), NiFi will repeat this process.

There are some unavoidable scenarios that at times can lead to some data duplication. Considering NiFi's design architecture, NiFi has always favored data duplication over data loss in such rare scenarios.

Thank you,
Matt

If you found this answer has addressed your question, please take a moment to log in and click the "accept" link on the answer.
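The "dot" rename strategy mentioned for PutFile can be sketched in a few lines. This is not PutFile's actual code, only a minimal illustration of the same pattern under the assumption of a local filesystem; the function name `write_with_dot_rename` is hypothetical.

```python
import os
import tempfile


def write_with_dot_rename(directory: str, filename: str, data: bytes) -> str:
    """Write data to a hidden ".<filename>" first, then rename to "<filename>"."""
    hidden_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)

    # Opening with "wb" truncates any leftover dot file from a failed attempt.
    with open(hidden_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # push the bytes to disk before renaming

    # The rename happens only after the write has completed successfully.
    os.replace(hidden_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as target_dir:
    path = write_with_dot_rename(target_dir, "example.txt", b"flowfile content")
    print(os.path.basename(path))          # example.txt
    print(sorted(os.listdir(target_dir)))  # ['example.txt']
```

If the process dies before the rename, only the dot file is left behind, and a retry simply rewrites it from the start, which matches the "execute against the entire FlowFile again" behavior described above.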

sthompson · ‎05-07-2018

@John T I have recently built out an HDF environment for a Fortune 1 retail company to handle 1-2k connections per node and move an average of 1-1.5TB a day. We utilized the HandleHTTP processors as MiNiFi was not an option at project conception. If you are using the HandleHTTPRequest/Response processors, note that there is a bug which causes objects to not be released correctly, causing heap utilization to climb in a linear fashion. Our workaround was to use the API to stop/start the HandleHTTPRequest processor when the heap reached 70%. This bug was corrected in the 1.6 release of NiFi but had not been rolled up into an HDF release as of when I last checked. So handling that kind of volume would lead to the same situation for you. If you can use ListenHTTP (or MiNiFi as Matt suggested), you should be fine. We were utilizing external load balancers as we were running three clusters in separate data centers. The plan in the next phase is to start utilizing MiNiFi in the edge environments and point the different systems feeding data into HDF at those MiNiFi HTTP listeners. If you are running a single cluster, as Matt mentioned, that would load balance for you.
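The stop/start workaround can be outlined as a simple threshold check. The post does not say which API calls were used, so the sketch below leaves them out: `get_heap_utilization`, `stop_processor`, and `start_processor` are hypothetical callables that a real script would have to wire to NiFi itself, and the assumption that heap utilization arrives as a string such as "73.0%" is mine.

```python
from typing import Callable


def parse_percent(text: str) -> float:
    """Turn a string such as "72.5%" into the float 72.5."""
    return float(text.strip().rstrip("%"))


def restart_if_heap_high(
    get_heap_utilization: Callable[[], str],
    stop_processor: Callable[[], None],
    start_processor: Callable[[], None],
    threshold_percent: float = 70.0,
) -> bool:
    """Stop and start the processor when heap use reaches the threshold."""
    if parse_percent(get_heap_utilization()) < threshold_percent:
        return False
    stop_processor()
    start_processor()
    return True


# Stand-in callables so the sketch runs without a NiFi instance.
actions = []
restarted = restart_if_heap_high(
    get_heap_utilization=lambda: "73.0%",
    stop_processor=lambda: actions.append("stop"),
    start_processor=lambda: actions.append("start"),
)
print(restarted, actions)  # True ['stop', 'start']
```

Run on a schedule, a check like this would approximate the 70% workaround described above; the threshold is passed as a parameter so it can be tuned per node.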

guihahn · ‎02-05-2019

@mattclarke I'm having the same problem with the "ScrollElasticSearchHttp" processor. Depending on the situation, the processor state shows one or more nodes of the cluster, even though I've configured it for "Primary Node only". FlowFiles are duplicated on each added node. How can I solve this problem?

gillu_rock_in · ‎04-27-2018

@Matt Clarke Thanks Matt for the information..and helping out...it worked

rahulsmtauti · ‎04-17-2018

@Matt Clarke, seems quite refined approach. happy to see your response.

felicien_cathe1 · ‎11-21-2018

Thanks for your answer. I wanted to have only "one" queue where all FlowFiles would be waiting. I know now that it was a bad idea, so I reduced the size of the queue and now use backpressure. It corrected the priority problem. Thanks again!

srijitachaturve · ‎04-11-2018

Thanks for the solution, but since I am not familiar with the REST API, the solution by Matt looks easier to me. I will surely try yours too.

Last Visited	‎07-28-2026 01:05 AM

Member Since	‎07-30-2019 10:41 AM
Posts	3,472
Kudos received	1638

Cloudera Community

Re: ListenNetFlow processor does not decode Cisco ...

Re: Can we detect who did a particular operation i...

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: ListFile to list all the files sorted by date...

Re: How to process failed records in CDC?

Re: NiFI Content Repository archival is not workin...

Re: What if nifi fails to write data?

Re: 40 Gbps NiFi Cluster

Re: Duplicate of flowfile after NiFi primary node ...

Re: flowfile attributes in Nifi

Re: How to append HDFS file using putHDFS where Ni...

Re: Dissecting the NiFi "connection"... Heap usage...

Re: HDF/NiFi Improving the performance of your UI