About MattWho

MattWho · ‎09-18-2023

@manishg The ListFile does not pickup any files. It simply generates a zero content NiFI FlowFile for each file found in the target directory. That FlowFile only has metadata about the target content. The FetchFile processor utilizes that metadata to fetch that actual content and add it to the FlowFile. The value added here happens when you have a lot target files to ingest. To avoid having all the disk I/o related to that content on one node, you can redistribute the zero byte FlowFiles across all nodes so that each node now in a distributed way fetches the content (This works assuming that same target directory is mounted on all NiFi cluster nodes). As @SAMSAL shared you could use Process Group (PG) FlowFile concurrency to accomplish the processing of one FlowFile at a time. The ListFile will still continue to list all FlowFiles in target directory (writes state and continues to list new files as they get added to input directory). You can then feed the outbound connection of your ListFile to a PG configured with "Single FlowFile Per Node" FlowFile concurrency. This will prevent any other FlowFile queued between ListFile and the PG to enter the PG until the first FlowFile has processed through that PG. So your first processor inside the PG would be your FetchFile processor. Now if you were to configure Load Balanced Connection on that connection between ListFile and the PG, You would end up with each node in your NiFi cluster processing a single File at a time. This gives you some concurrency if you want it. However, if you have a strict one file at a time, you would not configure load balanced connection. Hope this helps, Matt

davehkd · ‎09-18-2023

Thanks Matt!

MTK · ‎09-17-2023

> ListFile will lose tracking. This is beyond my knowledge. According to the introduction of ListFile, the tracking entity will be stored in redis. Why is it lost when I only upgrade nifi?

MattWho · ‎09-15-2023

@manishg The Record Reader and record Writer controller services are not responsible for tracking counts. This is handled within your custom processors code. I am not sure what your custom processor does and whether it makes sense to track "record processed" or some other record based stat, but you can look at the github code for other processors like PartitionRecord to see how RecordCount is handled. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

manishg · ‎09-14-2023

Its ListFile.

MattWho · ‎09-14-2023

@RRG I have never used bitbucket before. Version control of Process Groups in NiFi is handled by the NiFi-Registry service that integrates with NiFi. The NiFi-Registry service offers a GitHub integration option https://nifi.apache.org/docs/nifi-registry-docs/html/administration-guide.html#flow-persistence-providers I am not sure how interchangeable git hub is with bitbucket. But you may want to have a look at the following on how NiFi-registry can be integrated with Github: https://community.cloudera.com/t5/Community-Articles/Storing-Apache-NiFi-Versioned-Flows-in-a-Git-Repository/ta-p/248713 If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

manishg · ‎09-12-2023

So basically all nodes perform exactly entire task. There is no divide and rule by default. Flow designer has to introduce any such parallelism by herself.

MattWho · ‎09-12-2023

@MmSs NiFi is data agnostic. To NiFi, the content of a FlowFile just bits. To remain data agnostic, NiFi uses what NiFi calls a "FlowFile". A FlowFile consists of two parts, FlowFile Attributes/Metadata (persisted in FlowFile repository and held in JVM heap memory) and FlowFile content (stored in content claims within content repository). This way NiFi core does not need to care or know anything about the format of the data/content. It becomes the responsibility of am individual processor component that needs to read or manipulate the content to understand the bits of content. The NiFi FlowFile metadata simply records in which content claim the bits exist and at what offset within the claim the content starts and number if bits that follow. As a far as directory paths go, these become just additional attributes on a FlowFile and have no bearing on NiFi's persistent storage of the FlowFiles content to the content repository. As far as the unpackContent goes, the processor will process both zip1 and zip2 separately. Unpacked content from zip one is written to a new FlowFile and same hold true for zip2. So if you stop the processor immediately after your UnpackContent processor and send your zip1 and zip2 FlowFiles through, you can list the content on the outbound relationship to inspect them before further processing. You'll be able to view the content and the metadata for each output FlowFile. NiFi does not care if there are multiple FlowFiles with the same filename as NiFi tracks them with unique UUID within NiFi. What you describe as zip1 content (already queued in inbound connection to PutS3Object being corrupted if zip2 is then extracted) is not possible. Run both zip 1 and zip2 through your dataflow with putS3Object stopped and inspect the queued FlowFiles as they exist queued before putS3Object is started. Are queued files on same node in your NiFi cluster? Is your putS3Object using "${filename}" as the object key? What happens if you use "{filename}-${uuid}" instead? My guess is issue is in your putS3Object configuration leading to corruption on write to S3. So your issue seems more likely to be a flow design issue then a processor of NiFi FlowFile handling issue. Sharing all the processors you are using in your dataflow and their configuration may help in pinpointing your design issue. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

OpenText-Orion · ‎09-08-2023

Thank you @MattWho for the answer and @cjervis for the tags. That did solve my issue but it took me a couple tries. I will be marking yours as the solution but including this note for future readers in case they run into the same thing I did. I had been copy-pasting commands in order to keep notes on my steps and at some point one of the two dashes on the --subjectAlternativeNames had been replaced by a different kind of dash character.

manishg · ‎09-07-2023

So I copied only those nars which we use, and container could launch now. Though I have to remove few nars which were causing issues, like nifi-ssl-context-service-nar-1.10.0.nar. And now existing flows dont have issues with properties which are obsolete in 1.22.0 as 1.10.0 nars are used for those components. Thanks for all the inputs.

Online	Offline
Last Visited	‎07-12-2026 03:00 PM

Member Since	‎07-30-2019 10:41 AM
Last Visited	‎07-12-2026 03:00 PM
Posts	3,472
Kudos received	1638

Cloudera Community

Re: ListenNetFlow processor does not decode Cisco ...

Re: Can we detect who did a particular operation i...

Re: How to invoke a url in nifi which is protected...

Re: Retry impacts scheduler

Re: 503 error while copying/versioning big process...

Re: Process only one file at a time

Re: Possible to Use AWS Load Balancers in Front of...

Re: NiFi upgrade 1.9.2 to 1.23.0

Re: 'Records Processed' missing for a processor

Re: Nifi: file not picked from input directory aft...

Re: NiFi Version Control Configuration with Bitbuc...

Re: nifi cluster: durability , aggregation

Re: UnpackContent overwriting data

Re: NiFi Cluster on AWS EC2 With SSL Using tls-too...

Re: Multiple framework NARs discovered. Only one f...