About manishg

MattWho · ‎01-29-2024

@manishg Are you seeing the same number of FlowFiles processed per second after switching to the volatile repositories? Perhaps having the FlowFile and provenance repositories in memory allows for faster processing of FlowFiles, resulting in more reads and writes to the content_repository, which contains the actual content of each FlowFile. Keep in mind that the FlowFile repository holds all the FlowFile metadata for the FlowFiles currently being processed through your dataflows. If your NiFi should crash or restart, you will lose everything in your volatile repositories, which means data loss in such events. If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt

MattWho · ‎01-12-2024

@manishg I am not clear on what you are trying to accomplish here. What is the use case? What is your NiFi version? What is your OS? NiFi does not have a "start.sh" script. Are you talking about the "nifi.sh" script? Perhaps there are just some important details I am missing here. I am also not sure why you would want to change the nifi.web.http.port configuration property in the nifi.properties file to a variable. These properties are all read during NiFi startup, and NiFi variables are not evaluated during startup. Nor does NiFi support defining NiFi variables in the nifi.properties file. Thank you, Matt
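Since nifi.properties is only read at startup, one possible workaround (not something suggested in the reply above) is to rewrite the property in the file before NiFi is started. The sketch below is only an illustration: the helper name `set_property` is made up, and it works on plain text rather than on any NiFi API.

```python
import re


def set_property(properties_text, key, value):
    """Return properties_text with `key` set to `value`, appending it if absent."""
    # Match the whole "key=..." line; re.escape keeps the dots in the key literal.
    pattern = re.compile(r"^" + re.escape(key) + r"=.*$", flags=re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(properties_text):
        # A function replacement avoids backslash escapes in `value` being interpreted.
        return pattern.sub(lambda _match: line, properties_text)
    return properties_text.rstrip("\n") + "\n" + line + "\n"
```

A wrapper script could read the port from an environment variable of your choosing, apply `set_property` to the contents of nifi.properties, write the file back, and only then run nifi.sh. The value still has to be in place before startup, because nothing re-reads the file afterwards.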

joseomjr · ‎10-09-2023

InvokeScriptedProcessor is the closest you'll get to a native (NAR) NiFi processor, in my experience. With it, you do NOT need to define all three relationships. If your code handles all possible problems correctly, you could have just "success" (or some other name) as your only relationship. From what I've seen, you do need at least one relationship if you're modifying the FlowFile or creating new ones, and you need session.transfer to send it to that relationship. You don't need to transfer the original. If you read the original FlowFile and create one or several new ones, you can dispose of the original with session.remove(your_original_flow_file).
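The create/transfer/remove pattern described above can be sketched in plain Python. This is a toy model and not NiFi code: `FakeSession` is a hypothetical stand-in for the real session object, FlowFiles are modelled as dictionaries, and only the method names `create`, `transfer` and `remove` mirror the session calls mentioned in the post.

```python
class FakeSession:
    """Hypothetical stand-in for the NiFi session; it only records what a script does."""

    def __init__(self):
        self.transferred = []  # (flowfile, relationship) pairs
        self.removed = []      # FlowFiles dropped without being transferred

    def create(self, parent):
        # Model a child FlowFile that starts with a copy of the parent's attributes.
        return {"attributes": dict(parent["attributes"]), "content": ""}

    def transfer(self, flowfile, relationship):
        self.transferred.append((flowfile, relationship))

    def remove(self, flowfile):
        self.removed.append(flowfile)


REL_SUCCESS = "success"  # the only relationship this sketch defines


def split_lines(session, original):
    """Create one new FlowFile per line of the original, then drop the original."""
    for line in original["content"].splitlines():
        child = session.create(original)
        child["content"] = line
        session.transfer(child, REL_SUCCESS)
    # The original is never transferred; it is removed instead.
    session.remove(original)
```

Every FlowFile the function touches ends up either transferred to the single relationship or removed, which is the point of the post: nothing is left unaccounted for, so no extra "failure" or "original" relationship is needed.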

cotopaul · ‎10-09-2023

Well, the only advice I can give you is to write your processor, see what errors you get, and come back with them. Nobody can write your processor for you if only you know your requirements. What I can suggest, though, is to have a look at the following examples, as they might help with what you are trying to achieve: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922 https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-2/ta-p/249018 https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148

manishg · ‎09-27-2023

Got it. It's in the Data Provenance dialog box.

MattWho · ‎09-18-2023

@manishg ListFile does not pick up any files. It simply generates a zero-content NiFi FlowFile for each file found in the target directory. That FlowFile only has metadata about the target content. The FetchFile processor uses that metadata to fetch the actual content and add it to the FlowFile. The value added here shows when you have a lot of target files to ingest. To avoid having all the disk I/O related to that content on one node, you can redistribute the zero-byte FlowFiles across all nodes so that each node fetches content in a distributed way (this works assuming the same target directory is mounted on all NiFi cluster nodes). As @SAMSAL shared, you could use Process Group (PG) FlowFile concurrency to process one FlowFile at a time. ListFile will still continue to list all files in the target directory (it writes state and continues to list new files as they get added to the input directory). You can then feed the outbound connection of your ListFile to a PG configured with "Single FlowFile Per Node" FlowFile concurrency. This will prevent any other FlowFile queued between ListFile and the PG from entering the PG until the first FlowFile has been processed through that PG. So your first processor inside the PG would be your FetchFile processor. Now, if you were to configure a load-balanced connection between ListFile and the PG, you would end up with each node in your NiFi cluster processing a single file at a time. This gives you some concurrency if you want it. However, if you have a strict one-file-at-a-time requirement, you would not configure a load-balanced connection. Hope this helps, Matt
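To make the List/Fetch split and the "one FlowFile per node" behaviour above more concrete, here is a small plain-Python simulation. It is only a model under simplifying assumptions: the function names, the node names and the round-robin distribution are illustrative choices, and none of this is NiFi API.

```python
from collections import deque
from itertools import cycle


def list_files(filenames):
    """Model ListFile: one zero-content FlowFile of metadata per file found."""
    return [{"filename": name, "content": None} for name in filenames]


def round_robin(flowfiles, nodes):
    """Model a load-balanced connection by spreading the listings across nodes."""
    queues = {node: deque() for node in nodes}
    for node, flowfile in zip(cycle(nodes), flowfiles):
        queues[node].append(flowfile)
    return queues


def run_single_flowfile_per_node(queues, fetch):
    """Model a PG that admits one FlowFile per node at a time, with FetchFile first."""
    order = []
    while any(queues.values()):
        for node, queue in queues.items():
            if queue:
                flowfile = queue.popleft()                        # admitted into the PG
                flowfile["content"] = fetch(flowfile["filename"])  # FetchFile step
                order.append((node, flowfile["filename"]))
    return order
```

With two nodes and three listed files, each node works through its own queue one file at a time. Passing a single node to `round_robin` models the case without a load-balanced connection, where files are processed strictly one at a time.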

MattWho · ‎09-15-2023

@manishg The Record Reader and Record Writer controller services are not responsible for tracking counts. This is handled within your custom processor's code. I am not sure what your custom processor does and whether it makes sense to track "records processed" or some other record-based stat, but you can look at the GitHub code for other processors like PartitionRecord to see how RecordCount is handled. Thank you, Matt

manishg · ‎09-14-2023

It's ListFile.

manishg · ‎09-12-2023

So basically all nodes perform exactly the same, entire task. There is no divide and conquer by default. The flow designer has to introduce any such parallelism herself.

manishg · ‎09-10-2023

@SAMSAL I tried the same template with NiFi 1.10.0 and found that FetchXMLFile has no issues with the execution node set to PRIMARY. It seems the new requirement you mentioned was only introduced after 1.10.0.

Last Visited	‎06-10-2024 06:50 AM

Member Since	‎07-27-2023 04:53 AM
Posts	55
Kudos received	19

Cloudera Community

Re: Nifi: file not picked from input directory aft...

Re: disk io operations going up with volatile repo...

Re: exposing a nifi property as env variable

Re: Query on NiFi relationships

Re: Creating a new Record from input record

Re: Performance diff between single big file vs mu...

Re: Process only one file at a time

Re: 'Records Processed' missing for a processor

Re: Nifi: file not picked from input directory aft...

Re: nifi cluster: durability , aggregation

Re: 'Execution node' is invalid because processors...