Member since 07-29-2020
574 Posts
323 Kudos Received
176 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3556 | 12-20-2024 05:49 AM |
| | 3813 | 12-19-2024 08:33 PM |
| | 3602 | 12-19-2024 06:48 AM |
| | 2345 | 12-17-2024 12:56 PM |
| | 3092 | 12-16-2024 04:38 AM |
10-10-2023
12:13 PM
I don't think you can use the same Jolt spec for two different schemas. A Jolt spec doesn't allow you to use the same key multiple times. You have to know beforehand which schema you are getting and then route to a different Jolt transformation processor, or store the spec dynamically in an attribute and pass it to a single Jolt processor.
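To see why one spec can't cover two schemas, note that a Jolt spec is a JSON document, and a JSON object cannot carry the same key twice; parsers silently keep only the last occurrence. A minimal sketch (the schema names and specs below are hypothetical, just to illustrate the attribute-based routing idea):

```python
import json

# A Jolt spec is JSON, and JSON objects cannot repeat a key:
# json.loads keeps only the last occurrence of "spec".
spec_text = '{"operation": "shift", "spec": {"name": "a"}, "spec": {"name": "b"}}'
parsed = json.loads(spec_text)
print(parsed["spec"])  # the first "spec" is silently lost

# Workaround sketch: keep one spec per schema and select it dynamically,
# the way an attribute + single Jolt processor flow would.
specs_by_schema = {
    "schemaA": {"operation": "shift", "spec": {"name": "fullName"}},
    "schemaB": {"operation": "shift", "spec": {"title": "fullName"}},
}

def pick_spec(schema_id: str) -> dict:
    """Return the Jolt spec for the detected schema (hypothetical helper)."""
    return specs_by_schema[schema_id]
```

In NiFi terms, `pick_spec` plays the role of storing the right spec in a FlowFile attribute before a single Jolt processor reads it.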
10-03-2023
08:12 AM
@Kiranq, What I found is that JoltTransformRecord expects only a single record to work with, hence the name. I noticed that when I tried to pass an array I got the error "...error transforming the first record", but if I pass just one JSON record it works. If you have an array of JSON/CSV and you want to split and process each record individually, I would suggest splitting the records before the JoltTransformRecord. If you don't want to split the array, I recommend using JoltTransformJSON first and then a ConvertRecord processor to convert to CSV.
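The "split first" suggestion amounts to turning one FlowFile holding an array into one FlowFile per element, which is what an upstream split processor does. A toy sketch of that behavior (the payload is made up):

```python
import json

def split_records(flowfile_content: str) -> list[str]:
    """Mimic splitting an array FlowFile into one FlowFile per record,
    so a downstream record-at-a-time transform never sees an array."""
    data = json.loads(flowfile_content)
    records = data if isinstance(data, list) else [data]
    return [json.dumps(r) for r in records]

payload = '[{"id": 1}, {"id": 2}, {"id": 3}]'
for record in split_records(payload):
    print(record)  # each line is now a single JSON record
```

Each emitted record is a standalone JSON object that a record-oriented Jolt transform can handle on its own.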
10-01-2023
01:03 PM
@piyush7829, Change the EvaluateJsonPath "Destination" property to flowfile-attribute instead of flowfile-content. Setting this property to the latter value will replace the whole FlowFile content and won't create the attribute you are looking for in the ReplaceText processor. If that still doesn't work, check after the EvaluateJsonPath whether the "match_id" attribute was created and populated with the correct value. If not, I would need to see the response from the GetMongo processor to make sure the correct path is being set in the EvaluateJsonPath.
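The key distinction is that with Destination set to flowfile-attribute, the content passes through untouched and only the attribute map gains a new entry. A rough stand-in (the GetMongo response shape below is hypothetical):

```python
import json

# Hypothetical GetMongo response; the real field layout may differ.
response = '{"matches": [{"match_id": "abc123", "score": 7}]}'

def evaluate_json_path(content: str, path: list) -> str:
    """Toy stand-in for EvaluateJsonPath with Destination=flowfile-attribute:
    the content is left as-is and the extracted value becomes an attribute."""
    value = json.loads(content)
    for key in path:
        value = value[key]
    return str(value)

attributes = {"match_id": evaluate_json_path(response, ["matches", 0, "match_id"])}
print(attributes)  # content unchanged; only the attribute map grows
```

If the path is wrong for the actual response, the attribute simply never gets a value, which matches the "check whether match_id was populated" debugging step above.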
09-28-2023
12:54 PM
@sarithe You may also want to take a look at the Process Group (PG) FlowFile Concurrency configuration options as a possible design path, since there does not appear to be any dependency between task 1 and task 2 in your description; you just want to make sure that no more than 2 tasks are executing concurrently.

Move the processors that handle the two task executions inside two different child PGs, each configured with "Single FlowFile per Node" Process Group FlowFile Concurrency. Within each PG, create an input port and an output port, and handle your task dataflow between those two ports. Outside the PGs (at the parent PG level), you handle the triggering FlowFiles. Each task PG will allow one FlowFile at a time to enter, and because of the FlowFile Concurrency setting, will not allow any more FlowFiles to enter until that FlowFile has processed out.

As you can see from the above example, each task PG only processes a single FlowFile at a time. I built this example so that task 2 always takes longer, so you can see that the task 1 PG outputs more processed FlowFiles than the task 2 PG, while still making sure that no two tasks are ever being executed concurrently.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
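The concurrency behavior of two "Single FlowFile per Node" PGs can be sketched with ordinary locks: each PG acts like a lock admitting one FlowFile at a time, so with two PGs at most two tasks ever run at once. A minimal illustration, not NiFi code:

```python
import threading
import time

# Each "Single FlowFile per Node" PG behaves like a lock: one FlowFile
# inside at a time. Two PGs means at most two concurrent tasks overall.
pg_locks = {"task1": threading.Lock(), "task2": threading.Lock()}
active: list[str] = []
peak = 0
guard = threading.Lock()

def run_in_pg(pg: str, work_seconds: float) -> None:
    global peak
    with pg_locks[pg]:            # the PG admits one FlowFile at a time
        with guard:
            active.append(pg)
            peak = max(peak, len(active))
        time.sleep(work_seconds)  # the task dataflow inside the PG
        with guard:
            active.remove(pg)

threads = [threading.Thread(target=run_in_pg, args=(pg, 0.05))
           for _ in range(3) for pg in ("task1", "task2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrency:", peak)  # never exceeds 2
```

Six "FlowFiles" queue up, but the per-PG lock keeps the observed peak concurrency at two, mirroring the PG FlowFile Concurrency design above.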
09-28-2023
12:55 AM
Thank you so much @SAMSAL, it's working.
09-18-2023
11:44 AM
@manishg The ListFile processor does not pick up any files. It simply generates a zero-content NiFi FlowFile for each file found in the target directory; that FlowFile only has metadata about the target content. The FetchFile processor then utilizes that metadata to fetch the actual content and add it to the FlowFile. The value added here comes when you have a lot of target files to ingest: to avoid having all the disk I/O related to that content on one node, you can redistribute the zero-byte FlowFiles across all nodes so that each node fetches content in a distributed way (this works assuming the same target directory is mounted on all NiFi cluster nodes).

As @SAMSAL shared, you could use Process Group (PG) FlowFile Concurrency to accomplish the processing of one FlowFile at a time. The ListFile will still continue to list all files in the target directory (it writes state and continues to list new files as they get added to the input directory). You can then feed the outbound connection of your ListFile to a PG configured with "Single FlowFile Per Node" FlowFile Concurrency. This will prevent any other FlowFile queued between ListFile and the PG from entering the PG until the first FlowFile has processed through that PG, so your first processor inside the PG would be your FetchFile processor.

Now, if you were to configure a Load Balanced Connection on the connection between ListFile and the PG, you would end up with each node in your NiFi cluster processing a single file at a time. This gives you some concurrency if you want it. However, if you have a strict one-file-at-a-time requirement, you would not configure a load balanced connection.

Hope this helps, Matt
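The ListFile/FetchFile split described above can be sketched as two functions: one that produces metadata-only records without reading any content, and one that uses that metadata to pull the bytes in later. A simplified model, not the actual processor code:

```python
import tempfile
from pathlib import Path

def list_files(directory: Path) -> list[dict]:
    """Toy ListFile: emit metadata-only 'FlowFiles', no content read yet."""
    return [{"absolute.path": str(p.parent),
             "filename": p.name,
             "file.size": p.stat().st_size}
            for p in sorted(directory.iterdir()) if p.is_file()]

def fetch_file(flowfile: dict) -> bytes:
    """Toy FetchFile: use the listed metadata to fetch the actual content."""
    return (Path(flowfile["absolute.path"]) / flowfile["filename"]).read_bytes()

with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.txt").write_bytes(b"hello")
    listed = list_files(Path(d))
    print(listed[0]["filename"], fetch_file(listed[0]))
```

Because `list_files` output is tiny metadata, it is cheap to redistribute across cluster nodes before the expensive `fetch_file` step runs, which is the whole point of the two-processor design.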
09-16-2023
07:20 AM
Hi @mr80132 , Something I noticed about your configuration of the GenerateTableFetch processor is that you are not setting any value for the "Maximum-value Columns" property. I think you need to set at least one column name whose maximum value the processor will track, so that it fetches anything that comes in with a value greater than that maximum. Please refer to the processor description: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.GenerateTableFetch/ If that helps, please accept the solution. Thanks
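The incremental-fetch behavior driven by a Maximum-value Column can be sketched as follows (table and column names are hypothetical; the real processor also handles paging and per-table state):

```python
# Toy model of GenerateTableFetch with one Maximum-value Column:
# remember the largest value seen so far and only generate queries
# for rows beyond it on subsequent runs.
def generate_fetch_query(table: str, max_value_column: str, last_seen):
    if last_seen is None:
        return f"SELECT * FROM {table}"          # first run: full fetch
    return f"SELECT * FROM {table} WHERE {max_value_column} > {last_seen}"

state = None
print(generate_fetch_query("orders", "id", state))  # full table
state = 1000                                        # state saved after the run
print(generate_fetch_query("orders", "id", state))  # only new rows
```

Without a Maximum-value Column there is no `last_seen` to track, which is why the processor cannot produce incremental fetches in that configuration.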
09-13-2023
02:50 AM
Hi @SAMSAL , that's pretty simple, but it works fine for me. Thanks for the advice. Regards, Maik
09-12-2023
01:30 PM
@MmSs NiFi is data agnostic. To NiFi, the content of a FlowFile is just bits. To remain data agnostic, NiFi uses what it calls a "FlowFile". A FlowFile consists of two parts: FlowFile attributes/metadata (persisted in the FlowFile repository and held in JVM heap memory) and FlowFile content (stored in content claims within the content repository). This way the NiFi core does not need to care or know anything about the format of the data/content; it becomes the responsibility of an individual processor component that needs to read or manipulate the content to understand the bits. The FlowFile metadata simply records in which content claim the bits exist, at what offset within the claim the content starts, and the number of bits that follow. As far as directory paths go, these become just additional attributes on a FlowFile and have no bearing on NiFi's persistent storage of the FlowFile's content in the content repository.

As far as UnpackContent goes, the processor will process both zip1 and zip2 separately. Unpacked content from zip1 is written to a new FlowFile, and the same holds true for zip2. So if you stop the processor immediately after your UnpackContent processor and send your zip1 and zip2 FlowFiles through, you can list the content on the outbound relationship to inspect them before further processing. You'll be able to view the content and the metadata for each output FlowFile. NiFi does not care if there are multiple FlowFiles with the same filename, as NiFi tracks them with a unique UUID internally. What you describe as zip1 content (already queued in the inbound connection to PutS3Object) being corrupted when zip2 is then extracted is not possible.

Run both zip1 and zip2 through your dataflow with PutS3Object stopped and inspect the queued FlowFiles as they exist before PutS3Object is started. Are the queued files on the same node in your NiFi cluster? Is your PutS3Object using "$(unknown)" as the object key? What happens if you use "(unknown)-${uuid}" instead? My guess is that the issue is in your PutS3Object configuration, leading to corruption on write to S3. So your issue seems more likely to be a flow design issue than a processor or NiFi FlowFile handling issue. Sharing all the processors you are using in your dataflow and their configuration may help in pinpointing your design issue.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you, Matt
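The FlowFile model described above (small attribute map in heap, content referenced by claim, offset, and length) can be sketched with a tiny data structure. The claim contents and UUIDs below are made up, purely to show why two FlowFiles with the same filename cannot corrupt each other:

```python
from dataclasses import dataclass, field

@dataclass
class FlowFile:
    """Minimal sketch of NiFi's FlowFile: metadata plus a content pointer."""
    uuid: str
    attributes: dict = field(default_factory=dict)
    claim_id: str = ""
    offset: int = 0
    length: int = 0

# Content repository: claims hold raw bytes, possibly shared by many FlowFiles.
content_repository = {"claim-1": b"zip1-bytes...zip2-bytes..."}

ff1 = FlowFile("uuid-1", {"filename": "data.csv"}, "claim-1", 0, 13)
ff2 = FlowFile("uuid-2", {"filename": "data.csv"}, "claim-1", 13, 13)

def read_content(ff: FlowFile) -> bytes:
    claim = content_repository[ff.claim_id]
    return claim[ff.offset: ff.offset + ff.length]

# Same filename attribute, distinct UUIDs and byte ranges:
# neither FlowFile's content can overwrite the other's.
print(read_content(ff1), read_content(ff2))
```

Because each FlowFile points at its own immutable byte range, unpacking zip2 cannot alter the bytes already referenced by zip1's FlowFiles, which is why the corruption has to originate downstream.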
09-10-2023
11:45 PM
@SAMSAL I experimented with the same template on NiFi 1.10.0 and found that FetchXMLFile has no issues with the execution node set to PRIMARY. It seems the new requirement you mentioned was introduced only after 1.10.0.