About SAMSAL

MattWho · ‎09-12-2023

@MmSs NiFi is data agnostic. To NiFi, the content of a FlowFile is just bits. To remain data agnostic, NiFi uses what it calls a "FlowFile". A FlowFile consists of two parts: FlowFile attributes/metadata (persisted in the FlowFile repository and held in JVM heap memory) and FlowFile content (stored in content claims within the content repository). This way NiFi core does not need to know or care about the format of the content. It becomes the responsibility of any individual processor component that needs to read or manipulate the content to understand those bits. The FlowFile metadata simply records which content claim holds the bits, at what offset within the claim the content starts, and the number of bits that follow.

As far as directory paths go, these are just additional attributes on a FlowFile and have no bearing on how NiFi persists the FlowFile's content to the content repository.

As far as UnpackContent goes, the processor will process zip1 and zip2 separately. Unpacked content from zip1 is written to new FlowFiles, and the same holds true for zip2. So if you stop the processor immediately after your UnpackContent processor and send your zip1 and zip2 FlowFiles through, you can list the queue on the outbound relationship and inspect the FlowFiles before further processing. You'll be able to view the content and the metadata of each output FlowFile. NiFi does not care if multiple FlowFiles have the same filename, because it tracks each one by a unique UUID.

What you describe (zip1 content that is already queued in the inbound connection to PutS3Object being corrupted when zip2 is then extracted) is not possible. Run both zip1 and zip2 through your dataflow with PutS3Object stopped and inspect the FlowFiles as they sit queued before PutS3Object is started. Are the queued files on the same node in your NiFi cluster? Is your PutS3Object using "${filename}" as the object key? What happens if you use "${filename}-${uuid}" instead?

My guess is that the issue is in your PutS3Object configuration, leading to corruption on write to S3. So your issue seems more likely to be a flow design issue than a processor or NiFi FlowFile handling issue. Sharing all the processors you are using in your dataflow and their configuration may help in pinpointing the design issue.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped. Thank you, Matt
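To illustrate the object-key question above, here is a minimal Python sketch (not NiFi or AWS code) that models a bucket as a plain dictionary. The `put_object` helper and the sample FlowFile dictionaries are hypothetical; the only assumption is that writing to an existing key replaces the earlier object, which is how an unversioned bucket behaves.

```python
import uuid


def put_object(bucket, key, content):
    """Store content under key; an existing key is silently replaced."""
    bucket[key] = content


# Two unpacked FlowFiles that happen to share a filename (hypothetical data).
flowfiles = [
    {"filename": "data.json", "uuid": str(uuid.uuid4()), "content": b"from zip1"},
    {"filename": "data.json", "uuid": str(uuid.uuid4()), "content": b"from zip2"},
]

# Key built like "${filename}": the second write replaces the first.
by_name = {}
for ff in flowfiles:
    put_object(by_name, ff["filename"], ff["content"])

# Key built like "${filename}-${uuid}": each FlowFile keeps its own object.
by_name_and_uuid = {}
for ff in flowfiles:
    put_object(by_name_and_uuid, f'{ff["filename"]}-{ff["uuid"]}', ff["content"])

print(len(by_name), len(by_name_and_uuid))
```

With the filename-only key, one object survives and it holds the zip2 content, which from the outside can look like zip1's data was overwritten.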

manishg · ‎09-10-2023

@SAMSAL I tried the same template with NiFi 1.10.0 and found that FetchXMLFile has no issue with the execution node set to PRIMARY. It seems the requirement you mentioned was introduced only after 1.10.0.

scoutjohn · ‎09-03-2023

Update: This is working as I hoped. With this configuration, the FileProcessor group takes the next FlowFile only after it has completely processed the FlowFile that is already inside the group. Thank you @SAMSAL, @pvillard

SAMSAL · ‎08-31-2023

If you are getting multiple records in one JSON array, then you probably need to use the SplitJson processor to get each record individually, then use EvaluateJsonPath to extract the values you need from each record, and then run PutSQL.
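As a rough illustration of that split-then-extract sequence, here is a small Python sketch. It is only an analogy for what the processors do, not NiFi code; the function names, the sample payload, and the `id`/`name` fields are made up for the example.

```python
import json


def split_records(array_text):
    """Rough analogue of SplitJson: one JSON document per array element."""
    return [json.dumps(record) for record in json.loads(array_text)]


def evaluate_fields(record_text, fields):
    """Rough analogue of EvaluateJsonPath: pull top-level fields into a dict."""
    record = json.loads(record_text)
    return {name: record.get(name) for name in fields}


# Hypothetical incoming content holding several records in one array.
incoming = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'

rows = [evaluate_fields(r, ["id", "name"]) for r in split_records(incoming)]

for row in rows:
    # Each row would supply the values for one SQL statement downstream.
    print(row)
```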

code_mnkey · ‎08-30-2023

Each zip contains a JSON file along with a bunch of files of various types. After unzipping, I do some processing on the other files, and then I need to match the JSON with each of the other files from the original zip to create an Elasticsearch document. I tried using MergeContent based on the fragment attribute, but that is not working 100%. I am out of ideas on how to get this to work.

cotopaul · ‎08-28-2023

@JohnnyRocks, as @steven-matison said, you should avoid chaining so many ReplaceText processors. I am not sure I understood your flow exactly, but something tells me that something is not properly configured in your NiFi flow before it reaches ReplaceText. First of all, when using the classic Java date format, "MM" always renders a two-digit month, meaning that months 1 to 9 are automatically written with a leading zero. "dd" does the same for days. As I see in your post, you said that your CSV reader is configured to read the data as MM/dd/yy, which should be fine, but something is missing here: how do you reach the format dd/MM/yyyy? What I would personally try is to convert all those date values to the same format. So instead of all those ReplaceText processors, I would insert an UpdateRecord processor, where I would define my RecordReader and my RecordWriter with the desired schemas (make sure that your column is of type int with logical type date). Next, in that processor, I would change the Replacement Value Strategy to "Record Path Value", press + and add a new property. I would call it "/Launch_Date" (pay attention to the leading slash) and assign it the value " format( /Launch_Date, "dd/MM/yyyy", "Europe/Bucharest") " (or any other timezone you require; if you need your data in UTC, just remove the comma and the timezone).
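To show why a single parse-and-format step can replace a chain of text substitutions, here is a small Python sketch of the same conversion. It uses Python's `datetime` rather than NiFi's RecordPath `format` function, so the format codes differ (`%m/%d/%y` here stands in for MM/dd/yy), and the sample values are invented.

```python
from datetime import datetime


def reformat_date(value, source_format="%m/%d/%y", target_format="%d/%m/%Y"):
    """Parse a date in the source layout and render it in the target layout."""
    return datetime.strptime(value, source_format).strftime(target_format)


# Padded and unpadded inputs both parse; the output is always zero-padded.
samples = ["1/5/23", "01/05/23", "12/31/23"]
converted = [reformat_date(s) for s in samples]

print(converted)  # ['05/01/2023', '05/01/2023', '31/12/2023']
```

Parsing into a real date first means padded and unpadded inputs all end up in one consistent output format, which is the point of doing this in UpdateRecord.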

SAMSAL · ‎08-27-2023

How often are you looking to run GenerateTableFetch? If it's going to be a batch process, then you can set up a schedule on the top processor using processor config -> Scheduling tab and setting the Run Schedule value. By default this value is set to 0 sec, which means it is running continuously.

SAMSAL · ‎08-22-2023

Hi @Anderosn, if I understood you correctly, you are trying to duplicate the FlowFile so that it can be sent to different processors, is that right? If that is the case, you can simply drag the same relationship multiple times from a given processor. For example, if the upstream processor sends the result FlowFile to the success relationship, you can drag two success connections to different downstream processors and process the same content differently in parallel. If that helps, please accept the solution. Thanks

JamesMillere · ‎08-18-2023

Hi, If you're using Apache NiFi and the token you're trying to capture with the InvokeHTTP processor is too large to be stored as an attribute, you can follow the steps below to work around this limitation: Keep the token in the content of the FlowFile if it's returned by the InvokeHTTP processor. You can use processors like ReplaceText to wrap the token in the header format you need. For instance, if you need the header to be Authorization: Bearer {token}, then you can configure a ReplaceText processor to replace the content (i.e., the token) to match this format.
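As a sketch of the wrapping step, the Python below mimics a whole-content regex replacement of the kind ReplaceText can perform. The function name, the regex, and the sample token are illustrative assumptions, not values taken from the original thread.

```python
import re


def wrap_as_bearer(content):
    """Replace the entire content (the raw token) with 'Bearer <token>'."""
    token = content.strip()
    # (?s) lets '.' match newlines so the pattern covers the whole content;
    # count=1 avoids a second, empty match at the end of the string.
    return re.sub(r"(?s)^(.*)$", r"Bearer \1", token, count=1)


# Hypothetical response body returned by the first InvokeHTTP call.
response_body = "abc.def.ghi\n"

headers = {"Authorization": wrap_as_bearer(response_body)}
print(headers)  # {'Authorization': 'Bearer abc.def.ghi'}
```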

scoutjohn · ‎08-17-2023

I think this is what I was trying to achieve: pause the execution for some time after processing 1000 FlowFiles. Thank you

Last Visited	‎05-08-2025 03:43 AM

Member Since	‎07-29-2020 02:31 PM
Posts	574
Kudos received	323

Cloudera Community

Re: CSVReader and CSVRecordSetWriter doesn't consi...

Re: Jolt spec to flatten the nested JSON

Re: CSVReader and CSVRecordSetWriter doesn't consi...

Re: Converting Nested JSON to Flat JSON using JOLT

Re: NIfi: javax.security.auth.login.LoginExceptio...

Re: UnpackContent overwriting data

Re: 'Execution node' is invalid because processors...

Re: Wait for a Flowfile to be picked only after th...

Re: Read the flow file of ExecuteStreamCommand

Re: zip content listing into attributes

Re: Multiple ReplaceText Processors

Re: Dynamic Initial Max Value on GenerateTableFetc...

Re: Send flowfile to two different processors in p...

Re: Use InvokeHTTP response body in another Invoke...

Re: How to pause after execution of 1000 flow file...