Member since: 07-27-2023
Posts: 18
Kudos Received: 1
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3341 | 12-11-2023 10:43 AM
05-01-2024
05:28 AM
@Anderosn
1. If the content of your flow file is too large to be inserted into a single CLOB column, you can split it into smaller chunks and insert each chunk into the database separately.
2. Instead of storing the content in a CLOB column, you can store it in a BLOB (Binary Large Object) column. BLOB columns can store binary data, including large files, without the size limitations of CLOB columns.
3. Store the content of the flow file in an external storage system (e.g., HDFS, Amazon S3) and insert only a reference (e.g., file path or URL) into the database. This approach can be useful if the database has limitations on the size of CLOB or BLOB columns.
4. If ExecuteScript is not approved, consider using an external script or application to perform the insertion into the database; you can trigger the script or application from NiFi using the ExecuteProcess or InvokeHTTP processors (a sketch of such a script follows below).
Regards,
Chethan YM
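As a minimal sketch of options 1 and 4 combined, the hypothetical external script below (e.g., launched from NiFi via ExecuteProcess) splits a large file into chunks and inserts each chunk as its own row. SQLite and the content_chunks table/column names are illustrative assumptions, not details from this thread, and the table is assumed to already exist; a real deployment would use the DB-API driver for your actual database.

# chunk_insert.py -- hypothetical helper, e.g. launched from NiFi via ExecuteProcess.
# Splits a large file into chunks and inserts each chunk as its own row, avoiding
# a single oversized CLOB insert. The content_chunks table is assumed to exist.
import sqlite3
import sys

CHUNK_SIZE = 1_000_000  # characters per chunk; tune to what your CLOB column accepts

def insert_in_chunks(db_path, file_path):
    conn = sqlite3.connect(db_path)
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn, open(file_path, "r", encoding="utf-8") as src:
            for index, chunk in enumerate(iter(lambda: src.read(CHUNK_SIZE), "")):
                # (file_ref, chunk_index) lets a reader reassemble the chunks in order.
                conn.execute(
                    "INSERT INTO content_chunks (file_ref, chunk_index, chunk_data) "
                    "VALUES (?, ?, ?)",
                    (file_path, index, chunk),
                )
    finally:
        conn.close()

if __name__ == "__main__":
    insert_in_chunks(sys.argv[1], sys.argv[2])

The original content can then be reassembled by selecting the rows for a given file_ref ordered by chunk_index.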
02-14-2024
12:20 PM
2 Kudos
Hi,

Have you looked into the EnforceOrder processor: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.23.2/org.apache.nifi.processors.standard.EnforceOrder/index.html

Based on your description it should do the job. One thing to keep in mind if you want this to work is what is stated in the description: "... [IMPORTANT] In order to take effect of EnforceOrder, FirstInFirstOutPrioritizer should be used at EVERY downstream relationship UNTIL the order of FlowFiles physically get FIXED by operation such as MergeContent or being stored to the final destination."

In this case the Group Identifier will be the value of fragment.identifier: ${fragment.identifier}. The Order Attribute will be the name of the fragment index attribute: fragment.index.

If you find this helpful please accept the solution. Thanks
01-26-2024
05:50 AM
Hi @Anderosn ,

From previous posts it seems this is common behavior for the CLOB column data type, and it doesn't seem like you can avoid it. To extract the JSON value please refer to the following post: https://community.cloudera.com/t5/Support-Questions/Avro-to-Json-adding-extra-delemeters/m-p/380646#M244113
01-18-2024
01:12 PM
I have a scenario where I need to set the status column in two tables to 'C' to mark the request as completed. Currently I am doing it using two PutSQL processors, but I want to do that in a transactional way to keep both tables consistent.

UPDATE REQUEST
SET STAT_CD = 'C', UPDT_DT = CURRENT_TIMESTAMP
WHERE RQST_ID = '${requestId}'
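A hedged sketch of what a single-transaction version could look like from an external script (e.g., invoked with ExecuteProcess): both UPDATEs either commit together or roll back together. SQLite stands in for the real database, and the second table name REQUEST_DETAIL is a placeholder, since the post only names REQUEST.

# two_table_update.py -- hypothetical sketch; REQUEST_DETAIL is a placeholder for
# the second table, which the original post does not name.
import sqlite3
import sys

def mark_completed(db_path, request_id):
    conn = sqlite3.connect(db_path)
    try:
        # Both UPDATEs run in one transaction: the context manager commits only
        # if both succeed and rolls back if either raises.
        with conn:
            conn.execute(
                "UPDATE REQUEST SET STAT_CD = 'C', UPDT_DT = CURRENT_TIMESTAMP "
                "WHERE RQST_ID = ?",
                (request_id,),
            )
            conn.execute(
                "UPDATE REQUEST_DETAIL SET STAT_CD = 'C', UPDT_DT = CURRENT_TIMESTAMP "
                "WHERE RQST_ID = ?",
                (request_id,),
            )
    finally:
        conn.close()

if __name__ == "__main__":
    mark_completed(sys.argv[1], sys.argv[2])

The with conn: block is what provides the atomicity here: sqlite3 commits on a clean exit and rolls back if either statement raises.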
Labels:
- Apache NiFi
01-11-2024
01:28 PM
1 Kudo
@Anderosn Is your InvokeHTTP processor triggered by a FlowFile from an inbound connection, or does it have no inbound connection and execute purely based on its configured run schedule? This is one of very few processors where an inbound connection is optional, but the behavior differs depending on the configuration chosen.

With no inbound connection there is no FlowFile to "retry" when execution results in "failure" or "no retry"; in effect, every scheduled execution is already a retry. You could use a GenerateFlowFile processor to feed an empty trigger FlowFile to the InvokeHTTP processor to trigger its execution. This would then give you a FlowFile that the retry configuration can use.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
12-12-2023
06:57 AM
Can you post a screenshot of the UpdateRecord processor configuration? Also, be careful with the provided input: there is an extra comma after the last Garry value, which makes the JSON invalid.
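For illustration, with made-up data (the actual input from the thread is not reproduced here), a trailing comma after the last value fails standard JSON parsing:

import json

# Made-up input -- the actual data from the thread is not reproduced here.
valid = '{"names": ["Harry", "Garry"]}'
invalid = '{"names": ["Harry", "Garry",]}'  # extra comma after the last value

json.loads(valid)  # parses fine
try:
    json.loads(invalid)
except json.JSONDecodeError as err:
    print("invalid JSON:", err)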
09-19-2023
06:33 AM
@Anderosn So MergeContent does just that: merges the content of all FlowFiles being merged.

I am not sure how often your GenerateFlowFile processor executes, but when it does execute it will create a FlowFile with a unique filename (unless you set the filename in the GenerateFlowFile processor via a dynamic property). The data produced by GenerateFlowFile is routed as a FlowFile to one of the success relationships, and a clone FlowFile is routed to the other success relationship in your dataflow (both FlowFiles have the same "filename" but different FlowFile uuids). The "filename" attribute can be used in the MergeContent processor in the "Correlation Attribute Name" property. Then you can set min num entries to "2". This will make sure both FlowFiles with the same value in the filename attribute get allocated to the same bin. The MergeContent property "Attribute Strategy" will need to be set to "Keep All Unique Attributes" so that the final merged FlowFile will include the new token attribute.

Now we have to deal with the content. We need to make sure the FlowFile used to fetch the token has no content before being routed to the MergeContent processor. For that you can use the ModifyBytes processor and set "Remove all content" to "true" after your EvaluateJsonPath processor. Removing the content does not remove the FlowFile metadata/attributes, so this now 0-byte FlowFile will still have its filename value and token attribute with value.

With the above suggestion for your existing dataflow as one option, there are probably many other dataflow designs that could accomplish this. Since you are using GenerateFlowFile to create the content needed for your final InvokeHTTP, I'd go a different route that does not need a MergeContent processor: GenerateFlowFile (custom content needed to fetch token) --> InvokeHTTP (get token) --> EvaluateJsonPath (extract token from content to attribute) --> ReplaceText ("Replacement Strategy" = "always replace", "Evaluation Mode" = "Entire text", "Replacement Value" = <content needed for your final rest-api call>) --> InvokeHTTP (your final rest-api endpoint request).

The above removes the need for MergeContent or dealing with multiple paths. You have a single process flow where a failure along the path does not result in potentially orphaned binned FlowFiles at your MergeContent processor.

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
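As an aside, the same linear token-then-call pattern can be sketched outside NiFi to make the sequence concrete. The endpoint URLs, credentials, and the access_token field name below are all hypothetical assumptions, not details from this thread:

# All URLs, credentials, and field names below are hypothetical assumptions.
import requests

TOKEN_URL = "https://example.com/oauth/token"
API_URL = "https://example.com/api/resource"

# Step 1: fetch the token (GenerateFlowFile content -> InvokeHTTP "get token").
token_resp = requests.post(TOKEN_URL, json={"client": "svc-user", "secret": "not-real"})
token_resp.raise_for_status()

# Step 2: pull the token out of the JSON response (EvaluateJsonPath).
token = token_resp.json()["access_token"]

# Step 3: make the final call with the token (ReplaceText -> final InvokeHTTP).
final_resp = requests.post(
    API_URL,
    headers={"Authorization": "Bearer " + token},
    json={"payload": "content needed for the final rest-api call"},
)
final_resp.raise_for_status()
print(final_resp.status_code)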
08-22-2023
08:51 AM
1 Kudo
Hi @Anderosn ,

If I understood you correctly, you are trying to duplicate the FlowFile so that it can be sent to different processors, is that right? If that is the case, then you can easily drag the same relationship multiple times from a given processor. Let's assume the upstream processor producing the result FlowFile sends it to the success relationship; you can then drag two success relationships to different downstream processors and process the same content differently in parallel.

If that helps please accept the solution. Thanks
08-16-2023
06:30 AM
Hi,

You can try the following spec:

[
  {
    "operation": "shift",
    "spec": {
      "*": "&",
      "responseData": {
        "responseList": {
          "*": {
            "individualInfo": {
              "#${UUID()}": "responseData.responseList.[&2].individualInfo.activityUID",
              "firstName": "responseData.responseList.[&2].individualInfo.&",
              "middleName": "responseData.responseList.[&2].individualInfo.&",
              "lastName": "responseData.responseList.[&2].individualInfo.&",
              "dateOfBirth": "responseData.responseList.[&2].individualInfo.&"
            }
          }
        }
      }
    }
  }
]

If that helps please accept the solution. Thanks
08-10-2023
06:52 AM
1 Kudo
@Anderosn In between your SplitJson and PutSQL processors, are you rebalancing the FlowFiles across multiple nodes in a NiFi cluster? Are you routing any of the split JSON messages down a different dataflow path that does not lead to this PutSQL processor?

The reason I ask is because the SplitJson processor will write the following FlowFile attributes to each new FlowFile created (each split): fragment.identifier, fragment.index, fragment.count, and segment.original.filename. The fragment.identifier value and fragment.count are used by the PutSQL processor when "Support Fragmented Transactions" is set to "true" (the default). This means that if not all split JSONs are present at this PutSQL and located on the same node of the NiFi cluster, the FlowFiles that are part of the same fragment.identifier will not be processed and will remain on the inbound connection to the PutSQL.

I'd start by listing the connection and checking these attributes to verify that fragment.count is "10", that fragment.identifier has the same value on all 10, and that the fragment.index values show numbers 1 to 10 across those 10 FlowFiles.

If making sure all fragments are processed in the same transaction is not a requirement for your dataflow, try changing "Support Fragmented Transactions" to "false" and see if these 10 FlowFiles get successfully executed by your PutSQL processor.

If you found that the provided solution(s) assisted you with your query, please take a moment to login and click Accept as Solution below each response that helped.

Thank you,
Matt