Member since: 07-29-2020
Posts: 574
Kudos Received: 323
Solutions: 176

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2004 | 12-20-2024 05:49 AM |
| | 2279 | 12-19-2024 08:33 PM |
| | 2050 | 12-19-2024 06:48 AM |
| | 1354 | 12-17-2024 12:56 PM |
| | 1944 | 12-16-2024 04:38 AM |
12-08-2023
11:02 AM
I know this sounds like patching the problem, but after the UpdateRecord, can you do a JoltTransformRecord to transform "" to null for the target field? Here is an example of a spec that can do such a thing: https://github.com/bazaarvoice/jolt/issues/667
12-07-2023
01:18 AM
Hello @SAMSAL, sorry for the late response. My issue is that the JSON always contains all of the data, and I need to insert into the DB only the data that was not inserted in the last run (the task runs every 5 seconds). Right now, every 5 seconds the task inserts all of the JSON data, so I end up with many rows with the same values. My end goal is that every 5 seconds only the data from the JSON that is not already in the DB gets inserted. Sorry for the confusion.
12-06-2023
07:03 AM
Hi @Fayza, You need to set the Content-Type required by the API. There should be a "Request Content-Type" property where you can set the value. Any custom header values can also be added as dynamic properties. The InvokeHTTP processor should be flexible enough to accommodate different API requirements and request types.
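For illustration, a minimal InvokeHTTP configuration for a JSON POST might look like the sketch below; the URL and the token attribute are hypothetical, and the dynamic (user-added) property is sent as a custom request header:

```
InvokeHTTP
  HTTP Method           : POST
  Remote URL            : https://api.example.com/v1/items    (hypothetical endpoint)
  Request Content-Type  : application/json
  # Dynamic property, sent as a request header:
  Authorization         : Bearer ${access.token}              (hypothetical attribute)
```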
12-05-2023
04:41 AM
@SAMSAL, thank you. This works.
12-04-2023
04:09 AM
Hi @SAMSAL. Now it is clear to me why it wasn't working. JOLT has a tricky learning curve, doesn't it? I appreciate your attention in helping me. Thanks!
12-01-2023
01:13 PM
You are awesome, @SAMSAL. Thanks so much for the great information; this helps me a lot. Thank goodness for this community, because the documentation on its own is not quite adequate.
12-01-2023
08:21 AM
@SAMSAL The managed authorizer uses the file-access-policy-provider (which generates the authorizations.xml if it does not already exist) and then a user-group-provider. In your case, the ldap-user-group-provider would make the most sense.

You may also want to use the composite-configurable-user-group-provider, configured with both the ldap-user-group-provider and the file-user-group-provider. Having both a file-based provider and an LDAP provider allows users and groups to be synced from LDAP automatically, while the file provider lets you manually add non-LDAP user/client identities for authorization as well. Non-LDAP client/user identities might be certificate-based clients such as other NiFi nodes/instances, etc.

Within the file-access-policy-provider you define the initial admin identity. That user identity could be set to your LDAP user account identity. Then, on first startup with the managed provider, it generates the authorizations.xml file seeded with the policies necessary for that initial admin user identity to act as admin. So you could skip the single-user-provider step. A rough sketch of this wiring follows below.

Matt
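A minimal sketch of the authorizers.xml wiring described above, assuming stock Apache NiFi provider class names; the identities, file paths, and LDAP details are illustrative, not from your environment:

```xml
<authorizers>
    <userGroupProvider>
        <identifier>file-user-group-provider</identifier>
        <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
        <property name="Users File">./conf/users.xml</property>
    </userGroupProvider>

    <userGroupProvider>
        <identifier>ldap-user-group-provider</identifier>
        <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
        <!-- LDAP connection and user/group search properties omitted for brevity -->
    </userGroupProvider>

    <!-- Composite: syncs users/groups from LDAP, file provider holds manual entries -->
    <userGroupProvider>
        <identifier>composite-configurable-user-group-provider</identifier>
        <class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
        <property name="Configurable User Group Provider">file-user-group-provider</property>
        <property name="User Group Provider 1">ldap-user-group-provider</property>
    </userGroupProvider>

    <accessPolicyProvider>
        <identifier>file-access-policy-provider</identifier>
        <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
        <property name="User Group Provider">composite-configurable-user-group-provider</property>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <!-- Illustrative LDAP admin identity; seeds authorizations.xml on first start -->
        <property name="Initial Admin Identity">cn=myadmin,ou=users,dc=example,dc=com</property>
    </accessPolicyProvider>

    <authorizer>
        <identifier>managed-authorizer</identifier>
        <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
        <property name="Access Policy Provider">file-access-policy-provider</property>
    </authorizer>
</authorizers>
```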
11-30-2023
03:21 PM
@yan439, I'm not sure I understand. I thought you already had the schema defined in the registry with the correct column names and data types. Can you elaborate on how the Avro schema came about, and whether it's the same one you are using in the registry?
11-29-2023
06:52 AM
@Rohit1997jio You could use the RetryFlowFile processor for this use case. Feed the "failure" relationship via a connection to the RetryFlowFile processor. RetryFlowFile will keep routing the FlowFile back to PublishKafka via its "retry" relationship until the configured maximum number of retries has been exceeded. After that, the FlowFile will instead route to the "retries_exceeded" relationship, which you can connect to a LogMessage processor. The LogMessage processor would then auto-terminate its "success" relationship.

The challenge here is your requirement to retry once per hour for 24 hours. You could set the Penalty Duration on PublishKafka to 1 hour. This means a FlowFile routed to the "failure" relationship is penalized for 60 minutes, and RetryFlowFile will not consume it from the input connection until the penalty duration has ended. Then configure the number of retries in the RetryFlowFile processor to 24. See the sketch below.

Be careful with setting the queue size to 250 on the failure connection. If 250 FlowFiles are queued on the failure relationship, it will trigger backpressure on the PublishKafka processor, meaning PublishKafka will not be scheduled again until that backpressure is gone.

If any of the suggestions/solutions provided helped you with your issue, please take a moment to log in and click "Accept as Solution" on one or more of them. Thank you, Matt
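A rough sketch of the flow and the relevant settings (processor and property names as in stock NiFi; the 250 queue threshold is from your setup):

```
PublishKafka
  Settings -> Penalty Duration : 1 hour      (failed FlowFiles wait an hour before retry)
  failure ---------------------> RetryFlowFile

RetryFlowFile
  Maximum Retries              : 24          (one retry per hour, for 24 hours)
  retry -----------------------> back to PublishKafka
  retries_exceeded ------------> LogMessage  (auto-terminate its "success" relationship)
```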
11-22-2023
09:52 PM
I've attached the image above; this is how the data looks. I want to remove the first 7 rows and make the 8th row (the header row) the first.