Member since: 07-29-2020
Posts: 574
Kudos Received: 323
Solutions: 176
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1974 | 12-20-2024 05:49 AM
 | 2200 | 12-19-2024 08:33 PM
 | 2020 | 12-19-2024 06:48 AM
 | 1324 | 12-17-2024 12:56 PM
 | 1885 | 12-16-2024 04:38 AM
01-18-2023
12:26 PM
Hi, I was able to obtain the required result using the following processors:

1- SplitText: splits each JSON record into its own flowfile.

2- UpdateRecord: updates the date fields and converts them to the required format using a JSON Record Reader/Writer. The value used to convert each date field:

${field.value:toDate("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"):format("yyyy-MM-dd HH:mm:ss.SSS")}

More info on UpdateRecord: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.7.1/org.apache.nifi.processors.standard.UpdateRecord/additionalDetails.html

Note: the only problem I noticed is that null values will be converted to "". Not sure if that will cause you a problem, but you can use ReplaceText or a Jolt transform to convert the values back to null. If you need the records to be merged back together before inserting into Hive, you can use the MergeRecord processor.

If that helps, please accept the solution. Thanks
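For reference, here is a rough Python sketch of the same conversion the expression above performs (a standalone equivalent for testing the format outside NiFi; the function name and sample timestamp are my own, not from the flow):

```python
from datetime import datetime

def convert_timestamp(value):
    """Convert "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'" to "yyyy-MM-dd HH:mm:ss.SSS"."""
    if value is None:
        # Note: NiFi's UpdateRecord may emit "" here instead of null
        return None
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    # %f is microseconds (6 digits); drop the last three to keep milliseconds
    return dt.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

print(convert_timestamp("2023-01-18T12:26:05.123Z"))  # → 2023-01-18 12:26:05.123
```

This mirrors the toDate/format chain field by field; in the actual flow the expression runs per record inside UpdateRecord.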
01-17-2023
11:16 AM
Hi, not sure if you are looking for exactly this, but the following Jolt spec should give you the expected output from the sample you provided: [
// Shift each nested entry under "content": the matched key
// names map to error/product/ErrorType, and the value maps to q.
{
"operation": "shift",
"spec": {
"content": {
"*": {
"*": {
"*": {
"$": "error",
"$1": "product",
"$2": "ErrorType",
"@": "q"
}
}
}
}
}
}
]
If that helps, please accept the solution. Thanks
01-13-2023
06:53 AM
I was finally able to figure out the problem. To resolve this issue, the py/jar file specified in "appResource" & "spark.jars" needs to be accessible by all nodes in the cluster. For example, if you have a network path, you can specify it in both attributes as follows:

"appResource": "file:////Servername/somefolder/HelloWorld.jar",
...
"spark.jars": "file:////Servername/someFolder/HelloWorld.jar",

Not sure why this is needed if the job is being submitted to the master. If anybody knows, please help me understand.
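For context, a minimal submission body for Spark's standalone REST submission endpoint might look like the sketch below (the Spark version, main class, master host, and port are placeholders, not values from my setup):

```json
{
  "action": "CreateSubmissionRequest",
  "appResource": "file:////Servername/somefolder/HelloWorld.jar",
  "clientSparkVersion": "3.3.0",
  "mainClass": "HelloWorld",
  "appArgs": [],
  "environmentVariables": { "SPARK_ENV_LOADED": "1" },
  "sparkProperties": {
    "spark.app.name": "HelloWorld",
    "spark.jars": "file:////Servername/someFolder/HelloWorld.jar",
    "spark.master": "spark://master-host:6066",
    "spark.submit.deployMode": "cluster"
  }
}
```

Note that both "appResource" and "spark.jars" point at the same shared network path, which is what resolved the issue for me.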
12-15-2022
05:16 AM
@Bello as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
12-03-2022
09:35 AM
I am dealing with a Kafka dataset where multiple types of message data are coming in.

Sample data:

EventType 1:

{
  "type": "record",
  "name": "Dispatch_Accepted",
  "namespace": "accepted.avro",
  "fields": [
    { "name": "John", "type": "string", "doc": "Name of the user account" },
    { "name": "email", "type": "string", "doc": "The email of the user logging message on the blog" },
    { "name": "timestamp", "type": "long", "doc": "time in seconds" }
  ],
  "doc:": "A basic schema of Dispatch_Rejected"
}

EventType 2:

{
  "type": "record",
  "name": "Dispatch_Rejected",
  "namespace": "rejected.avro",
  "fields": [
    { "name": "Merry", "type": "string", "doc": "Name of the user" },
    { "name": "email", "type": "string", "doc": "The email of the user logging message on the blog" },
    { "name": "timestamp", "type": "long", "doc": "time in seconds" }
  ],
  "doc:": "A basic schema Rejected data"
}

The schema of the data is validated against the Confluent Schema Registry (working fine). I need to apply a filter on the schema name (Dispatch_Rejected and Dispatch_Accepted) and create two separate data files for each, so I am using a QueryRecord processor with the queries below:

<Dispatch_Rejected> = SELECT * FROM FLOWFILE WHERE name='Dispatch_Rejected'
<Dispatch_Accepted> = SELECT * FROM FLOWFILE WHERE name='Dispatch_Accepted'

This is not working; it can't identify the schema name. The controller service is working fine.

1- How can I pick the schema name from the controller service?
2- Should I assign the value ${schema.name} to another variable <My_Schema> and write the SELECT statements like:

<Dispatch_Rejected> = SELECT * FROM FLOWFILE WHERE My_Schema.name='Dispatch_Rejected'
<Dispatch_Accepted> = SELECT * FROM FLOWFILE WHERE My_Schema.name='Dispatch_Accepted'

Summary: I want to filter the data based on eventType and create separate data files. Please help.
12-01-2022
02:33 AM
Hi, thanks for the details. Unfortunately it is not working; I get an empty array [] as output. I have tried it with both extract and split mode. I applied the Schema Text property as suggested, with "NestedKey" and "nestedValue" as names. Neither gives me any output. Meanwhile I have achieved what I wanted using SplitContent and then another Jolt processor afterwards. Of course it would be more elegant if I could make it work with ForkRecord.
11-29-2022
09:27 AM
Hi, I think after you split your CSV you need to extract the values of both columns, status and client_id, into attributes and then use them in the ExecuteSQL processor. For that you need to:

1- Convert the record from CSV to JSON format using the ConvertRecord processor.

2- Use EvaluateJsonPath to extract both columns into defined attributes (dynamic properties). Make sure to set the Destination property to "flowfile-attribute".

After that you can reference those attributes in the SQL query as ${status} & ${client_id}, assuming that's how you named the attributes in step 2.

Another option, if you don't want to use two processors, is to use the ExtractText processor and provide a regex to extract each value, but you have to be careful how you define the regex for each value to make sure you are pulling only those values and nothing else.

Hope that helps. If that answers your question, please accept the solution. Thanks
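As a sketch, the steps above could be configured like this (the JSON paths, attribute names, and table name are assumptions based on your column names, not from your flow):

```
# EvaluateJsonPath dynamic properties (Destination = flowfile-attribute)
status    = $.status
client_id = $.client_id

# ExecuteSQL "SQL select query" property (my_table is a placeholder)
SELECT * FROM my_table
WHERE status = '${status}' AND client_id = '${client_id}'
```

Each dynamic property name becomes the attribute name on the flowfile, which is why the Expression Language references in the query must match them exactly.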
11-28-2022
08:51 PM
@SAMSAL Thank you for your help.
11-28-2022
12:01 PM
@Mohamed_Shaaban I recommend starting a new community question with the details specific to your setup. This allows the community to address/assist with your specific setup versus comparing your issue to what was shared in this post. Thanks, Matt