Member since: 07-29-2020
Posts: 530
Kudos Received: 272
Solutions: 159
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 311 | 09-07-2024 12:59 AM |
| | 418 | 09-03-2024 12:36 AM |
| | 344 | 09-03-2024 12:09 AM |
| | 256 | 08-30-2024 06:23 AM |
| | 944 | 08-26-2024 04:39 PM |
09-18-2024
04:39 PM
2 Kudos
Hi, let's start with the last thing you said, because I'm curious how one record ends up overriding the other. I assume you are using ExcelReader, correct? If so, I tried creating an Excel file with two sheets that share the same schema and hold the same records, with only some addresses differing (Sheet1: Address1, Sheet2: Address2). I read the file using a FetchFile processor, then passed the content to a ConvertRecord processor with ExcelReader as the reader and JsonRecordSetWriter as the writer, configured as follows. ExcelReader: I pass an Avro schema to assign the proper field names and types: {
"namespace": "nifi",
"name": "user",
"type": "record",
"fields": [
{ "name": "ID", "type": "int" },
{ "name": "NAME", "type": "string" },
{ "name": "ADDRESS", "type": "string" }
]
} For the JsonRecordSetWriter I kept the default settings, no changes. Here is what my output looked like; it accounts for all the records from both sheets, even the duplicate (1, sam, TX): [ {
"ID" : 1,
"NAME" : "sam",
"ADDRESS" : "TX"
}, {
"ID" : 2,
"NAME" : "Ali",
"ADDRESS" : "WA"
}, {
"ID" : 1,
"NAME" : "sam",
"ADDRESS" : "TX"
}, {
"ID" : 2,
"NAME" : "Ali",
"ADDRESS" : "FL"
} ] So I'm curious what is happening in your case that causes the record to be overwritten. Maybe once we figure that out, we can solve it as you said by using JOLT. I think Fork/Join Enrichment will still work, especially since the info you are trying to merge comes from the same file, but the flow is going to be different: you might need a separate ExcelReader for each sheet, which means reading the Excel file twice. At a high level, you can avoid having two ExcelReader services by passing the sheet name as a flowfile attribute, but you still need to read the file twice (once per sheet) and then join the results using whichever joining strategy works best for you. Hope that helps.
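For reference, this is roughly what the SQL merge strategy in JoinEnrichment could look like for the sample above. It is only a sketch: the join key (ID) and the ADDRESS1/ADDRESS2 aliases are assumptions based on my test data.

-- Joins the forked "original" and "enrichment" record sets on ID and keeps
-- both address columns side by side (column names assumed from the sample).
SELECT o.ID, o.NAME, o.ADDRESS AS ADDRESS1, e.ADDRESS AS ADDRESS2
FROM original o
LEFT OUTER JOIN enrichment e ON o.ID = e.ID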
09-18-2024
05:31 AM
1 Kudo
Hi @Crags , It depends. Are you getting this data from the different sources around the same time - maybe using a common trigger or cron schedule - or are you getting the data independently of each other? If it's the first case, you might be able to take advantage of ForkEnrichment/JoinEnrichment. JoinEnrichment provides different merge strategies which you can select from based on the structure and the order of your data. If it's the second case, and assuming you end up saving this information to a database, then you can apply the merging in the database, for example by passing the product data to a SQL stored procedure: the procedure checks whether data already exists for the given ID and, if it does, applies an UPDATE statement, otherwise an INSERT (a sketch of that logic follows below). Hope that helps.
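To illustrate the second option, here is a minimal sketch of such an upsert procedure. The procedure, table, and column names are hypothetical, and the syntax shown is SQL Server-style T-SQL, so adjust it for your database.

-- Checks for an existing row by ID, then updates or inserts accordingly.
CREATE PROCEDURE upsert_product @id INT, @name VARCHAR(100), @price DECIMAL(10,2)
AS
BEGIN
    IF EXISTS (SELECT 1 FROM product WHERE id = @id)
        UPDATE product SET name = @name, price = @price WHERE id = @id;
    ELSE
        INSERT INTO product (id, name, price) VALUES (@id, @name, @price);
END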
09-13-2024
06:34 AM
2 Kudos
Hi @moonspa , You might want to look into the ForkEnrichment/JoinEnrichment processors. Let's assume the CSV has the latest information that needs to be merged into the database. Using a processor like GetFile/FetchFile you get the CSV file, then you use the ForkEnrichment processor to fork the flow into two branches: the original (CSV) and, via the enrichment relationship, the DB side. You then connect the result of the DB query (using ExecuteSQLRecord, for example) and the original to the JoinEnrichment processor, where you decide on the merge strategy. For example, you can use the SQL merge strategy to link records by ID with a FULL JOIN. This gives you the full list from both the CSV and the DB; however, since this is not real SQL, you can't use functions like ISNULL or conditional statements to set the status. From there you can use a QueryRecord processor, where the SQL Calcite syntax lets you generate the status column (new, same, deleted, changed) - see the sketch below. Keep in mind, as the documentation on the Fork/JoinEnrichment processors indicates, that using the SQL strategy with a large amount of data can consume a lot of JVM heap and lead to out-of-memory errors. To avoid this, look into the section "More Complex Joining Strategies" toward the end of the documentation, which describes a different approach using the Wrapper strategy alongside the ScriptedTransformRecord processor. Hope that helps. Let me know if you have more questions. If you found this helpful please accept the solution. Thanks
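Here is a rough idea of the kind of QueryRecord (Calcite) statement that could derive the status column after the join. The column names (csv_id, db_id, csv_value, db_value) are assumptions about how the joined record might look, so adjust them to your actual schema.

-- Derives a status per joined record based on which side is missing and
-- whether the compared value changed (FLOWFILE is QueryRecord's table name).
SELECT
  COALESCE(csv_id, db_id) AS id,
  CASE
    WHEN db_id  IS NULL THEN 'new'
    WHEN csv_id IS NULL THEN 'deleted'
    WHEN csv_value <> db_value THEN 'changed'
    ELSE 'same'
  END AS status
FROM FLOWFILE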
09-11-2024
09:54 PM
Hi @mjmoore , I'm not a Gradle expert, but it seems there are Gradle plugins for NiFi: https://plugins.gradle.org/search?term=nifi Will that work?
09-07-2024
07:58 AM
2 Kudos
Hi @xtd , Welcome to the community. You do sound like a newbie :), but don't worry, that is common when you start with NiFi; with time and experience you will get better for sure. There are many tutorials out there to help you get started, and I recommend going through them to understand how NiFi works and what the best practices are. One crucial concept that every beginner should get familiar with early on is covered in a series of YouTube videos by Mark Payne, one of the creators of NiFi, about NiFi anti-patterns, which I recommend you watch.
I can see many issues with this flow. Above all, it is not doing things the NiFi way. Here are some notes on the main issues:
1- When you work with formats like CSV, JSON, XML, etc., you need to take advantage of NiFi's out-of-the-box capabilities to parse the data rather than doing it with ReplaceText, which can be inefficient.
2- It is not recommended to use ExtractText to extract the whole content of a flowfile into an attribute when you can't control how big the content can be. You will hear this a lot: attributes are stored in JVM heap, which is limited and expensive, and large content there can cause problems.
3- When passing parameters to your SQL query through flowfile attributes, the N placeholder in sql.args.N.value stands for the parameter's position in the query. For example, given the following insert statement: insert into Table1 (col1, col2) values (?,?) the processor expects two sets of sql attributes, the first parameter at N=1 and the second at N=2, as in sql.args.1.value, sql.args.2.value, etc.
How I would design this flow really depends on what you are trying to do after ExecuteSQL. If you pass in the list of IDs correctly, ExecuteSQL (depending on its configuration) returns a single flowfile with the list of records matching the IDs in Avro format, so what are you going to do with that result? My guess, based on similar scenarios, is that you want to process each record on its own, whether for data enrichment, an API call, etc. Based on that, here is how I would design this flow:
1- Get the file from the source (GetFile, GetSFTP, etc.).
2- Use SplitRecord to get every record into a single flowfile and convert it into a format that is easy to parse and extract attributes from, using methods other than regex in ExtractText. I convert to JSON because I can then extract the ID using a JSON path, as we will see in the next step. You can use the default settings for the reader and writer services.
3- Use EvaluateJsonPath to extract the ID into an attribute. This way the value we extract is a single integer, so we are not too concerned about the heap holding large content.
4- Fetch the record corresponding to each ID from SQL. You are probably thinking this will run a SQL transaction per ID instead of one select for all, and maybe that is the pitfall of this design, but those transactions are short and simple and will probably perform better, especially when you have a cluster with load balancing, or you can increase the number of threads on the processor itself.
As I mentioned above, ExecuteSQL gives you the result in Avro format, which you can't do much parsing with unless you convert it to another parsable format (JSON, CSV, XML, etc.) using ConvertRecord. However, you can avoid the need for ConvertRecord by using ExecuteSQLRecord instead of ExecuteSQL, where you can set the record writer to whatever format you want before the result is sent to the success relationship - for example, a CSV writer if you want the result back as CSV. A rough sketch of the per-ID query is shown below. Once you get the records in the intended format you can do whatever is needed afterward. I think you have enough to get going. Let me know if you have more questions. If you find this helpful please accept the solution. Thanks
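To make step 4 a bit more concrete, here is the kind of lookup query that could go into the ExecuteSQLRecord processor. It is only a sketch: the attribute name record.id, the table Table1, and the column names are assumptions, and the ${...} part is NiFi Expression Language resolving the attribute extracted in step 3.

-- Fetches the record matching the ID extracted by EvaluateJsonPath
-- (record.id is a hypothetical attribute name; adjust table/columns to your schema).
SELECT col1, col2
FROM Table1
WHERE id = ${record.id}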
09-07-2024
12:59 AM
4 Kudos
Hi @shiva239 , I don't think there is a problem with using the DuplicateFlowFile processor in a production environment. If it were really intended only for test environments, it would not be provided as an option to begin with. Actually, when I searched the community for questions around flowfile replication I found this post that mentions the processor as part of the solution, and I don't see any comments advising against it citing test vs. prod, regardless of whether it helped in that case or not. However, if you are not comfortable using the DuplicateFlowFile processor, there are of course other ways of doing the same thing in NiFi. Let's assume you wanted to write code to address this problem instead of using NiFi: how would you do it without repeating the same code for each target DB? One of the first things that comes to mind is a loop. You can create a loop in NiFi in multiple ways; the easiest I can think of is to use the RetryFlowFile processor. Although it is intended more for error handling, you can still use it to replicate a loop. All you have to do after getting the file is set the maximum number of retries to however many times (or steps) you want to execute, then route the retry relationship both back to itself and to the next step (assign DB & table based on index). Once the number of retries is exceeded, you can handle that via the retries_exceeded relationship, or simply terminate the relationship so the loop completes. RetryFlowFile sets a counter on each retried flowfile, which you can use to assign the DB and table accordingly. As an example, I built a simple flow that loops 5 times over a flowfile produced by GenerateFlowFile, with RetryFlowFile's Retry Attribute property storing the retry count in an attribute named "flowfile.retries". If you run GenerateFlowFile once and look at the LogMessage data provenance, you will see it executed 5 times against the same content, and if you check the attributes on each event's flowfile you will see flowfile.retries populated with the nth time the flowfile was retried. Keep in mind that data provenance shows the last event at the top, which means the first event's flowfile attribute will show the value 5. Hope that helps. If it does, please accept the solution. Thanks
09-03-2024
12:36 AM
3 Kudos
I think you are confusing the ExtractText and ReplaceText processors. ExtractText doesn't have Search Value & Replacement Value properties, but ReplaceText does. That is why I said posting a screenshot would be helpful: had I known it was ReplaceText, my answer would have been different. To get the desired result in this case, you need to specify the following pattern in the Search Value property: ^(.{5})(.{10}).* Basically, you need to match the full line of text that you want to replace with the matched groups. When you stopped at "^(.{5})(.{10})", it meant you only wanted to replace up to the 15th character of the full text with the result $1,$2, which is why you were getting the remainder of the text. Adding ".*" at the end replaces the whole line and not just up to the 15th character. I hope that makes sense.
09-03-2024
12:09 AM
1 Kudo
Hi,
My apologies. I think I forgot to mention that in both cases you need to set the Timestamp Format property on the CSVRecordSetWriter to the target format, since by default it writes the datetime as epoch time.
The point of the conversion in QueryRecord is to flag the field as a datetime; however, without setting the format in the writer, it was being converted back to epoch time, as the documentation states.
Setting the format there is critical to getting the desired output.
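For illustration, this is roughly how the two pieces fit together; the column name event_time and the format pattern are just assumptions for the example.

-- In QueryRecord (Calcite syntax), cast the string field to a timestamp:
SELECT CAST(event_time AS TIMESTAMP) AS event_time
FROM FLOWFILE
-- Then, on the CSVRecordSetWriter, set Timestamp Format to the target pattern,
-- e.g. yyyy-MM-dd HH:mm:ss, so the value is written in that pattern instead of epoch time.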
Hope that helps.
09-02-2024
04:53 PM
2 Kudos
Hi, It would have been helpful if you could provide some examples of the different scenarios, with what is expected vs. what you are getting. Providing screenshots of the processor(s) in question would also help confirm that you have the correct configuration for your case. One thing that is confusing to me is that you don't mention anything about whitespace and whether it counts as a character for the name or the address. Going with what you provided, assume we have the following line: smithaddress123AAAA where the name is expected to be smith (characters 1-5) and the address is address123 (characters 6-15). I configured the ExtractAddress processor accordingly (basically adding new dynamic properties to define the extracted attributes), and the output flowfile carries the expected attributes. The reason you are getting additional attributes with an index suffix is how the processor breaks up matching groups. You can read more about this here. If you find this helpful please accept the solution. Thanks
08-31-2024
10:19 AM
1 Kudo
Hi @NagendraKumar , I'm not sure you can use the function "DATE_FROM_UNIX_DATE", since according to the SQL Calcite documentation it is not a standard function. May I recommend two approaches to solve this problem:
1- Using the SQL Calcite function TIMESTAMPADD:
select TIMESTAMPADD(SECOND, 1724851471, cast('1970-01-01 00:00:00' as timestamp)) mytimestamp from flowfile
2- Using Expression Language:
select '${literal('1724851471'):multiply(1000):format('yyyy-MM-dd HH:mm:ss')}' mytimestamp from flowfile
In both cases, be aware of the timezone the timestamp is converted into; I believe one uses local time while the other uses GMT. Hope that helps. If it helps please accept the solution. Thanks