Member since 07-29-2020
574 Posts
320 Kudos Received
175 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 247 | 12-20-2024 05:49 AM
 | 281 | 12-19-2024 08:33 PM
 | 293 | 12-19-2024 06:48 AM
 | 244 | 12-17-2024 12:56 PM
 | 235 | 12-16-2024 04:38 AM
08-07-2024
04:08 AM
1 Kudo
Hi @Fredi ,

It's hard to say what is happening without seeing the data where optionalDict is not empty; you only provided data where it is empty. Keep in mind that this is not true Python: it is actually a flavor of it called Jython, so it is not an apples-to-apples comparison with regular Python. If I can suggest two alternatives:

1- Since Jython scripting is going away starting with version 2.0, I would recommend using Groovy instead. Parsing JSON in Groovy is actually much simpler than in Jython. I'm not sure what version you are using, but there is a dedicated processor for executing Groovy scripts called ExecuteGroovyScript, which is probably faster than the traditional ExecuteScript (which you can still use). The script looks like this based on your input:

import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
import groovy.json.JsonSlurper
import groovy.json.JsonOutput
flowFile = session.get()
if(!flowFile) return
def text = ''
// Cast a closure with an inputStream parameter to InputStreamCallback
session.read(flowFile, {inputStream ->
text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
} as InputStreamCallback)
def jsonSlurper = new JsonSlurper()
def jsonData = jsonSlurper.parseText(text)
if(jsonData.directories[0])
{
session.remove(flowFile)
jsonData.directories.each { d ->
newflowfile = session.create()
newflowfile = session.write(newflowfile, {inputStream, outputStream ->
outputStream.write(JsonOutput.toJson(d).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
newflowfile = session.putAttribute(newflowfile, "setId", jsonData.setId.toString())
newflowfile = session.putAttribute(newflowfile, "setName", jsonData.setName)
newflowfile = session.putAttribute(newflowfile, "absolute.path", d.path)
if(jsonData.optionalDict)
{
newflowfile = session.putAttribute(newflowfile, "value1", jsonData.optionalDict.set_entity_relation.intValue.toString())
newflowfile = session.putAttribute(newflowfile, "value2", jsonData.optionalDict.set_entity_relation.stringValue)
}
session.transfer(newflowfile, REL_SUCCESS)
}
}
else session.transfer(flowFile, REL_FAILURE)

I have tried the script for both scenarios and it worked as expected.

2- The other alternative is to use out-of-the-box processors (the NiFi way) instead of executing a script (not the NiFi way). Script processors should be left as a last resort, for when the out-of-the-box processors don't suffice or you need to improve performance because the flow has become very complicated and inefficient. For this I would use the following processors:

1- EvaluateJsonPath to extract the common attributes: setId, setName, optionalDict values 1 & 2, etc.
2- SplitJson or QueryRecord on the directories array: this will produce separate flowfiles, and each flowfile will carry the common attributes.
3- EvaluateJsonPath to extract each directory's attributes, even though they are already part of the flowfile content.

Hopefully that helps. If it does, please accept the solution. Thanks
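For reference, this is the input shape the script above assumes; the values here are invented, and the structure is inferred from the attribute lookups in the script:

```json
{
  "setId": 100,
  "setName": "exampleSet",
  "directories": [
    { "path": "/data/a" },
    { "path": "/data/b" }
  ],
  "optionalDict": {
    "set_entity_relation": {
      "intValue": 5,
      "stringValue": "rel"
    }
  }
}
```

Each entry in "directories" becomes its own flowfile carrying the setId, setName, and absolute.path attributes, plus value1/value2 whenever optionalDict is present.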
08-07-2024
02:42 AM
2 Kudos
Hi,

You can do this in a few different ways:

1- Jolt: squashNulls

[
  {
    "operation": "shift",
    "spec": {
      "*": "temp.&"
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "temp": "=squashNulls"
    }
  },
  {
    "operation": "shift",
    "spec": {
      "temp": {
        "*": "&"
      }
    }
  }
]

2- Jolt: using "*" on the values, which ignores nulls by default.

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": {
          "$": "&2"
        }
      }
    }
  }
]

3- JSLT: I like to introduce people to JSLT, another transformation language supported by NiFi (version 1.16 and above) via the JSLTTransformJSON processor. JSLT can simplify transformations over Jolt in some scenarios. For example, in this case the transformation is as simple as:

{for (.) .key : .value}

JSLT's default object filter drops entries whose value is null, which takes care of the rest. If you find this helpful, please accept the solution. Thanks
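To make the options above concrete, here is a made-up flat input and the output each of the transforms should produce (null entries dropped, everything else unchanged):

Input:

```json
{ "name": "a", "code": null, "qty": 2 }
```

Output:

```json
{ "name": "a", "qty": 2 }
```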
08-02-2024
02:46 PM
2 Kudos
Hi,

I have been playing with NiFi on Docker lately, and it has been quite the challenge and the learning experience. To better understand how to use Docker for NiFi, I'm hoping the community can help me with the following observations/questions:

1- Most of the examples I found on the internet, including the official NiFi Docker page, seem to be aimed at single-host deployment! I find this strange - unless I'm missing something - doesn't that defeat the purpose of having a cluster with no single point of failure? In what scenarios would someone deploy a single-host, multi-container cluster versus a multi-host, one-container-per-host cluster?

2- While getting to understand Docker networking, I found that if I want a multi-host cluster that is visible to our work network, the ideal way seems to be "host" networking. Is this correct, or is there a better way (maybe overlay networking with Swarm)? If I go that route, how would I access non-Docker servers on my network?

3- If "host" networking is one of the options, why doesn't the official NiFi Docker image document how to set the HTTPS host name as one of the environment properties, similar to what we do locally by setting "nifi.web.https.host" in nifi.properties? From other sites/images I found that the property "NIFI_WEB_HTTPS_HOST" can be used for that, and it works! Is there another way of setting the host?

4- Initially I tried an embedded ZooKeeper setup, but I found it doesn't work no matter how hard I tried. A lot of people recommend an external ZooKeeper, which is what I ended up doing. It turns out there is a Jira bug for the problem I was facing, but it remains unresolved despite being open for a couple of years! Why is that, and will it ever be fixed, or is the recommendation to use an external ZooKeeper? If so, that should at least be mentioned somewhere.

5- Do the environment variables listed on the official Docker page cover everything, or are there more? Where can we find a comprehensive list of all the environment properties? For example, this image seems to list more env properties.

6- This is really important because I struggled with it the most: how do we set the nodes' identities so that they are included in the authorizers.xml file? I could not find any clear instructions on this and kept getting the "Untrusted Proxy" error. The only way I got it to work was to manually update the file (using docker cp), but I also had to delete the generated users.xml and authorizations.xml files while the container was running, because it seems you can't do it while the container is stopped. I don't think this is the proper way of doing it, and I hope there is a better way that can be done in the yml file itself.

I really appreciate the community's feedback on this, especially from experts like @MattWho , @steven-matison , @pvillard . Thanks
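For context, here is the kind of single-node compose service most of those examples show. The hostnames and image tag are placeholders, and NIFI_WEB_HTTPS_HOST is the property I found on third-party pages rather than in the official docs:

```yaml
services:
  nifi:
    image: apache/nifi:1.25.0        # placeholder tag
    network_mode: host               # "host" networking, per question 2
    environment:
      NIFI_WEB_HTTPS_HOST: nifi1.example.com       # maps to nifi.web.https.host
      NIFI_WEB_HTTPS_PORT: "8443"
      NIFI_CLUSTER_IS_NODE: "true"
      NIFI_ZK_CONNECT_STRING: zk1.example.com:2181  # external ZooKeeper
```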
Labels:
- Apache MiNiFi
07-27-2024
03:26 AM
2 Kudos
Hi @PradNiFi1236 ,

Regarding the first question: similar to FetchFile, FetchSmb has a Completion Strategy property that tells it what to do with the file after the fetch. To delete it, simply select "Delete File".

For the second question, I'm not sure, as I have never used it, but you can try it and see if it works.

Hope that helps.
07-26-2024
06:26 AM
2 Kudos
Hi,

If you search the internet you might find something that helps in your case. Please refer to this: https://stackoverflow.com/questions/37530121/putfile-append-file

Here is my suggestion if you don't want to write/use custom code. Think of the new content (GenerateFlowFile) as the new data you just received; the idea is to always do a merge. The trick is that when you try to fetch the target file, if it exists you end up merging the old content with the new, and if it doesn't exist (the not.found relationship) you merge the new content with an empty string (using ReplaceText off the not.found relationship). There are a few caveats with this design that you may need to address:

1- MergeContent is configured with a Delimiter Strategy of "Text" (see config below) and a newline Demarcator. If the file doesn't exist (first time), an empty line is added toward the end; you can solve this with ReplaceText or other methods, so it's not a big deal.

2- The merge order is not predictable, so the new content may end up at the top instead of being appended at the bottom, because MergeContent may receive the new content before the old. If the order is important, you can use something like the EnforceOrder processor, setting an integer attribute for order priority on each piece of content. If the order is not important, you can ignore this.

3- You need to preserve the filename, since MergeContent will produce a new one. However, MergeContent reads the attribute "segment.original.filename" and, if it finds it, uses whatever filename is specified there.

With that, here are the different processor configurations:

1- GenerateFlowFile (new content):
- filename: simulates having file content with a filename (you might have that already)
- segment.original.filename: used by MergeContent to set the correct filename after the merge

2- FetchFile: Make sure to set the marked properties as shown. Also important: under the Settings tab, set the Penalty Duration to 0 sec. For some reason, when the file is not found and the flowfile is routed to the not.found relationship, it gets penalized - even with the log level for Not Found set to NONE.

3- ReplaceText (replace original with empty string)

4- MergeContent: for the Demarcator (delimiter), use Shift+Enter to enter the newline character.

5- PutFile

Hope that helps. If it does, please accept the solution, and feel free to add your input if you got it working for your case so others can benefit. Thanks
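In summary, the key property settings described above could look roughly like this (names abbreviated; verify against your NiFi version):

```
GenerateFlowFile (new content)
  filename                  = myfile.txt   (example name)
  segment.original.filename = myfile.txt   (read by MergeContent after the merge)

FetchFile
  Log level when file not found = NONE
  Penalty Duration (Settings)   = 0 sec

MergeContent
  Merge Format       = Binary Concatenation
  Delimiter Strategy = Text
  Demarcator         = (newline, entered with Shift+Enter)
```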
07-25-2024
10:53 AM
1 Kudo
Hi,

Sorry for the delay. See if this works:

[
{
"operation": "shift",
"spec": {
"tiers": {
"*": {
"priceRuleAttributes": {
"*": {
"id": {
"PR_CURR_CONDITION": {
"@(2,values[0])": "paths.[&5].currencyCondition[#2].currency"
}
}
}
},
"rewards": {
"*": {
"changeType": "paths.[&3].calculatePrice[&1].calculationType",
"changeAmount": "paths.[&3].calculatePrice[&1].amount",
"changePercent": "paths.[&3].calculatePrice[&1].percentage",
"@(2,priceRuleAttributes)": {
"*": {
"id": {
"PR_CURR_CONDITION": {
"@(2,values[0])": "paths.[&7].calculatePrice[&4].Currency"
}
}
}
}
}
}
}
}
}
}
]
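For reference, the spec above assumes input roughly shaped like this; the values are invented, and only the keys the spec references are shown:

```json
{
  "tiers": [
    {
      "priceRuleAttributes": [
        { "id": "PR_CURR_CONDITION", "values": ["USD"] }
      ],
      "rewards": [
        { "changeType": "AMOUNT_OFF", "changeAmount": 10, "changePercent": 0 }
      ]
    }
  ]
}
```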
07-25-2024
10:41 AM
1 Kudo
Hi,

Can you post your Python processor code? Also, what versions of NiFi, Java, and Python are you using?
07-25-2024
10:37 AM
1 Kudo
Hi @NagendraKumar ,

It's hard to design something like this here, since we don't know the specifics. However, there are a lot of processors that can help with file manipulation, such as GetFile, ListFile, FetchFile, PutFile, and MergeContent. Also, I'm not sure whether you will run into concurrency issues when multiple files arrive around the same time. My advice is to start experimenting with those processors, and if you run into a specific issue you can post about it here and hopefully someone will be able to assist.
07-25-2024
10:26 AM
1 Kudo
Hi @cadrian90 ,

I'm not aware of a direct way to do that in NiFi. I know there are services/processors like CEFReader and ParseCEF used to consume the CEF format, but not to write it. The good news is that you can write custom code to create a service or a new processor to do that, using either Python or Java, if you happen to know a way of doing it in code.
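As a sketch of what such custom code could look like: the function below builds a CEF version 0 line by hand (the helper name and field choices are my own, not from any NiFi API), following the `CEF:0|vendor|product|version|signatureID|name|severity|extensions` layout:

```python
def cef_line(vendor, product, version, sig_id, name, severity, extensions):
    """Build a CEF:0 header plus space-separated key=value extensions.

    Backslashes and pipes are escaped in header fields; backslashes and
    equals signs are escaped in extension values.
    """
    def esc_header(v):
        return str(v).replace("\\", "\\\\").replace("|", "\\|")

    def esc_ext(v):
        return str(v).replace("\\", "\\\\").replace("=", "\\=")

    header = "|".join(esc_header(f) for f in
                      (vendor, product, version, sig_id, name, severity))
    ext = " ".join(f"{k}={esc_ext(v)}" for k, v in extensions.items())
    return f"CEF:0|{header}|{ext}"

print(cef_line("Acme", "Firewall", "1.0", "100", "port scan", "5",
               {"src": "10.0.0.1", "dst": "10.0.0.2"}))
# -> CEF:0|Acme|Firewall|1.0|100|port scan|5|src=10.0.0.1 dst=10.0.0.2
```

The same logic would work inside an ExecuteScript/ExecuteGroovyScript body or a custom processor; the formatting rules are independent of where the code runs.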
07-24-2024
06:01 AM
Do you mind explaining which processor/processors you are trying to package from that repo, and what steps you took to package them? Also, if you can email me your .nar package as an attachment (it should be a zip file), that would be great.