Member since
07-29-2020
574
Posts
320
Kudos Received
175
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 245 | 12-20-2024 05:49 AM
 | 281 | 12-19-2024 08:33 PM
 | 291 | 12-19-2024 06:48 AM
 | 242 | 12-17-2024 12:56 PM
 | 233 | 12-16-2024 04:38 AM
08-23-2024
09:31 PM
2 Kudos
Hi, welcome to the community. Can you elaborate on how you want to flatten the input, especially since you have multiple arrays with different cardinalities: sometimes you match by index, and at other times (when the array count = 1) you seem to assign the value to all elements. Also, your sample input seems to have some matching values, which makes it ambiguous to figure out which value is being assigned to which array. When I look at the output, all I see is two identical objects where all the values match, which makes it even more confusing.
08-21-2024
12:01 PM
Hi @Crags , It looks like you were close... maybe not 🙂 , though you are heading in the right direction... you're still kind of far away 😞 . The problem looks easy, but is it really?! Welcome to the world of Jolt: whenever you think you've got it, you face a problem that makes you rethink your understanding 🙂 I don't mean to discourage you or scare you away from Jolt. I was in your shoes when I first started, when all I knew was what {"*":"&"} means. However, with practice and a lot of problem solving I got much better, and these days I enjoy solving Jolt problems. You can read as many tutorials and examples online as you like, but I can assure you the only way to learn it is through practice. A good place to look for the latest information is the Jolt GitHub repository. For a quick cheat sheet I use this link. Now back to your problem. The challenge is to group the extracted attributes from each zone together. For that, I use the first shift spec to create each zone attribute and store it under the same zone parent object. Then I use a second shift to bucket each zone object's value into the prices array along with its product info. The spec looks like this: [
{
"operation": "shift",
"spec": {
"*": {
"CODE": "[&1].code",
"PRODUCT": "[&1].product",
"Zone*": {
"$": "[&2].&1.name",
"@": "[&2].&1.price",
"@(1,Valid)": "[&2].&1.valid"
}
}
}
},
{
"operation": "shift",
"spec": {
"*": {
"*": "[&1].&",
"Zone*": "[&1].prices[]"
}
}
}
]
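To make the two-pass approach concrete, the spec above assumes an input shaped roughly like the following; the field names are taken from the spec itself (CODE, PRODUCT, Valid, and keys matching Zone*), while the values are hypothetical:
[
  {
    "CODE": "A1",
    "PRODUCT": "Widget",
    "Valid": true,
    "Zone 1": 10.5,
    "Zone 2": 12.0
  }
]
The first shift groups name, price, and valid under each "Zone n" key, and the second shift moves those grouped objects into the prices array next to code and product.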
Since you are new to this, I would also consider looking into another transformation language called JSLT; NiFi has a processor for that as well. If you are familiar with the XQuery language you will probably find it easier to learn, and sometimes the spec is much simpler, as in this scenario, where it looks like the following: [
for (.) let valid = .Valid
{
  "Code": .CODE,
  "Product": .PRODUCT,
  "prices": [
    for (.) {
      "name": .key,
      "value": .value,
      "valid": $valid
    }
    if (test(.key, "Zone\\s\\d+"))
  ]
}
] Make sure to remove the expression . != null from the filter property if you want to have null values in the result.
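For reference, with the hypothetical input sketched after the Jolt spec above, this JSLT yields output shaped roughly like the following (the Jolt version produces the same structure, but with lowercase code/product keys and price instead of value):
[
  {
    "Code": "A1",
    "Product": "Widget",
    "prices": [
      { "name": "Zone 1", "value": 10.5, "valid": true },
      { "name": "Zone 2", "value": 12.0, "valid": true }
    ]
  }
]
Hope that helps. If it does, please accept the solution.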
08-13-2024
06:59 AM
Hi, Are you using a provided Avro schema? If so, the error is basically saying that your Avro schema is invalid. How are you providing your schema? Can you please post some screenshots of your configuration?
08-12-2024
12:23 PM
Hi, I have been struggling for some time now trying to figure out whether a NiFi cluster can be deployed using Windows Docker Desktop where each node lives on a different host. I managed to do it on an Ubuntu system, but since Ubuntu is not supported by our IT, I'm trying to see if the same can be implemented in the Windows environment. I knew there would be challenges with Windows, since it has some limitations compared to Unix, especially around container networking. One of those limitations is that if you try to use host networking, the IP address of the host doesn't bind to the container; instead it uses what is called a loopback. When I try to set the NiFi host property to the machine FQDN or IP I keep getting the error: java.net.BindException: Cannot assign requested address, which I imagine is a result of the IP already being reserved by the host and not accessible to the container. After doing some research to work around this problem, I came across the idea of a reverse proxy: deploy NiFi on each host locally (default bridge network) and route all communication with the user and the other nodes through the reverse proxy (I used nginx). It sounded simple at first, but once I got into it, it got very complex very quickly 🙂 I was able to get the reverse proxy working on a standalone instance, but once you introduce a cluster it gets very tricky with all the ports and the host naming. The NiFi documentation has some information on setting up a reverse proxy for a remote (over-the-internet) cluster, which I have to admit looked very intimidating, especially around the dynamic routing configuration in nifi.properties. Nonetheless, I tried to follow as much as I could comprehend, but it is not written for NiFi nodes in Docker containers. To summarize: the problem is that NiFi nodes internally need to communicate with each other on different ports while passing each other host info, specifically what is set in the cluster node address (nifi.cluster.node.address) and the load balancing host address (nifi.cluster.load.balance.host). I can't assign the host IP/FQDN because of the error above, and I can't find a way to set those dynamically, similar to how the S2S hostname and port are set.
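For reference, these are the nifi.properties entries in question; the host values below are placeholders for whatever each node would need to advertise, and setting them to the Windows host's FQDN/IP is exactly what triggers the BindException described above:
# Address this node advertises to the rest of the cluster
nifi.cluster.node.address=node1.example.com
nifi.cluster.node.protocol.port=11443
# Host and port used for load-balanced connections
nifi.cluster.load.balance.host=node1.example.com
nifi.cluster.load.balance.port=6342
Any help or guidance on this is highly appreciated. Thanks for reading, and sorry for the long post.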
Labels:
- Apache NiFi
08-12-2024
08:06 AM
Hi, To help troubleshoot:
1- What version of NiFi 2.0 are you using exactly, and on what system?
2- Can you post the error as it appears in the log?
3- Do you see any dependency packages under the path /work/python/extension/{processor name}/2.0.0-SNAPSHOT? If not, can you provide a screenshot of what is under that folder?
4- Do you see a file called "env-creation-complete.txt" under the same folder?
5- Are you getting any errors when starting NiFi after enabling the Python extension by uncommenting the following line: nifi.python.command=...
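For item 5, the relevant nifi.properties entries look roughly like this (the paths shown are the defaults in a stock NiFi 2.0 install; adjust them to your layout):
# Uncomment and point at your Python 3 binary to enable the extension framework
nifi.python.command=python3
nifi.python.framework.source.directory=./python/framework
nifi.python.extensions.source.directory.default=./python/extensions
nifi.python.working.directory=./work/python
Thanks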
08-12-2024
06:59 AM
If you are talking about the Python processor engine in 2.0, then basically it uses the Python virtual environment library (venv), which you should have installed in your PYTHON_HOME path before enabling the Python extension in the nifi.properties file (see the referenced guide). When you deploy your Python processor under {NIFI_PATH}/python/extensions and restart NiFi, the NiFi framework will process the file, move it to its own directory under {NIFI_PATH}/work/python, and then set up a venv, which creates an isolated Python environment for that processor with its own Python commands, including pip. Those commands are added to the bin (or Scripts, on Windows) folder under the processor directory and are used to download the dependency packages into that environment as well. Regarding defining the dependency packages: if you follow any existing Python processor file template, you will find them defined in the dependencies field of the ProcessorDetails class, as follows:
class ProcessorDetails:
    version = "2.0.0-SNAPSHOT"
    description = """Some Description"""
    tags = ["excel", "json", "convert"]
    dependencies = ['pandas', 'numpy', 'openpyxl', 'xlrd']
Once the venv is created, the Python engine starts downloading those packages automatically; no intervention is required. You can track the download progress in the nifi-app.log file. You can tell that the dependency download is complete either by checking the processor status in the NiFi canvas, where all defined properties and relationships are listed, or by looking for the file "dependency-download.complete" under the processor folder where the packages are installed. If the file contains the value True, the dependencies have been downloaded successfully and the processor is ready to use.
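For completeness, here is a minimal sketch of how ProcessorDetails fits into a full processor file, following the FlowFileTransform pattern from the NiFi 2.0 Python developer guide; the class name and the transform body are placeholders:
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult

class ConvertExcelToJson(FlowFileTransform):  # hypothetical processor name
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = "2.0.0-SNAPSHOT"
        description = """Some Description"""
        tags = ["excel", "json", "convert"]
        # Installed automatically into this processor's venv at startup
        dependencies = ['pandas', 'numpy', 'openpyxl', 'xlrd']

    def __init__(self, **kwargs):
        super().__init__()

    def transform(self, context, flowfile):
        # Placeholder logic: pass the content through unchanged
        return FlowFileTransformResult(relationship="success",
                                       contents=flowfile.getContentsAsBytes())
Hope that helps.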
08-10-2024
03:36 AM
Hi, Have you looked into the Module Directory property? I have not tried it honestly, but see if you can save your packages to a location visible to NiFi; if you have a cluster, you need to make sure this path and the required modules are accessible to all the nodes. Also, I would recommend upgrading, if you can, to the latest NiFi 2.0 release, since Python is more integrated into the NiFi framework there: you can design your custom processors easily, and NiFi takes care of downloading all required packages for you, so you don't have to worry about that. I would recommend the following references to find out more about using Python extensions with NiFi 2.0: https://nifi.apache.org/documentation/nifi-2.0.0-M2/html/python-developer-guide.html https://www.youtube.com/watch?v=9Oi_6nFmbPg&t=580s Also keep in mind that support for Python/Jython scripts in the ExecuteScript processor has been deprecated post 2.0. So my recommendation is to either upgrade to take advantage of the Python extensions, or use Groovy instead, since it is still supported, so that you can avoid the headache of having to rewrite all your Python scripts when you do upgrade to version 2.0 or higher. If this helps, please accept the solution. Thanks
08-10-2024
03:18 AM
Hi @Rashad_K , I don't believe such a feature exists out of the box. Processors are designed to be isolated from each other and can only communicate via relationships; therefore, there is no rollback mechanism that resets the state of the QueryDatabaseTable (QDT) processor based on the result of the PutSQL. If I may suggest the following:
1- Depending on why some records fail in PutSQL, you can use the PutSQL failure relationship to handle errors. For example, if the failure is due to a connectivity issue, you can retry by looping the failure relationship back to the PutSQL, or put a ControlRate processor before the retry to wait some time. If the failure is data- or syntax-related, you can store those records in another table for further processing and re-import them once the data is fixed. The idea behind QDT is that once data is inserted into the target table it won't be changed (like logs, for example), and any correction needs to be sent as a new record; hence the max value doesn't have to be changed for data-error issues.
2- The second option only works if you have control over the target table of the QDT, where successfully processed records are removed from the table. In this case, if records failed for some reason and you want to re-read them, you can invoke the API to reset the state, which means the QDT will start from the top and re-read the failed records. However, with this approach you have to be aware of concurrency issues: if the state is reset while some records are still being processed and have not yet been removed from the QDT table, you might end up re-reading those records again. That is why I don't recommend this approach.
Of course there could be other options out there, but my advice, based on experience, is to pick a solution with isolation in mind to avoid conflicts and unpredictable behavior. Hope that helps. If it does, please accept the solution. Thanks
08-07-2024
10:31 AM
Hi, I'm still looking for some help/guidance on this @MattWho , @steven-matison , @pvillard , please. I also have another question regarding Docker Desktop: are there any images out there, or instructions showing whether a NiFi cluster can be deployed on Windows Docker Desktop in host mode? I have been playing with it for a couple of days and I could not get it to work. When I try to set the load balancer host or the HTTPS host to the host machine IP I keep getting a "Cannot Bind this Address" error. If someone was able to do it, or knows how, please do share. @VidyaSargur
08-07-2024
04:36 AM
2 Kudos
Hi, It seems the data is being converted into a non-datetime format. When I try a simple input with the date you provided and then do an UpdateRecord using the toDate function, I get an epoch time, which is a long integer value. I'm not sure why this is happening, but if you are using a record reader like Avro or JSON with the Infer Schema setting, the output can be unpredictable. I would suggest not doing an UpdateRecord on the date values and just trying to insert them as they come, since they look to be a valid datetime-format string.
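To illustrate the behavior: in UpdateRecord, with Replacement Value Strategy set to Record Path Value, a dynamic property like the following (the field name and format pattern here are hypothetical) converts the string into a date, which some record writers then serialize as an epoch long:
Property name: /orderDate
Property value: toDate( /orderDate, 'yyyy-MM-dd HH:mm:ss' )
This matches the epoch output described above, which is why skipping the conversion and writing the original string is the simpler path.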