About joseomjr

joseomjr · 02-01-2024

Have you seen this post? https://community.cloudera.com/t5/Support-Questions/Nifi-2-0-0-M1-Installation-error-with-python/m-p/381430

joseomjr · 01-22-2024

Do you have a sample? I'm not sure NiFi can do this natively, but I have recently done some PDF parsing inside NiFi with a custom Groovy processor.

joseomjr · 01-05-2024

If this is a cluster, is ZooKeeper running on each node, or has it been moved to separate hosts? I'm wondering whether electing a new leader or reaching an acceptable quorum is contributing to the slowness.
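For context on the quorum question: a ZooKeeper ensemble only serves requests while a strict majority of its servers are up, so a two-server ensemble (for example, embedded ZooKeeper on both nodes of a two-node cluster) needs both servers and tolerates no failures. A minimal sketch of that arithmetic; the function name is illustrative, not part of NiFi or ZooKeeper:

```python
from typing import Tuple


def zookeeper_quorum(ensemble_size: int) -> Tuple[int, int]:
    """Return (servers needed for a majority quorum, servers that may fail)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    quorum = ensemble_size // 2 + 1  # strict majority
    return quorum, ensemble_size - quorum


if __name__ == "__main__":
    for size in range(1, 6):
        needed, tolerated = zookeeper_quorum(size)
        print(f"{size} servers: quorum {needed}, tolerates {tolerated} failure(s)")
```

This is why odd ensemble sizes (3, 5) are the usual choice: adding a fourth server to three raises the quorum without increasing the number of failures tolerated.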

joseomjr · 01-02-2024

I don't see any Parquet NAR files in my NiFi 2.0.0-M1 install or in the Docker image.

joseomjr · 01-02-2024

If you were able to delete it, then it sounds like that attribute may have been carried over from the 1.x versions rather than being something new in 2.x.

joseomjr · 12-29-2023

@Heeya8876, both @SAMSAL and I have recently gone through the adventure of getting 2.0.0-M1 to run with the Python extension enabled. Here are our findings so far on the Linux side of things:

- Java 21 is required (any platform).
- Python 3.9+ is required (any platform). I believe @SAMSAL (correct me if I'm wrong) found that Python 3.12 did NOT work, but we both got 3.11 to run.
- If Java 21 is installed alongside other versions, make sure it's the default with "sudo update-alternatives --config java".
- Make sure your environment has JAVA_HOME set to the Java 21 path.
- Make sure Python 3.9+ is the default before running NiFi with "sudo update-alternatives --config python3". Running "python3 --version" should then show the version you set as the default, and it should be between 3.9 and 3.11.
- To see which Python version NiFi copied, run "./work/python/controller/bin/python3 --version". If it shows anything below 3.9, delete the work folder, follow the steps above, and try again.
- If you build a processor from scratch, the Developer Guide says to use this for your __init__:

      def __init__(self, **kwargs):
          super().__init__(**kwargs)

  You'll get an error; replace super().__init__(**kwargs) with pass, as in the examples that come with the install.
- Changes to your Python extensions are not immediate. NiFi polls the directory periodically to detect changes, download dependencies, and load the updated processors. When a previous code update left NiFi in a really bad state, I sometimes had to restart it before it detected my changes.
- ./logs/nifi-python.log will be your friend for Python extension related issues.
- If your Python extension has dependencies and NiFi fails to download them, you can see the command it attempted in nifi-python.log. I ran those commands manually, they downloaded the modules into the correct place, and everything worked. Perhaps there's a timeout for module downloads? (Just a guess, since that module had a ton of large dependencies.)
- I don't think this is in the Developer Guide, but while building a custom FlowFileTransform Python extension I noticed that the "content" returned in the FlowFileTransformResult should be a string or a byte array.

@SAMSAL has additional insight on getting it to start up on Windows.
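The Python version checks above can be scripted. A minimal sketch, assuming it is run from the NiFi home directory; the 3.9 to 3.11 range is what we saw work with 2.0.0-M1, not an official support matrix, and the helper names are hypothetical:

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence, Tuple

# Range observed working in the post (3.9 through 3.11); an observation, not official.
MIN_VERSION = (3, 9)
MAX_EXCLUSIVE = (3, 12)


def parse_python_version(text: str) -> Optional[Tuple[int, ...]]:
    """Parse output such as 'Python 3.11.4' into (3, 11, 4)."""
    match = re.search(r"Python (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_reported_working(version: Sequence[int]) -> bool:
    """True if (major, minor) falls in the range reported to work."""
    return MIN_VERSION <= tuple(version[:2]) < MAX_EXCLUSIVE


def interpreter_version(path: str) -> Optional[Tuple[int, ...]]:
    """Run '<path> --version' and parse the result (checks stdout and stderr)."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return parse_python_version(result.stdout + result.stderr)


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    print(f"this interpreter: {current}, in reported range: {is_reported_working(current)}")
    # The interpreter NiFi copied into its work directory (path from the post).
    nifi_python = Path("./work/python/controller/bin/python3")
    if nifi_python.exists():
        version = interpreter_version(str(nifi_python))
        ok = version is not None and is_reported_working(version)
        print(f"NiFi's copy: {version}, in reported range: {ok}")
    else:
        print(f"{nifi_python} not found; run this from the NiFi home directory")
```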

joseomjr · 12-29-2023

If you want to avoid duplicates, you could hash the content of each file and use the DetectDuplicate processor so that only unique files are inserted into your DB.
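The idea behind hash-based deduplication, sketched in plain Python: identical content always yields the same digest, so a set of previously seen digests separates first occurrences from repeats. In a flow, the rough equivalent would be CryptographicHashContent feeding its hash attribute to DetectDuplicate as the Cache Entry Identifier, with a distributed map cache playing the role of the set here; the function below is a hypothetical illustration, not NiFi code:

```python
import hashlib
from typing import Iterable, List, Tuple


def split_unique(contents: Iterable[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Split file contents into (first occurrences, duplicates) by SHA-256 digest."""
    seen = set()
    unique: List[bytes] = []
    duplicates: List[bytes] = []
    for content in contents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates.append(content)
        else:
            seen.add(digest)
            unique.append(content)
    return unique, duplicates


if __name__ == "__main__":
    unique, duplicates = split_unique([b"invoice-1", b"invoice-2", b"invoice-1"])
    print(len(unique), "unique,", len(duplicates), "duplicate(s)")
```

Only the digest is kept, not the content, so memory use stays small even for large files; the trade-off is that the seen-set (or cache) must outlive a single run if duplicates can arrive across restarts.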

joseomjr · 12-28-2023

I agree with @SAMSAL's approach. If you can pass a parameter, or set something in the request headers, so that your API returns a JSON response every time, it will be a lot easier to parse the response and build the request for the next step in your flow.
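A minimal sketch of that pattern, assuming the API honours the standard Accept header (some APIs use a query parameter instead). In InvokeHTTP, the same effect comes from adding a dynamic property named Accept, which the processor sends as a request header. The URL and the "id" field below are hypothetical placeholders:

```python
import json
import urllib.request


def build_json_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks the server for JSON."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def next_step_body(response_body: bytes) -> bytes:
    """Pick the field the next call needs out of a JSON response.

    'id' is a hypothetical field name; substitute whatever your API returns.
    """
    data = json.loads(response_body)
    return json.dumps({"id": data["id"]}).encode("utf-8")


if __name__ == "__main__":
    request = build_json_request("https://api.example.com/items")
    print(request.get_header("Accept"))
    print(next_step_body(b'{"id": 7, "status": "ok"}').decode("utf-8"))
```

Because every response is JSON, the parsing step stays a single json.loads call instead of branching on content type.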

joseomjr · 12-28-2023

An ExecuteGroovyScript alternative, given this input:

    { "idTransakcji": "123", "date": "", "name": "sam" }

    import groovy.json.JsonOutput
    import groovy.json.JsonSlurper
    import java.nio.charset.StandardCharsets

    JsonSlurper jsonSlurper = new JsonSlurper()

    FlowFile flowFile = session.get()
    if (!flowFile) return

    flowFile = session.write(flowFile, { inputStream, outputStream ->
        Map data = jsonSlurper.parse(inputStream)
        // Rebuild the record; an empty or non-integer date becomes null
        // (isLong() rather than isNumber(), so "12.5" cannot reach toLong() and throw)
        data = [
            "id"  : data.idTransakcji,
            "user": [
                "date": data.date?.isLong() ? data.date.toLong() : null,
                "name": data.name
            ]
        ]
        outputStream.write(JsonOutput.toJson(data).getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)
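The same mapping as the Groovy script, written as a plain Python function. This is a sketch for unit-testing the logic outside NiFi (or reusing it in a 2.x Python extension); the function names are hypothetical. Here any date that is not a whole number, including "", becomes null:

```python
import json
import re
from typing import Any, Optional

# Optional sign followed by digits only, e.g. "1703721600000" or "-5".
_WHOLE_NUMBER = re.compile(r"[+-]?\d+")


def to_long_or_none(value: Any) -> Optional[int]:
    """Return an int for whole-number strings; None for "", None, "12.5", etc."""
    if isinstance(value, str) and _WHOLE_NUMBER.fullmatch(value.strip()):
        return int(value)
    return None


def reshape(raw: bytes) -> bytes:
    """Map {"idTransakcji", "date", "name"} to {"id", "user": {"date", "name"}}."""
    data = json.loads(raw)
    result = {
        "id": data.get("idTransakcji"),
        "user": {
            "date": to_long_or_none(data.get("date")),
            "name": data.get("name"),
        },
    }
    return json.dumps(result).encode("utf-8")


if __name__ == "__main__":
    out = reshape(b'{"idTransakcji": "123", "date": "", "name": "sam"}')
    print(out.decode("utf-8"))  # {"id": "123", "user": {"date": null, "name": "sam"}}
```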

joseomjr · 12-28-2023

...a third option, because I like scripted processors 😂, using ExecuteGroovyScript:

    import groovy.json.JsonOutput
    import groovy.json.JsonSlurper
    import java.nio.charset.StandardCharsets

    JsonSlurper jsonSlurper = new JsonSlurper()

    FlowFile flowFile = session.get()
    if (!flowFile) return

    flowFile = session.write(flowFile, { inputStream, outputStream ->
        List<Map> data = jsonSlurper.parse(inputStream)
        // order_item arrives as a JSON string; parse it into a nested object
        data.each { it.order_item = jsonSlurper.parseText(it.order_item) }
        outputStream.write(JsonOutput.toJson(data).getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)

It looks like a lot, but the line that does the actual work of turning the JSON string into a JSON object is:

    it.order_item = jsonSlurper.parseText(it.order_item)
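For comparison, the same "parse the stringified field" step as a plain Python function, a sketch for testing the idea outside NiFi. It assumes the input is a JSON array of objects with an order_item string field, as in the Groovy script; the function name is hypothetical. Unlike the Groovy version, it skips values that are already objects, so running it twice is harmless:

```python
import json
from typing import Any, Dict, List


def expand_order_items(raw: bytes) -> bytes:
    """Replace each record's order_item JSON string with the parsed object."""
    records: List[Dict[str, Any]] = json.loads(raw)
    for record in records:
        value = record.get("order_item")
        # Only parse strings; already-parsed objects are left untouched.
        if isinstance(value, str):
            record["order_item"] = json.loads(value)
    return json.dumps(records).encode("utf-8")


if __name__ == "__main__":
    raw = json.dumps([{"id": 1, "order_item": json.dumps({"sku": "A1", "qty": 2})}])
    print(expand_order_items(raw.encode("utf-8")).decode("utf-8"))
```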

Last Visited	12-17-2024 09:55 PM

Member Since	06-14-2023 12:02 PM
Posts	95
Kudos received	33

Cloudera Community

Re: Nifi 2.0.0 M1 Installation error with python

Re: how to replace empty string with null in neste...

Re: ListenUDP Fault tolerance

Re: terminating kafka connection if publish kafka ...

Re: unable to resolve class groovy.yaml.YamlSlurpe...

Re: Installing NIFI 2.0.0 M2 on Ubuntu Linux java....

Re: How to Extract Files From a PDF Portfolio in N...

Re: Nifi - 2 nodes of the cluster take very long t...

Re: possible bug missing parquetreader version 2.0...

Re: nifi 2.0 bug with InvokeHTTP procdessor

Re: Nifi 2.0.0 M1 Installation error with python

Re: Nifi UNpack files issue

Re: Request for Support with Passing Request Body ...

Re: how to replace empty string with null in neste...

Re: Preparing nested JSON using SQL in NiFi