Member since 06-14-2023
95 Posts
33 Kudos Received
8 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3839 | 12-29-2023 09:36 AM |
 | 5613 | 12-28-2023 01:01 PM |
 | 1102 | 12-27-2023 12:14 PM |
 | 557 | 12-08-2023 12:47 PM |
 | 1744 | 11-21-2023 10:56 PM |
02-01-2024
07:51 PM
1 Kudo
Have you seen this post? https://community.cloudera.com/t5/Support-Questions/Nifi-2-0-0-M1-Installation-error-with-python/m-p/381430
01-22-2024
12:30 PM
Do you have a sample? I'm not sure NiFi can do this natively, but I have recently done some PDF parsing inside NiFi with a custom Groovy processor.
01-05-2024
01:43 PM
If clustered, is ZooKeeper running on each node, or has it been separated out? I'm wondering whether electing a new master or maintaining an acceptable quorum is contributing to the slowness.
01-02-2024
11:31 AM
I don't see any parquet NAR files in my NiFi 2.0.0-M1 install or in the Docker image.
01-02-2024
11:26 AM
If you were able to delete it, it sounds like that attribute might have been left over from the 1.x versions rather than something new in 2.x.
12-29-2023
09:36 AM
2 Kudos
@Heeya8876, both @SAMSAL and I have recently gone through the adventure of getting 2.0.0-M1 to run with the Python extension enabled. Here are some findings so far on the Linux side of things.

- Java 21 is required (any platform).
- Python 3.9+ is required (any platform). I believe @SAMSAL said Python 3.12 did NOT work (correct me if I'm wrong), but we both got 3.11 to run.
- If Java 21 is installed, make sure it's the default, set with "sudo update-alternatives --config java".
- Make sure your environment has JAVA_HOME defined with the path for Java 21.
- Make sure Python 3.9+ is the default prior to running NiFi, set with "sudo update-alternatives --config python3".
- Executing "python3 --version" should show whichever version you set as your default, and it should be 3.9 to 3.11.
- You can see which version NiFi copied by running "./work/python/controller/bin/python3 --version". If this shows anything below 3.9, delete the work folder, follow the steps above, and try again.
- If you build a processor from scratch, the Developer Guide says to use this for your __init__:

      def __init__(self, **kwargs):
          super().__init__(**kwargs)

  You'll get an error. Replace super().__init__(**kwargs) with pass, like the examples that come with the install.
- Changes to your Python extensions are not immediate. NiFi polls the directory periodically to detect changes, download dependencies, and load the updated processors. Sometimes I had to restart NiFi to get it to detect my changes if my previous code update made it really unhappy.
- ./logs/nifi-python.log will be your friend for Python extension related issues.
- If your Python extension has dependencies and NiFi fails to download them, you can see the command it attempted in nifi-python.log. I ran the commands from the log manually, and the modules downloaded into the correct place and worked. Perhaps there's a timeout for module downloads? (Just a guess, since the module had a ton of large dependencies.)
- I don't think I saw it in the Developer's Guide, but I noticed while building a custom FlowFileTransform Python extension that the "content" data returned with the FlowFileTransformResult should be a string or byte array.

@SAMSAL has additional insight on getting it to start up on Windows.
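The version check above can also be scripted and run with whichever python3 is the default before starting NiFi. This is a minimal sketch, assuming the 3.9 to 3.11 range reported in this post (it is what worked for us, not an official compatibility matrix); is_supported is a hypothetical helper name.

```python
import sys

# Range reported as working in this thread; an assumption, not an official matrix.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_supported(version=None):
    """Return True if the (major, minor) version is inside the reported range."""
    major, minor = tuple(version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


if __name__ == "__main__":
    found = "%d.%d" % tuple(sys.version_info[:2])
    status = "ok" if is_supported() else "outside the 3.9 to 3.11 range"
    print("python3 resolves to %s: %s" % (found, status))
```

Running the same script with ./work/python/controller/bin/python3 should tell you whether the copy NiFi made is in range as well.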
12-29-2023
08:50 AM
If you want to avoid duplicates you could hash the content of the files and leverage the DetectDuplicate processor to only insert the unique files into your DB.
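The idea behind hashing the content can be sketched in plain Python, outside NiFi. This is only an illustration of content-based de-duplication, assuming SHA-256 as the hash; content_hash and unique_by_content are hypothetical names, and inside a flow the hash would be the key that DetectDuplicate checks against its cache.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the file content."""
    return hashlib.sha256(data).hexdigest()


def unique_by_content(files):
    """Yield (name, data) pairs whose content has not been seen before."""
    seen = set()
    for name, data in files:
        digest = content_hash(data)
        if digest in seen:
            continue  # same bytes already seen under another name
        seen.add(digest)
        yield name, data
```

Because the key is derived from the bytes and not the filename, a renamed copy of the same file is still treated as a duplicate.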
12-28-2023
01:17 PM
I agree with @SAMSAL's approach. If you can provide a parameter, a header, or something else in the request so that your API returns a JSON response each time, it will be a lot easier to parse the response and build the request for the next step in your flow.
12-28-2023
01:01 PM
1 Kudo
An ExecuteGroovyScript alternative, with this input:

    {
      "idTransakcji": "123",
      "date": "",
      "name": "sam"
    }

Script:

    import groovy.json.JsonOutput
    import groovy.json.JsonSlurper
    import java.nio.charset.StandardCharsets

    JsonSlurper jsonSlurper = new JsonSlurper()
    JsonOutput jsonOutput = new JsonOutput()

    FlowFile flowFile = session.get()
    if(!flowFile) return

    // Read the incoming JSON, reshape it, and write it back to the same FlowFile
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        Map data = jsonSlurper.parse(inputStream)
        data = [
            "id": data.idTransakcji,
            "user": [
                // An empty or non-numeric date becomes null
                "date": data.date?.isNumber() ? Long.parseLong(data.date) : null,
                "name": data.name
            ]
        ]
        outputStream.write(jsonOutput.toJson(data).getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)
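If it helps to test the mapping outside NiFi first, the same reshaping can be sketched in plain Python. This is not a NiFi processor, just the transformation logic under the assumption that the date arrives as a string; reshape is a hypothetical name, and int() is slightly more lenient than the Groovy isNumber() check (it tolerates surrounding whitespace, for example).

```python
import json


def reshape(raw: str) -> str:
    """Map the flat input record onto the nested id/user structure."""
    data = json.loads(raw)
    try:
        # Empty, missing, or non-numeric dates fall through to None
        date_value = int(data.get("date"))
    except (TypeError, ValueError):
        date_value = None
    result = {
        "id": data.get("idTransakcji"),
        "user": {"date": date_value, "name": data.get("name")},
    }
    return json.dumps(result)
```

With the sample input above, the empty date comes out as null in the resulting JSON.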
12-28-2023
12:25 PM
2 Kudos
...a 3rd option because I like scripted processors 😂...using ExecuteGroovyScript:

    import groovy.json.JsonOutput
    import groovy.json.JsonSlurper
    import java.nio.charset.StandardCharsets

    JsonSlurper jsonSlurper = new JsonSlurper()
    JsonOutput jsonOutput = new JsonOutput()

    FlowFile flowFile = session.get()
    if(!flowFile) return

    flowFile = session.write(flowFile, { inputStream, outputStream ->
        List<Map> data = jsonSlurper.parse(inputStream)
        data.each {
            // order_item arrives as a JSON string; parse it into a real JSON object
            it.order_item = jsonSlurper.parseText(it.order_item)
        }
        outputStream.write(jsonOutput.toJson(data).getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)

    session.transfer(flowFile, REL_SUCCESS)

It looks like a lot, but this is the line that takes the JSON string and converts it to JSON:

    it.order_item = jsonSlurper.parseText(it.order_item)
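The same "JSON inside a JSON string" fix can be sketched in plain Python for anyone who wants to try it outside NiFi. This is a minimal sketch, assuming the input is a list of records whose order_item field holds a JSON-encoded string; expand_order_items is a hypothetical name.

```python
import json


def expand_order_items(raw: str) -> str:
    """Parse each record's order_item string into a nested JSON value."""
    records = json.loads(raw)
    for record in records:
        # The field holds JSON encoded as a string, so it needs a second parse
        record["order_item"] = json.loads(record["order_item"])
    return json.dumps(records)
```

The key point is the second json.loads call: the outer parse only gives you a string for order_item, and parsing that string again turns it into a nested object.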