Member since 11-16-2015 · 911 Posts · 668 Kudos Received · 249 Solutions
08-07-2018
06:48 PM
1 Kudo
Oops, I put the def flowFile inside the try block; I have since edited the answer to (hopefully!) be correct.
08-07-2018
06:01 PM
2 Kudos
A FlowFile doesn't ever really exist as a Java File object; instead you access its contents as an InputStream. I believe PDDocument has a load(InputStream) method, so you could do something like:

import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson

def flowFile = session.get()
if (!flowFile) return
try {
    def inputStream = session.read(flowFile)
    PDDocument document = PDDocument.load(inputStream)
    PDFTextStripperByArea stripper = new PDFTextStripperByArea()
    // Do your other stuff here, probably writing something out to flow file(s)?
    inputStream.close()
    // If you changed the original flow file, transfer it here
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception whatever) {
    // Something went wrong, log it and send the original flow file to failure
    log.error('PDF processing failed', whatever)
    session.transfer(flowFile, REL_FAILURE)
}

If you're going to be replacing the contents of the incoming flow file with some extraction from the PDF, then you can do both the read and the write in a "StreamCallback"; check out Part 2 of my ExecuteScript Cookbook for ways to read/write flow files.
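For reference, here's a minimal (untested) sketch of that StreamCallback approach, assuming PDFBox 1.x package names and that a plain PDFTextStripper extraction is close enough to your use case:

import org.apache.nifi.processor.io.StreamCallback
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripper

def flowFile = session.get()
if (!flowFile) return
try {
    // Read the incoming PDF and overwrite the flow file content with the extracted text
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        PDDocument document = PDDocument.load(inputStream)
        try {
            outputStream.write(new PDFTextStripper().getText(document).getBytes('UTF-8'))
        } finally {
            document.close()
        }
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('PDF text extraction failed', e)
    session.transfer(flowFile, REL_FAILURE)
}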
08-07-2018
05:42 PM
1 Kudo
Translate Field Names "normalizes" the column names by uppercasing them, but also by removing the underscores, which should explain why TEST_ID isn't matching; I can't tell why STRING isn't matching, though. Can you try setting the field names in the schema to their uppercase counterparts, as well as the keys in the JSON file? For JSON inputs (a flat JSON file of simple key/value pairs), you can also use JoltTransformJSON; check out this spec, which lowercases the field names, and change the modify function to =toUpper instead of =toLower.
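In case that link isn't handy, here's a sketch of a spec along those lines for a flat JSON object (treat it as an illustration rather than the exact linked spec); swapping =toLower for =toUpper uppercases the keys instead:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "$": "[#2].key",
        "@": "[#2].value"
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": {
        "key": "=toLower"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "value": "@(1,key)"
      }
    }
  }
]

The first shift explodes each key/value pair into an array entry, the modify lowercases the stored key, and the final shift rebuilds the object using the (now lowercased) key as the field name.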
08-03-2018
03:12 PM
Currently, NiFi support for Redis comes in the form of the RedisDistributedMapCacheClientService, so you can use FetchDistributedMapCache and PutDistributedMapCache to get data in and out of Redis. Check the documentation for the Redis service, as it may not use the same directives as you were expecting, so I'm not sure whether this will be sufficient for your use case. An alternative is to use a scripting processor such as ExecuteScript or InvokeScriptedProcessor: you can point the Module Directory property at the Redis client JARs and basically write your own "mini-processor" in a language like Groovy or Jython to interact with Redis however you wish.
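As a rough illustration of the scripting route, here's a hypothetical ExecuteScript (Groovy) body using the Jedis client; it assumes the Jedis JAR (and its dependencies) are listed in the Module Directory property, and the host, port, and keys are just placeholders:

import redis.clients.jedis.Jedis

def flowFile = session.get()
if (!flowFile) return
def jedis = new Jedis('localhost', 6379) // placeholder host/port
try {
    // Example: store an attribute value in Redis under a key derived from the flow file
    jedis.set(flowFile.getAttribute('uuid'), flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Redis interaction failed', e)
    session.transfer(flowFile, REL_FAILURE)
} finally {
    jedis.close()
}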
08-01-2018
07:48 PM
If you need a number of dependencies like Hadoop for a script, you may want to consider creating an actual processor/NAR; that way you can inherit the nifi-hadoop-libraries NAR from your NAR, which gives your code access to the Hadoop JARs. Another alternative is to use Groovy Grab in your script to bring in the Hadoop dependencies you need (a sketch is below). It will download another set of them to the Grapes cache, but you won't have to worry about gathering all the transitive dependencies manually. A more fragile alternative is to add a NAR's working directory to your Module Directory property in ExecuteScript; for example, the nifi-hadoop-libraries NAR's working directory for dependencies is something like:

<NiFi location>/work/nar/extensions/nifi-hadoop-libraries-nar-<version>.nar-unpacked/META-INF/bundled-dependencies/

Note that this directory doesn't exist until NiFi has been started and has extracted the contents of the corresponding NAR to its working directory location.
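Going back to the Grab approach, a hypothetical script header might look like the following (the version number is illustrative, and this assumes Ivy is available to the script engine so Grape can resolve dependencies):

@Grab(group = 'org.apache.hadoop', module = 'hadoop-client', version = '3.3.6')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

// Hypothetical smoke test: connect to HDFS and list the root directory
def conf = new Configuration()
conf.set('fs.defaultFS', 'hdfs://namenode:8020') // placeholder NameNode address
def fs = FileSystem.get(conf)
fs.listStatus(new Path('/')).each { status -> log.info(status.getPath().toString()) }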
08-01-2018
03:58 PM
1 Kudo
Bryan's InferAvroSchema answer should work well in this case, but as an alternative, you might consider "normalizing" your schema by using JoltTransformJSON to change each flow file into the same schema. For example, using the following Chain spec:

[
  {
    "operation": "shift",
    "spec": {
      "id_*": {
        "@": "entry.[#2].value",
        "$(0,1)": "entry.[#2].id"
      }
    }
  }
]

And the following input:

{
  "id_4344" : 1532102971,
  "id_4544" : 1532102972
}

You get the following output:

{
  "entry" : [ {
    "value" : 1532102971,
    "id" : "4344"
  }, {
    "value" : 1532102972,
    "id" : "4544"
  } ]
}

This allows you to predefine the schema, removing the need for the schema and readers to be dynamic. If you don't want the (possibly unnecessary) "entry" array inside the single JSON object, you can produce a top-level array with the following spec:

[
  {
    "operation": "shift",
    "spec": {
      "id_*": {
        "@": "[#2].value",
        "$(0,1)": "[#2].id"
      }
    }
  }
]

Which gives you the following output:

[ {
  "value" : 1532102971,
  "id" : "4344"
}, {
  "value" : 1532102972,
  "id" : "4544"
} ]
07-30-2018
09:00 PM
Oracle has different syntax for aliasing columns (i.e. use "AS") versus tables (i.e. don't use "AS"). The existing code in 1.7.0 hardcodes the "AS" keyword. I have written NIFI-5471 to delegate the generation of the table alias clause to the database adapter. Unfortunately I am not aware of any workaround.
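To illustrate with hypothetical table/column names, Oracle accepts the first statement but rejects the second (and the SQL generated by 1.7.0 resembles the latter):

-- Column alias with AS, table alias without: accepted by Oracle
SELECT t.id AS record_id FROM my_table t;

-- AS before a table alias: rejected by Oracle with ORA-00933
SELECT t.id AS record_id FROM my_table AS t;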
07-30-2018
08:40 PM
In NiFi 1.7.0 I believe you can right-click on the processor and choose "Terminate threads". If for some reason that doesn't work I think you have to restart the NiFi instance.
07-30-2018
06:26 PM
ValidateRecord is more about validating the individual records than validating the entire flow file. If some records are valid and some are invalid, each kind is routed to the corresponding relationship. However, for invalid records we can't use the same record writer as for valid records, or else we know the write will fail (because we know they're invalid), so there is a second RecordWriter for invalid records. You might use it to try to record the field names or something, but by the time ValidateRecord knows an individual record is invalid, it doesn't know that the record came in as Avro (for example), nor does it know that you might want it to go out as Avro. That's the flexibility and power of the Record Reader/Writer paradigm, but in this case the tradeoff is that you can't currently treat the entire flow file as valid or invalid. It may make sense to have an "Invalid Record Strategy" property to choose between "Individual Records", which would use the RecordWriters (the current behavior), and "Original FlowFile", which would ignore the RecordWriters and instead transfer the entire incoming flow file as-is to the 'invalid' relationship. Please feel free to file an improvement Jira for this capability.
07-30-2018
05:49 PM
1 Kudo
When you see the number in the upper right-hand corner, it means that even though the processor is "stopped", there are still threads running. You won't be able to edit the configuration or restart the processor until those threads have stopped (at which point the number and icon will disappear).