Member since: 11-16-2015 | Posts: 905 | Kudos Received: 666 | Solutions: 249
08-14-2018
06:42 PM
This one is only more complex because you want to convert the field names at the second level, not the first. So you want to match "address" first, then use the above spec for each field in there, and then also transfer any fields at the top level over as-is (namely "firstname"). The spec (which is specific to this example) is:
[
{
"operation": "shift",
"spec": {
"address": {
"*-*-*": "&(0,1)_&(0,2)_&(0,3)",
"*-*": "&(0,1)_&(0,2)",
"*": "&"
},
"*": "&"
}
}
]
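For illustration, a hypothetical input shaped like the following (the field names are invented for this sketch) exercises each of the three match patterns: "street-name-suffix" hits the "*-*-*" rule, "house-number" hits "*-*", and "city" and "firstname" pass through unchanged, so only the dashed names get rewritten with underscores (street_name_suffix, house_number).
{
  "firstname": "Jane",
  "address": {
    "house-number": "12",
    "street-name-suffix": "NW",
    "city": "Springfield"
  }
}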
08-09-2018
02:45 PM
I just tried the same line of CSV and the same regex and it works fine. Can you share the entire stack trace from the logs? There might be more information about where it's failing while being scheduled. Also, if you copied/pasted that regex from somewhere, it may have picked up some hidden/unprintable characters; try typing it in by hand instead.
08-09-2018
02:39 AM
Can you share your ExtractText configuration and possibly some sample input? This error occurs when the processor is scheduled, and all it does when scheduled is try to compile the regular expressions, so I presume there is some error in your regex somewhere.
08-08-2018
06:13 PM
1 Kudo
The script code runs basically as the body of an onTrigger() method, which is the Processor method that gets called when ExecuteScript is triggered to execute. You don't need that code in a class, but if you want to call main() then you have to do it outside the class, in the same script. If you want to be able to pass in arguments, then instead of using a class with a main() method, the user would specify them in the ExecuteScript configuration dialog as user-defined properties, and they are available to the script by their name. They get bound to the script as variables holding PropertyValue objects, so to get their values you'll need to call getValue() on them (there are examples in the cookbook). If you want a full-fledged implementation of a Processor, you can use InvokeScriptedProcessor; it expects an implementation of the Processor interface, but still requires a line outside the class to store an instance of your Processor in a variable, and then InvokeScriptedProcessor's methods get delegated to your Processor implementation. One of the advantages there is that you can add concrete properties to the InvokeScriptedProcessor dialog (via the getSupportedPropertyDescriptors() method) rather than passing in user-defined properties as variables to the script. I have some examples on my blog on how to use InvokeScriptedProcessor.
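As a minimal sketch of the user-defined property approach (the property name "greeting" and the attribute name below are invented for this example), a Groovy ExecuteScript body might look like:
// A user-defined property named "greeting" is bound to the script as a
// PropertyValue variable; call getValue() to read its value.
def flowFile = session.get()
if (!flowFile) return
flowFile = session.putAttribute(flowFile, 'greeting.value', greeting.getValue())
session.transfer(flowFile, REL_SUCCESS)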
08-07-2018
06:48 PM
1 Kudo
Oops, I put the def flowFile inside the try; I have since edited the answer to (hopefully!) be correct.
08-07-2018
06:01 PM
2 Kudos
A FlowFile doesn't ever really exist as a Java File object; instead, you access its contents as an InputStream. I believe PDDocument has a load(InputStream) method, so you could do something like:
import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson
def flowFile = session.get()
if(!flowFile) return
try {
// Read the flow file content as an InputStream and load it into PDFBox
def inputStream = session.read(flowFile)
PDDocument document = PDDocument.load(inputStream)
PDFTextStripperByArea stripper = new PDFTextStripperByArea()
// Do your other stuff here, probably writing something out to flow file(s)?
// Close the document and the content stream when done
document.close()
inputStream.close()
// If you changed the original flow file, transfer it here
session.transfer(flowFile, REL_SUCCESS)
} catch(Exception whatever) {
print(whatever)
// Something went wrong, send the original flow file to failure
session.transfer(flowFile, REL_FAILURE)
}
println('it worked')
If you're going to be replacing the contents of the incoming flow file with some extraction from the PDF, then you can do both the read and the write in a "StreamCallback"; check out Part 2 of my ExecuteScript Cookbook for ways to read/write flow files.
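As a rough sketch of that read-and-write-in-one-pass approach (extractTextSomehow() below is a placeholder for whatever PDFBox extraction you end up doing, not a real method), it might look something like:
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback
import org.apache.pdfbox.pdmodel.PDDocument

def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    PDDocument document = PDDocument.load(inputStream)
    try {
        // Placeholder: replace with your PDFTextStripperByArea logic
        String text = extractTextSomehow(document)
        outputStream.write(text.getBytes(StandardCharsets.UTF_8))
    } finally {
        document.close()
    }
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)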
08-07-2018
05:42 PM
1 Kudo
Translate Field Names "normalizes" the column names by uppercasing them, but also by removing the underscores, which should explain why TEST_ID isn't matching; I can't tell why STRING isn't matching, though. Can you try setting the field names in the schema to their uppercase counterparts, as well as the keys in the JSON file? For JSON inputs, you can also use JoltTransformJSON (for a flat JSON file of simple key/value pairs); check out this spec which lowercases the field names, and change the modify function to =toUpper instead of =toLower.
08-03-2018
03:12 PM
Currently, NiFi support for Redis comes in the form of the RedisDistributedMapCacheClientService, so you can use FetchDistributedMapCache and PutDistributedMapCache to get data in and out of Redis. Check the documentation for the Redis service, as it may not use the same directives as you were expecting, so I'm not sure whether this will be sufficient for your use case. An alternative is to use a scripting processor such as ExecuteScript or InvokeScriptedProcessor; you can point the Module Directory property at the Redis client JARs and basically write your own "mini-processor" in a language like Groovy or Jython to interact with Redis however you wish.
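For the scripting route, a very rough Groovy sketch (assuming the Jedis client JAR is on the Module Directory path and Redis is reachable at localhost:6379; both the host/port and the key/value choice are assumptions for this example) might look like:
import redis.clients.jedis.Jedis

def flowFile = session.get()
if (!flowFile) return
def jedis = new Jedis('localhost', 6379)   // assumed host/port
try {
    // Example: store the flow file's filename under its UUID as the Redis key
    jedis.set(flowFile.getAttribute('uuid'), flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Redis interaction failed', e)
    session.transfer(flowFile, REL_FAILURE)
} finally {
    jedis.close()
}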
08-01-2018
07:48 PM
If you need a number of dependencies like Hadoop for a script, you may want to consider creating an actual processor/NAR; that way you can inherit the nifi-hadoop-libraries NAR from your NAR, which gives your code access to the Hadoop JARs. Another alternative is to use Groovy Grab in your script to bring in the Hadoop dependencies you need. It will download another copy of them to the Grapes cache, but you won't have to worry about gathering all the transitive dependencies manually. A more fragile alternative is to add a NAR's working directory to your Module Directory property in ExecuteScript; for example, the nifi-hadoop-libraries NAR's working directory for dependencies is something like:
<NiFi location>/work/nar/extensions/nifi-hadoop-libraries-nar-<version>.nar-unpacked/META-INF/bundled-dependencies/
This directory doesn't exist until NiFi has been started and extracts the contents of the corresponding NAR to its working directory location.
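A hedged sketch of the Grab approach (the artifact coordinates and version below are illustrative; match them to your cluster):
// Grape downloads hadoop-client and its transitive dependencies to the
// Grapes cache on first run; the version shown is just an example.
@Grab(group = 'org.apache.hadoop', module = 'hadoop-client', version = '2.7.3')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path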
08-01-2018
03:58 PM
1 Kudo
Bryan's InferAvroSchema answer should work well in this case, but as an alternative, you might consider "normalizing" your schema by using JoltTransformJSON to change each flow file into the same schema. For example, using the following Chain spec:
[
{
"operation": "shift",
"spec": {
"id_*": {
"@": "entry.[#2].value",
"$(0,1)": "entry.[#2].id"
}
}
}
]
And the following input:
{ "id_4344" : 1532102971, "id_4544" : 1532102972 }
You get the following output:
{
"entry" : [ {
"value" : 1532102971,
"id" : "4344"
}, {
"value" : 1532102972,
"id" : "4544"
} ]
}
This allows you to predefine the schema, removing the need for the schema and readers to be dynamic. If you don't want the (possibly unnecessary) "entry" array inside the single JSON object, you can produce a top-level array with the following spec:
[
{
"operation": "shift",
"spec": {
"id_*": {
"@": "[#2].value",
"$(0,1)": "[#2].id"
}
}
}
]
Which gives you the following output:
[ {
"value" : 1532102971,
"id" : "4344"
}, {
"value" : 1532102972,
"id" : "4544"
} ]
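If it helps with predefining the schema, an Avro schema for the per-record shape implied by these examples might look something like this (the record name and the choice of long for the value field are assumptions on my part):
{
  "type" : "record",
  "name" : "entry",
  "fields" : [
    { "name" : "value", "type" : "long" },
    { "name" : "id", "type" : "string" }
  ]
}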