Member since: 11-16-2015 | Posts: 905 | Kudos Received: 666 | Solutions: 249
08-14-2018
06:42 PM
This one is only more complex because you want to convert the field names at the second level, not the first. So you want to match "address" first, then use the above spec for each field in there, and then also transfer any fields at the top level over as-is (namely "firstname"). The spec (which is specific to this example) is:
[
{
"operation": "shift",
"spec": {
"address": {
"*-*-*": "&(0,1)_&(0,2)_&(0,3)",
"*-*": "&(0,1)_&(0,2)",
"*": "&"
},
"*": "&"
}
}
]
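For illustration, a hypothetical input shaped like the following (the field names are invented for this sketch) exercises each of the three match patterns: "street-name-suffix" hits the "*-*-*" rule, "house-number" hits "*-*", and "city" and "firstname" pass through unchanged, so only the dashed names get rewritten with underscores (street_name_suffix, house_number).
{
  "firstname": "Jane",
  "address": {
    "house-number": "12",
    "street-name-suffix": "NW",
    "city": "Springfield"
  }
}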
08-09-2018
02:45 PM
I just tried the same line of CSV and the same regex and it works fine. Can you share the entire stack trace from the logs? There might be more information about where it's failing while being scheduled. Also, if you copied/pasted that regex from somewhere, it may have picked up some hidden/unprintable characters; try typing it in by hand instead.
08-09-2018
02:39 AM
Can you share your ExtractText configuration and possibly some sample input? This error occurs when the processor is scheduled, and all it does when scheduled is try to compile the regular expressions, so I presume there is some error in your regex somewhere.
08-08-2018
06:13 PM
1 Kudo
The script code runs basically as the body of an onTrigger() method, which is the Processor method that gets called when ExecuteScript is triggered to execute. You don't need that code in a class, but if you want to call main() then you have to do it outside the class, in the same script. If you want to be able to pass in arguments, then instead of using a class with a main() method, the user would specify them in the ExecuteScript configuration dialog as user-defined properties, and they are available to the script by their name. They get bound to the script as variables holding PropertyValue objects, so to get their values you'll need to call getValue() on them (there are examples in the cookbook). If you want a full-fledged implementation of a Processor, you can use InvokeScriptedProcessor; it expects an implementation of the Processor interface, but still requires a line outside the class to store an instance of your Processor in a variable, and then InvokeScriptedProcessor's methods get delegated to your Processor implementation. One of the advantages there is that you can add concrete properties to the InvokeScriptedProcessor dialog (via the getSupportedPropertyDescriptors() method) rather than passing in user-defined properties as variables to the script. I have some examples on my blog on how to use InvokeScriptedProcessor.
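As a minimal sketch of the user-defined property approach (the property name "greeting" and the attribute name below are invented for this example), a Groovy ExecuteScript body might look like:
// A user-defined property named "greeting" is bound to the script as a
// PropertyValue variable; call getValue() to read its value.
def flowFile = session.get()
if (!flowFile) return
flowFile = session.putAttribute(flowFile, 'greeting.value', greeting.getValue())
session.transfer(flowFile, REL_SUCCESS)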
08-07-2018
06:48 PM
1 Kudo
Oops, I put the def flowFile inside the try; I have since edited the answer to (hopefully!) be correct.
08-07-2018
06:01 PM
2 Kudos
A FlowFile doesn't ever really exist as a Java File object; instead, you access its contents as an InputStream. I believe PDDocument has a load(InputStream) method, so you could do something like:
import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson
def flowFile = session.get()
if(!flowFile) return
try {
// Read the flow file content as an InputStream and load it into PDFBox
def inputStream = session.read(flowFile)
PDDocument document = PDDocument.load(inputStream)
PDFTextStripperByArea stripper = new PDFTextStripperByArea()
// Do your other stuff here, probably writing something out to flow file(s)?
// Close the document and the content stream when done
document.close()
inputStream.close()
// If you changed the original flow file, transfer it here
session.transfer(flowFile, REL_SUCCESS)
} catch(Exception whatever) {
print(whatever)
// Something went wrong, send the original flow file to failure
session.transfer(flowFile, REL_FAILURE)
}
println('it worked')
If you're going to be replacing the contents of the incoming flow file with some extraction from the PDF, then you can do both the read and the write in a "StreamCallback"; check out Part 2 of my ExecuteScript Cookbook for ways to read/write flow files.
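As a rough sketch of that read-and-write-in-one-pass approach (extractTextSomehow() below is a placeholder for whatever PDFBox extraction you end up doing, not a real method), it might look something like:
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback
import org.apache.pdfbox.pdmodel.PDDocument

def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    PDDocument document = PDDocument.load(inputStream)
    try {
        // Placeholder: replace with your PDFTextStripperByArea logic
        String text = extractTextSomehow(document)
        outputStream.write(text.getBytes(StandardCharsets.UTF_8))
    } finally {
        document.close()
    }
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)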
08-07-2018
05:42 PM
1 Kudo
Translate Field Names "normalizes" the column names by uppercasing them, but also by removing the underscores, which should explain why TEST_ID isn't matching; I can't tell why STRING isn't matching, though. Can you try setting the field names in the schema to their uppercase counterparts, as well as the keys in the JSON file? For JSON inputs, you can also use JoltTransformJSON (for a flat JSON file of simple key/value pairs); check out this spec which lowercases the field names, and change the modify function to =toUpper instead of =toLower.
08-03-2018
03:12 PM
Currently, NiFi support for Redis comes in the form of the RedisDistributedMapCacheClientService, so you can use FetchDistributedMapCache and PutDistributedMapCache to get data in and out of Redis. Check the documentation for the Redis service, as it may not use the same directives as you were expecting, so I'm not sure whether this will be sufficient for your use case. An alternative is to use a scripting processor such as ExecuteScript or InvokeScriptedProcessor; you can point the Module Directory property at the Redis client JARs and basically write your own "mini-processor" in a language like Groovy or Jython to interact with Redis however you wish.
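For the scripting route, a very rough Groovy sketch (assuming the Jedis client JAR is on the Module Directory path and Redis is reachable at localhost:6379; both the host/port and the key/value choice are assumptions for this example) might look like:
import redis.clients.jedis.Jedis

def flowFile = session.get()
if (!flowFile) return
def jedis = new Jedis('localhost', 6379)   // assumed host/port
try {
    // Example: store the flow file's filename under its UUID as the Redis key
    jedis.set(flowFile.getAttribute('uuid'), flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Redis interaction failed', e)
    session.transfer(flowFile, REL_FAILURE)
} finally {
    jedis.close()
}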
08-01-2018
07:48 PM
If you need a number of dependencies like Hadoop for a script, you may want to consider creating an actual processor/NAR; that way you can inherit the nifi-hadoop-libraries NAR from your NAR, which gives your code access to the Hadoop JARs. Another alternative is to use Groovy Grab in your script to bring in the Hadoop dependencies you need. It will download another copy of them to the Grapes cache, but you won't have to worry about gathering all the transitive dependencies manually. A more fragile alternative is to add a NAR's working directory to your Module Directory property in ExecuteScript; for example, the nifi-hadoop-libraries NAR's working directory for dependencies is something like:
<NiFi location>/work/nar/extensions/nifi-hadoop-libraries-nar-<version>.nar-unpacked/META-INF/bundled-dependencies/
This directory doesn't exist until NiFi has been started and extracts the contents of the corresponding NAR to its working directory location.
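A hedged sketch of the Grab approach (the artifact coordinates and version below are illustrative; match them to your cluster):
// Grape downloads hadoop-client and its transitive dependencies to the
// Grapes cache on first run; the version shown is just an example.
@Grab(group = 'org.apache.hadoop', module = 'hadoop-client', version = '2.7.3')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path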
08-01-2018
03:58 PM
1 Kudo
Bryan's InferAvroSchema answer should work well in this case, but as an alternative, you might consider "normalizing" your schema by using JoltTransformJSON to change each flow file into the same schema. For example, using the following Chain spec:
[
{
"operation": "shift",
"spec": {
"id_*": {
"@": "entry.[#2].value",
"$(0,1)": "entry.[#2].id"
}
}
}
]
And the following input:
{ "id_4344" : 1532102971, "id_4544" : 1532102972 }
You get the following output:
{
"entry" : [ {
"value" : 1532102971,
"id" : "4344"
}, {
"value" : 1532102972,
"id" : "4544"
} ]
}
This allows you to predefine the schema, removing the need for the schema and readers to be dynamic. If you don't want the (possibly unnecessary) "entry" array inside the single JSON object, you can produce a top-level array with the following spec:
[
{
"operation": "shift",
"spec": {
"id_*": {
"@": "[#2].value",
"$(0,1)": "[#2].id"
}
}
}
]
Which gives you the following output:
[ {
"value" : 1532102971,
"id" : "4344"
}, {
"value" : 1532102972,
"id" : "4544"
} ]
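If it helps with predefining the schema, an Avro schema for the per-record shape implied by these examples might look something like this (the record name and the choice of long for the value field are assumptions on my part):
{
  "type" : "record",
  "name" : "entry",
  "fields" : [
    { "name" : "value", "type" : "long" },
    { "name" : "id", "type" : "string" }
  ]
}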