Member since 11-16-2015 · 911 Posts · 668 Kudos Received · 249 Solutions
08-07-2018
06:48 PM
1 Kudo
Oops, I put the def flowFile inside the try block; I have since edited the answer to (hopefully!) be correct.
08-07-2018
06:01 PM
2 Kudos
A FlowFile doesn't ever really exist as a Java File object; instead you access its contents as an InputStream. I believe PDDocument has a load(InputStream) method, so you could do something like:

import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson

def flowFile = session.get()
if (!flowFile) return
try {
    def inputStream = session.read(flowFile)
    PDDocument document = PDDocument.load(inputStream)
    PDFTextStripperByArea stripper = new PDFTextStripperByArea()
    // Do your other stuff here, probably writing something out to flow file(s)?
    inputStream.close()
    // If you changed the original flow file, transfer it here
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception whatever) {
    // Something went wrong, log it and send the original flow file to failure
    log.error('PDF processing failed', whatever)
    session.transfer(flowFile, REL_FAILURE)
}

If you're going to be replacing the contents of the incoming flow file with some extraction from the PDF, then you can do both the read and the write in a "StreamCallback"; check out Part 2 of my ExecuteScript Cookbook for ways to read/write flow files.
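For reference, here's a minimal (untested) sketch of that StreamCallback approach, assuming PDFBox 1.x package names and that a plain PDFTextStripper extraction is close enough to your use case:

import org.apache.nifi.processor.io.StreamCallback
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripper

def flowFile = session.get()
if (!flowFile) return
try {
    // Read the incoming PDF and overwrite the flow file content with the extracted text
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        PDDocument document = PDDocument.load(inputStream)
        try {
            outputStream.write(new PDFTextStripper().getText(document).getBytes('UTF-8'))
        } finally {
            document.close()
        }
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('PDF text extraction failed', e)
    session.transfer(flowFile, REL_FAILURE)
}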
08-07-2018
05:42 PM
1 Kudo
Translate Field Names "normalizes" the column names by uppercasing them, but also by removing the underscores, which should explain why TEST_ID isn't matching; I can't tell why STRING isn't matching, though. Can you try setting the field names in the schema to their uppercase counterparts, as well as the keys in the JSON file? For JSON inputs (a flat JSON file of simple key/value pairs), you can also use JoltTransformJSON; check out this spec, which lowercases the field names, and change the modify function to =toUpper instead of =toLower.
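In case that link isn't handy, here's a sketch of a spec along those lines for a flat JSON object (treat it as an illustration rather than the exact linked spec); swapping =toLower for =toUpper uppercases the keys instead:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "$": "[#2].key",
        "@": "[#2].value"
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": {
        "key": "=toLower"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "value": "@(1,key)"
      }
    }
  }
]

The first shift explodes each key/value pair into an array entry, the modify lowercases the stored key, and the final shift rebuilds the object using the (now lowercased) key as the field name.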
08-03-2018
03:12 PM
Currently, NiFi support for Redis comes in the form of the RedisDistributedMapCacheClientService, so you can use FetchDistributedMapCache and PutDistributedMapCache to get data in and out of Redis. Check the documentation for the Redis service, as it may not use the same directives as you were expecting, so I'm not sure whether this will be sufficient for your use case. An alternative is to use a scripting processor such as ExecuteScript or InvokeScriptedProcessor: you can point the Module Directory property at the Redis client JARs and basically write your own "mini-processor" in a language like Groovy or Jython to interact with Redis however you wish.
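As a rough illustration of the scripting route, here's a hypothetical ExecuteScript (Groovy) body using the Jedis client; it assumes the Jedis JAR (and its dependencies) are listed in the Module Directory property, and the host, port, and keys are just placeholders:

import redis.clients.jedis.Jedis

def flowFile = session.get()
if (!flowFile) return
def jedis = new Jedis('localhost', 6379) // placeholder host/port
try {
    // Example: store an attribute value in Redis under a key derived from the flow file
    jedis.set(flowFile.getAttribute('uuid'), flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Redis interaction failed', e)
    session.transfer(flowFile, REL_FAILURE)
} finally {
    jedis.close()
}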
08-01-2018
07:48 PM
If you need a number of dependencies like Hadoop for a script, you may want to consider creating an actual processor/NAR; that way you can inherit the nifi-hadoop-libraries NAR from your NAR, which gives your code access to the Hadoop JARs. Another alternative is to use Groovy Grab in your script to bring in the Hadoop dependencies you need (a sketch is below). It will download another set of them to the Grapes cache, but you won't have to worry about gathering all the transitive dependencies manually. A more fragile alternative is to add a NAR's working directory to your Module Directory property in ExecuteScript; for example, the nifi-hadoop-libraries NAR's working directory for dependencies is something like:

<NiFi location>/work/nar/extensions/nifi-hadoop-libraries-nar-<version>.nar-unpacked/META-INF/bundled-dependencies/

Note that this directory doesn't exist until NiFi has been started and has extracted the contents of the corresponding NAR to its working directory location.
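Going back to the Grab approach, a hypothetical script header might look like the following (the version number is illustrative, and this assumes Ivy is available to the script engine so Grape can resolve dependencies):

@Grab(group = 'org.apache.hadoop', module = 'hadoop-client', version = '3.3.6')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

// Hypothetical smoke test: connect to HDFS and list the root directory
def conf = new Configuration()
conf.set('fs.defaultFS', 'hdfs://namenode:8020') // placeholder NameNode address
def fs = FileSystem.get(conf)
fs.listStatus(new Path('/')).each { status -> log.info(status.getPath().toString()) }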
08-01-2018
03:58 PM
1 Kudo
Bryan's InferAvroSchema answer should work well in this case, but as an alternative, you might consider "normalizing" your schema by using JoltTransformJSON to change each flow file into the same schema. For example, using the following Chain spec:

[
  {
    "operation": "shift",
    "spec": {
      "id_*": {
        "@": "entry.[#2].value",
        "$(0,1)": "entry.[#2].id"
      }
    }
  }
]

And the following input:

{
  "id_4344" : 1532102971,
  "id_4544" : 1532102972
}

You get the following output:

{
  "entry" : [ {
    "value" : 1532102971,
    "id" : "4344"
  }, {
    "value" : 1532102972,
    "id" : "4544"
  } ]
}

This allows you to predefine the schema, removing the need for the schema and readers to be dynamic. If you don't want the (possibly unnecessary) "entry" array inside the single JSON object, you can produce a top-level array with the following spec:

[
  {
    "operation": "shift",
    "spec": {
      "id_*": {
        "@": "[#2].value",
        "$(0,1)": "[#2].id"
      }
    }
  }
]

Which gives you the following output:

[ {
  "value" : 1532102971,
  "id" : "4344"
}, {
  "value" : 1532102972,
  "id" : "4544"
} ]
07-30-2018
09:00 PM
Oracle has different syntax for aliasing columns (i.e. use "AS") versus tables (i.e. don't use "AS"). The existing code in 1.7.0 hardcodes the "AS" keyword. I have written NIFI-5471 to delegate the generation of the table alias clause to the database adapter. Unfortunately I am not aware of any workaround.
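To illustrate with hypothetical table/column names, Oracle accepts the first statement but rejects the second (and the SQL generated by 1.7.0 resembles the latter):

-- Column alias with AS, table alias without: accepted by Oracle
SELECT t.id AS record_id FROM my_table t;

-- AS before a table alias: rejected by Oracle with ORA-00933
SELECT t.id AS record_id FROM my_table AS t;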
07-30-2018
08:40 PM
In NiFi 1.7.0 I believe you can right-click on the processor and choose "Terminate threads". If for some reason that doesn't work I think you have to restart the NiFi instance.
07-30-2018
06:26 PM
ValidateRecord is more about validating the individual records than validating the entire flow file. If some records are valid and some are invalid, each kind is routed to the corresponding relationship. However, for invalid records we can't use the same record writer as for valid records, or else we know the write will fail (because we know they're invalid), so there is a second RecordWriter for invalid records. You might use it to try to record the field names or something, but by the time ValidateRecord knows an individual record is invalid, it doesn't know that the record came in as Avro (for example), nor does it know that you might want it to go out as Avro. That's the flexibility and power of the Record Reader/Writer paradigm, but in this case the tradeoff is that you can't currently treat the entire flow file as valid or invalid. It may make sense to have an "Invalid Record Strategy" property to choose between "Individual Records", which would use the RecordWriters (the current behavior), and "Original FlowFile", which would ignore the RecordWriters and instead transfer the entire incoming flow file as-is to the 'invalid' relationship. Please feel free to file an improvement Jira for this capability.
07-30-2018
05:49 PM
1 Kudo
When you see the number in the upper right-hand corner, it means that even though the processor is "stopped", there are still threads running. You won't be able to edit the configuration or restart the processor until those threads have stopped (at which point the number and icon will disappear).