About ariffle

ariffle · ‎10-28-2020

This issue appears to be related to https://issues.apache.org/jira/browse/NIFI-4417. I also tried using UpdateAttribute to build my regex in an attribute and then using that attribute as the Search Value in ReplaceText, but that has the same problem: NiFi attributes are not evaluated properly in the Search Value.

ariffle · ‎10-28-2020

I'm trying to use ReplaceText to remove a number of lines from the top of a flowfile, with the count taken from a flowfile attribute. I'm using the following regex, but ReplaceText says it's invalid: ^(.*?\n){${skip_lines}} It seems like I should be able to reference a flowfile attribute from a regex according to this question, but I just get an error. Any idea how I should be doing this? My full config is below:
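As an aside, for anyone comparing against plain regex behaviour: the pattern itself does what is intended once a number stands in for `${skip_lines}`. The sketch below is plain Python, not NiFi code, and the function name `skip_leading_lines` is made up for illustration. It only shows the regex with the count already substituted; it says nothing about how ReplaceText evaluates the Search Value.

```python
import re

def skip_leading_lines(text, skip_lines):
    """Drop the first `skip_lines` lines of `text` (hypothetical helper)."""
    # Same shape as the regex in the post, with the attribute value
    # already substituted as a literal count, e.g. ^(.*?\n){2}
    pattern = r"^(.*?\n){%d}" % skip_lines
    # Without re.MULTILINE, ^ only matches at the very start of the text,
    # so at most one replacement can happen.
    return re.sub(pattern, "", text, count=1)

sample = "header 1\nheader 2\nrow a\nrow b\n"
print(skip_leading_lines(sample, 2))
```

If the text has fewer lines than `skip_lines`, the pattern does not match and the text comes back unchanged.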

ariffle · ‎07-11-2020

I now see that 'Infer Schema' is an option in Record readers, so this processor is no longer needed. Leaving this up so others might find it.

ariffle · ‎07-11-2020

I feel a little silly asking because I can't find anything about this on the internet, but was InferAvroSchema removed from NiFi 1.11? My organization recently upgraded our NiFi version and I noticed it was missing, but I figured it was something they had been messing with. However, I upgraded my home server's NiFi and I notice it's missing from there too. I'm hoping that it was replaced by another processor or something? I really use this a lot.

ariffle · ‎11-29-2018

I figured there had to be a better way to do that, thanks @Matt Burgess! Is there documentation on the different programming language APIs that I'm missing? I've been working off of your excellent ExecuteScript cookbooks posted here, but beyond that I couldn't find where in the documentation I could have looked up something like session.remove().

ariffle · ‎11-28-2018

I'm trying to read a JSON from `flowFile` and add the contents as attribute keys in the empty `updated_flowFile`, but I get `transfer relationship not specified` even though I'm specifying it.

    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import InputStreamCallback
    from org.apache.nifi.processor.io import OutputStreamCallback
    import json

    data = {}

    # Read contents of flowFile and write contents to data{}
    class PyInputStreamCallback(InputStreamCallback):
        def __init__(self):
            pass

        def process(self, inputStream):
            text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            global data
            data = json.loads(text)

    # Get incoming flowFile and call PyInputStreamCallback
    flowFile = session.get()
    if (flowFile != None):
        try:
            session.read(flowFile, PyInputStreamCallback())
            global data
            # Create a blank flowfile, update the attributes with contents of data{} and write it to session
            updated_flowFile = session.create()
            updated_flowFile = session.putAttribute(updated_flowFile, 'left', data['left'])
            updated_flowFile = session.putAttribute(updated_flowFile, 'top', data['top'])
            session.close(flowFile)
            session.transfer(updated_flowFile, REL_SUCCESS)
        except:
            session.close(updated_flowFile)
            session.transfer(flowFile, REL_FAILURE)
    else:
        session.transfer(flowFile, REL_FAILURE)

Alternatively, if there's a way to use the same flowFile object and wipe the JSON contents, that would work too. I am doing a mergeContent later in my pipeline, so I need the contents to be totally empty except for the attributes I'm adding.
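For readers landing on this thread: the follow-up reply mentions `session.remove()`. Below is a minimal sketch of how the flow could look with that call, assuming the JSON text has already been read from the incoming flowfile. `FakeSession` and `json_to_attribute_flowfile` are hypothetical names used only so the sketch runs outside NiFi; inside ExecuteScript the real `session`, `REL_SUCCESS` and `REL_FAILURE` are provided by the processor. Attribute values are converted with `str()` because `putAttribute` takes string values.

```python
import json

class FakeSession(object):
    """Hypothetical stand-in for NiFi's ProcessSession (illustration only)."""
    def __init__(self):
        self.removed = []
        self.transferred = []

    def create(self):
        # A "flowfile" here is just a dict of attributes.
        return {}

    def putAttribute(self, flow_file, key, value):
        # Like NiFi, return an updated flowfile reference.
        updated = dict(flow_file)
        updated[key] = value
        return updated

    def remove(self, flow_file):
        self.removed.append(flow_file)

    def transfer(self, flow_file, relationship):
        self.transferred.append((flow_file, relationship))

def json_to_attribute_flowfile(session, flow_file, text, keys,
                               rel_success, rel_failure):
    """Create an empty flowfile whose attributes come from the JSON text."""
    try:
        data = json.loads(text)
        values = dict((key, str(data[key])) for key in keys)
    except (ValueError, KeyError, TypeError):
        # Bad or incomplete JSON: route the original flowfile to failure.
        session.transfer(flow_file, rel_failure)
        return
    new_flow_file = session.create()
    for key in keys:
        new_flow_file = session.putAttribute(new_flow_file, key, values[key])
    # Every flowfile must be transferred or removed before the session commits.
    session.remove(flow_file)
    session.transfer(new_flow_file, rel_success)

session = FakeSession()
json_to_attribute_flowfile(session, {"filename": "in.json"},
                           '{"left": 10, "top": 20}', ["left", "top"],
                           "success", "failure")
print(session.transferred)
```

The point of the design is that every code path either transfers or removes each flowfile it touched, which is what the `transfer relationship not specified` error is complaining about.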

ariffle · ‎08-09-2018

This issue was caused by me not using try/catch properly. Since the files weren't visible to the rest of my code outside the try/catch, it was returning the PDF.

ariffle · ‎08-08-2018

I found Matt's cookbooks and I'm following the recipe for overwriting a FlowFile. It seems very simple and straightforward and I'm not sure what I'm missing. My code is supposed to read the PDF in from the FlowFile, use PDFBox to extract first and last name from the form (it's an I9) and then output the results into a JSON which gets sent out in REL_SUCCESS. Instead it just outputs the PDF file to REL_SUCCESS. Not sure if it's never being read, which is causing blank output, or I'm writing it out wrong, or what.

    import java.nio.charset.StandardCharsets
    import org.apache.pdfbox.io.IOUtils
    import org.apache.pdfbox.pdmodel.PDDocument
    import org.apache.pdfbox.util.PDFTextStripperByArea
    import java.awt.Rectangle
    import org.apache.pdfbox.pdmodel.PDPage
    import com.google.gson.Gson
    import java.nio.charset.StandardCharsets

    def flowFile = session.get()
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        try {
            //Load Flowfile contents
            PDDocument document = PDDocument.load(inputStream)
            PDFTextStripperByArea stripper = new PDFTextStripperByArea()
            //Get the first page
            List<PDPage> allPages = document.getDocumentCatalog().getAllPages()
            PDPage page = allPages.get(0)
        } catch (Exception e){
            System.out.println(e.getMessage())
            session.transfer(flowFile, REL_FAILURE)
        }
        //Define the areas to search and add them as search regions
        stripper = new PDFTextStripperByArea()
        Rectangle lname = new Rectangle(25, 226, 240, 15)
        stripper.addRegion("lname", lname)
        Rectangle fname = new Rectangle(276, 226, 240, 15)
        stripper.addRegion("fname", fname)
        //Load the results into a JSON
        def boxMap = [:]
        stripper.setSortByPosition(true)
        stripper.extractRegions(page)
        regions = stripper.getRegions()
        for (String region : regions) {
            String box = stripper.getTextForRegion(region)
            boxMap.put(region, box)
        }
        Gson gson = new Gson()
        //Remove random noise from the output
        json = gson.toJson(boxMap, LinkedHashMap.class)
        json = json.replace('\\n', '')
        json = json.replace('\\r', '')
        json = json.replace(',"', ',\n"')
        //Overwrite flowfile contents with JSON
        outputStream.write(json.getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)

Help appreciated!

ariffle · ‎08-08-2018

Thanks for everything, @Matt Burgess. I was able to get this going by making my code more Groovy and cutting the need for classes and the main() method out of my implementation.

ariffle · ‎08-08-2018

Hey @Matt Burgess, that worked, thanks! I'm trying to scale up now, and when I try adding that code to a class and calling it from main() I get errors about the static keyword and context. I've tried running it from a run() method and then calling that from main, and moving the flowFile declaration outside of main, but I'm just not understanding. Sorry to be such a bother; I just can't find this in the documentation, or any examples of doing this from a class.

    import org.apache.pdfbox.io.IOUtils
    import org.apache.pdfbox.pdmodel.PDDocument
    import org.apache.pdfbox.util.PDFTextStripperByArea
    import java.awt.Rectangle
    import org.apache.pdfbox.pdmodel.PDPage
    import com.google.gson.Gson

    class nocr {
        static void main(String args) {
            def flowFile = session.get()
            if (!flowFile) return
            try {
                def inputStream = session.read(flowFile)
                PDDocument document = PDDocument.load(inputStream)
                PDFTextStripperByArea stripper = new PDFTextStripperByArea()
                // Do your other stuff here, probably writing something out to flow file(s)?
                inputStream.close()
                // If you changed the original flow file, transfer it here
                session.transfer(flowFile, REL_SUCCESS)
            } catch ( Exception whatever ) {
                print(whatever)
                // Something went wrong, send the original flow file to failure
                session.transfer(flowFile, REL_FAILURE)
            }
            println('it worked')
        }
    }

Last Visited	‎09-09-2021 05:46 PM

Member Since	‎05-02-2018 09:20 PM
Posts	27
Kudos received	2

Cloudera Community
