Support Questions

Find answers, ask questions, and share your expertise

FlowFile not being read and/or written when trying to overwrite with Groovy

avatar
Contributor

I found Matt's cookbooks and I'm following the recipe for overwriting a FlowFile. It seems very simple and straightforward and I'm not sure what I'm missing.

My code is supposed to read the PDF in from the FlowFile, use PDFBox to extract first and last name from the form (it's an I9) and then output the results into a JSON which gets sent out in REL_SUCCESS. Instead it just outputs the PDF file to REL_SUCCESS. Not sure if it's never being read which is causing blank output or I'm writing it out wrong or what.

import java.nio.charset.StandardCharsets
import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson
import java.nio.charset.StandardCharsets
def flowFile = session.get()
flowFile = session.write(flowFile, { inputStream, outputStream ->
    try {
        //Load Flowfile contents
        PDDocument document = PDDocument.load(inputStream)
        PDFTextStripperByArea stripper = new PDFTextStripperByArea()
        //Get the first page
        List<PDPage> allPages = document.getDocumentCatalog().getAllPages()
        PDPage page = allPages.get(0)
    } catch (Exception e){
        System.out.println(e.getMessage())
        session.transfer(flowFile, REL_FAILURE)
    }
    //Define the areas to search and add them as search regions
    stripper = new PDFTextStripperByArea()
    Rectangle lname = new Rectangle(25, 226, 240, 15)
    stripper.addRegion("lname", lname)
    Rectangle fname = new Rectangle(276, 226, 240, 15)
    stripper.addRegion("fname", fname)
    //Load the results into a JSON
    def boxMap = [:]
    stripper.setSortByPosition(true)
    stripper.extractRegions(page)
    regions = stripper.getRegions()
    for (String region : regions) {
        String box = stripper.getTextForRegion(region)
        boxMap.put(region, box)
    }
    Gson gson = new Gson()
    //Remove random noise from the output
    json = gson.toJson(boxMap, LinkedHashMap.class)
    json = json.replace('\\n', '')
    json = json.replace('\\r', '')
    json = json.replace(',"', ',\n"')
    //Overwrite flowfile contents with JSON
    outputStream.write(json.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)

Help appreciated!

1 ACCEPTED SOLUTION

avatar
Contributor

This issue was caused by me not using try/catch properly. Since the files weren't visible to the rest of my code outside the try/catch, it was returning the PDF.

View solution in original post

1 REPLY 1

avatar
Contributor

This issue was caused by me not using try/catch properly. Since the files weren't visible to the rest of my code outside the try/catch, it was returning the PDF.