Member since: 05-02-2018
Posts: 27
Kudos Received: 1
Solution: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
710 | 08-09-2018 05:09 PM |
10-28-2020
01:21 PM
This issue appears to be related to https://issues.apache.org/jira/browse/NIFI-4417. I also tried using UpdateAttribute to build my regex in an attribute and then using that attribute as the Search Value in ReplaceText, but that appears to have the same issue: NiFi attributes are not evaluated properly in the Search Value.
10-28-2020
08:44 AM
I'm trying to use ReplaceText to remove the first x lines of a flowfile, where x comes from a flowfile attribute. I'm using the following regex, but ReplaceText says it's invalid: ^(.*?\n){${skip_lines}} It seems like I should be able to reference a flowfile attribute from a regex according to this question, but I just get an error. Any idea how I should be doing this? My full config is below:
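For reference, the intent of that pattern can be sketched outside NiFi. The snippet below is only an illustration using Python's `re` module, not NiFi's Java regex engine or Expression Language, and `skip_lines` is a hypothetical helper name. It assumes the attribute value has already been substituted into the quantifier as a plain integer before the pattern is compiled.

```python
import re


def skip_lines(text, n):
    """Remove the first n lines from text (hypothetical helper, not a NiFi API)."""
    n = int(n)
    if n < 0:
        raise ValueError("n must be non-negative")
    # Same shape as the pattern in the post, with the attribute value
    # already substituted into the {n} quantifier.
    pattern = r'^(?:.*?\n){%d}' % n
    # count=1 so only the leading block of n lines is removed.
    return re.sub(pattern, '', text, count=1)


sample = "header 1\nheader 2\nrow 1\nrow 2\n"
trimmed = skip_lines(sample, 2)
```

If the text has fewer than n lines, the pattern does not match and the text is returned unchanged.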
Labels: Apache NiFi
07-11-2020
09:11 AM
I now see that 'Infer Schema' is an option in Record readers, so this processor is no longer needed. Leaving this up so others might find it.
07-11-2020
07:05 AM
I feel a little silly asking because I can't find anything about this on the internet, but was InferAvroSchema removed from NiFi 1.11? My organization recently upgraded our NiFi version and I noticed it was missing, but figured it was something they had been messing with. However, I upgraded my home server's NiFi and I notice it's missing from there too. I'm hoping that it was replaced by another processor or something? I really use this a lot.
Tags: NiFi
Labels: Apache NiFi
05-01-2019
05:16 PM
I was able to solve this by removing my usage of arrays from my schema.
04-18-2019
04:38 PM
I have this XML file:

<request>
  <requestType>BULKRETRIEVE</requestType>
  <requestDomainType>ROI</requestDomainType>
  <systemName>SYSTEMTEST</systemName>
  <location>USA</location>
  <userInformation>
    <userId>1313</userId>
    <firstName>Some</firstName> <!-- required -->
    <lastName>Guy</lastName> <!-- required -->
    <email>email@address.com</email> <!-- required if phone not included -->
    <phone></phone> <!-- required if email not included -->
  </userInformation>
  <requestObject>
    <startDate>2019-01-01T00:00:00.000-05:00</startDate>
    <endDate>2019-01-31T00:00:00.000-05:00</endDate>
    <type>ROI</type>
  </requestObject>
</request>

Using this schema:

{
  "namespace": "com.organization.somethingspecific",
  "name": "request",
  "type": "record",
  "fields": [
    {"name": "requestType", "type": ["string","null"], "default": null},
    {"name": "requestDomainType", "type": ["string","null"], "default": null},
    {"name": "systemName", "type": ["string","null"], "default": null},
    {"name": "location", "type": ["string","null"], "default": null},
    {"name": "userInformation", "type": ["null", {
      "name": "userInformation", "type": "array", "items": {
        "name": "userInformation", "type": "record", "fields": [
          {"name": "userId", "type": ["string","null"], "default": null},
          {"name": "firstName", "type": ["string","null"], "default": null},
          {"name": "lastName", "type": ["string","null"], "default": null},
          {"name": "email", "type": ["string","null"], "default": null},
          {"name": "phone", "type": ["string","null"], "default": null}
        ]
      }
    }], "default": null},
    {"name": "requestObject", "type": ["null",{
      "name": "requestObject", "type": "array", "items": {
        "name": "requestObject", "type": "record", "fields": [
          {"name": "startDate", "type": ["string","null"], "default": null},
          {"name": "endDate", "type": ["string","null"], "default": null},
          {"name": "type", "type": ["string","null"], "default": null}
        ]
      }
    }], "default": null}
  ]
}

I am confident that the schema is correct because I am able to convert from XML to JSON with no problem. However, UpdateRecord is not able to alter nested fields no matter how I reference them. I have the following UpdateRecord processor:

At the very least I would expect //startDate to work, since that is SUPPOSED to ignore hierarchy, but I can only update top-level fields such as /requestType, which is the only one of the 4 in my configuration that actually updates. I am following the documentation exactly; what else could be wrong? Any help appreciated.
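A later post in this feed reports that a schema problem was solved by removing arrays from the schema. Assuming that refers to this question, a sketch of the same schema with the nested records declared directly (no "array" wrapper) might look like the following. It is written in Python purely to generate the JSON; the helper names `optional_string` and `optional_record` are hypothetical, and listing "null" first in each union is an assumption that follows a common Avro convention for null defaults.

```python
import json


def optional_string(name):
    # Nullable string field; "null" first is a common convention when the default is null.
    return {"name": name, "type": ["null", "string"], "default": None}


def optional_record(name, field_names):
    # Nullable nested record, declared directly with no "array" wrapper around it.
    return {
        "name": name,
        "type": ["null", {
            "name": name,
            "type": "record",
            "fields": [optional_string(f) for f in field_names],
        }],
        "default": None,
    }


top_level = ["requestType", "requestDomainType", "systemName", "location"]

schema = {
    "namespace": "com.organization.somethingspecific",
    "name": "request",
    "type": "record",
    "fields": [optional_string(n) for n in top_level] + [
        optional_record("userInformation",
                        ["userId", "firstName", "lastName", "email", "phone"]),
        optional_record("requestObject", ["startDate", "endDate", "type"]),
    ],
}

# JSON text that could be pasted into a schema property
schema_json = json.dumps(schema, indent=2)
```

Whether this exact shape is what fixed the original flow is not stated in the follow-up post, so treat it as a starting point rather than a confirmed answer.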
Labels: Apache NiFi
11-29-2018
06:57 PM
I figured there had to be a better way to do that, thanks @Matt Burgess! Is there documentation on the different programming language APIs that I'm missing? I've been working off of your excellent ExecuteScript cookbooks posted here, but beyond that I couldn't find where in the documentation I could have looked up something like session.remove().
11-28-2018
05:55 PM
I'm trying to read a JSON from `flowFile` and add the contents as attribute keys in the empty `updated_flowFile`, but I get `transfer relationship not specified` even though I'm specifying it.

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback
from org.apache.nifi.processor.io import OutputStreamCallback
import json

data = {}

# Read contents of flowFile and write contents to data{}
class PyInputStreamCallback(InputStreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        global data
        data = json.loads(text)

# Get incoming flowFile and call PyInputStreamCallback
flowFile = session.get()
if (flowFile != None):
    try:
        session.read(flowFile, PyInputStreamCallback())
        global data
        # Create a blank flowfile, update the attributes with contents of data{} and write it to session
        updated_flowFile = session.create()
        updated_flowFile = session.putAttribute(updated_flowFile, 'left', data['left'])
        updated_flowFile = session.putAttribute(updated_flowFile, 'top', data['top'])
        session.close(flowFile)
        session.transfer(updated_flowFile, REL_SUCCESS)
    except:
        session.close(updated_flowFile)
        session.transfer(flowFile, REL_FAILURE)
else:
    session.transfer(flowFile, REL_FAILURE)

Alternatively, if there's a way to use the same flowFile object and wipe the JSON contents, that would work too. I am doing a MergeContent later in my pipeline, so I need the contents to be totally empty except for the attributes I'm adding.
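A later reply in this feed mentions `session.remove()`. The following is a minimal sketch of that approach, assuming the goal is to drop the original flowfile and emit a new, empty one that carries only the attributes. `FakeSession` is a hypothetical stand-in so the logic can run outside NiFi, and `json_to_attributes` is an illustrative name; inside ExecuteScript the real `session`, `REL_SUCCESS` and `REL_FAILURE` would be used instead, and the text would come from `session.read()` as in the script above. NiFi attribute values are strings, so the values are converted with `str()`.

```python
import json


class FakeSession(object):
    """Hypothetical stand-in for NiFi's session object, only so the sketch runs outside NiFi."""

    def __init__(self):
        self.removed = []
        self.transferred = []

    def create(self):
        return {}

    def putAttribute(self, flow_file, key, value):
        updated = dict(flow_file)
        updated[key] = value
        return updated

    def remove(self, flow_file):
        self.removed.append(flow_file)

    def transfer(self, flow_file, relationship):
        self.transferred.append((flow_file, relationship))


def json_to_attributes(session, original, text, rel_success, rel_failure):
    # Parse first, so nothing is created if the content is unusable.
    try:
        data = json.loads(text)
        attrs = dict((key, str(data[key])) for key in ('left', 'top'))
    except (ValueError, KeyError, TypeError):
        session.transfer(original, rel_failure)
        return
    new_flow_file = session.create()
    for key, value in attrs.items():
        new_flow_file = session.putAttribute(new_flow_file, key, value)
    # Every flowfile must end up either transferred or removed.
    session.remove(original)
    session.transfer(new_flow_file, rel_success)
```

The point of the sketch is that each flowfile obtained or created in the session is accounted for exactly once: the original is removed, and the new one is transferred.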
Labels: Apache NiFi
08-09-2018
05:09 PM
This issue was caused by me not using try/catch properly. Since the files weren't visible to the rest of my code outside the try/catch, it was returning the PDF.
08-08-2018
08:36 PM
I found Matt's cookbooks and I'm following the recipe for overwriting a FlowFile. It seems very simple and straightforward and I'm not sure what I'm missing. My code is supposed to read the PDF in from the FlowFile, use PDFBox to extract first and last name from the form (it's an I9) and then output the results into a JSON which gets sent out in REL_SUCCESS. Instead it just outputs the PDF file to REL_SUCCESS. Not sure if it's never being read which is causing blank output or I'm writing it out wrong or what.

import java.nio.charset.StandardCharsets
import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson
import java.nio.charset.StandardCharsets

def flowFile = session.get()
flowFile = session.write(flowFile, { inputStream, outputStream ->
    try {
        //Load Flowfile contents
        PDDocument document = PDDocument.load(inputStream)
        PDFTextStripperByArea stripper = new PDFTextStripperByArea()
        //Get the first page
        List<PDPage> allPages = document.getDocumentCatalog().getAllPages()
        PDPage page = allPages.get(0)
    } catch (Exception e){
        System.out.println(e.getMessage())
        session.transfer(flowFile, REL_FAILURE)
    }
    //Define the areas to search and add them as search regions
    stripper = new PDFTextStripperByArea()
    Rectangle lname = new Rectangle(25, 226, 240, 15)
    stripper.addRegion("lname", lname)
    Rectangle fname = new Rectangle(276, 226, 240, 15)
    stripper.addRegion("fname", fname)
    //Load the results into a JSON
    def boxMap = [:]
    stripper.setSortByPosition(true)
    stripper.extractRegions(page)
    regions = stripper.getRegions()
    for (String region : regions) {
        String box = stripper.getTextForRegion(region)
        boxMap.put(region, box)
    }
    Gson gson = new Gson()
    //Remove random noise from the output
    json = gson.toJson(boxMap, LinkedHashMap.class)
    json = json.replace('\\n', '')
    json = json.replace('\\r', '')
    json = json.replace(',"', ',\n"')
    //Overwrite flowfile contents with JSON
    outputStream.write(json.getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)

Help appreciated!
Labels: Apache NiFi
08-08-2018
08:15 PM
Thanks for everything, @Matt Burgess. I was able to get this going by making my code more Groovy and cutting the need for classes and the main() method out of my implementation.
08-08-2018
05:13 PM
Hey @Matt Burgess, that worked, thanks! I'm trying to scale up now, and when I try adding that code to a class and calling it from main() I get errors about the static keyword and context. I've tried running it from a run() method and then calling that from main, and moving the flowFile declaration outside of main, but I'm just not understanding. Sorry to be such a bother; I just can't find this in the documentation or any examples of doing this from a class.

import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson

class nocr {
    static void main(String args) {
        def flowFile = session.get()
        if (!flowFile)
            return
        try {
            def inputStream = session.read(flowFile)
            PDDocument document = PDDocument.load(inputStream)
            PDFTextStripperByArea stripper = new PDFTextStripperByArea()
            // Do your other stuff here, probably writing something out to flow file(s)?
            inputStream.close()
            // If you changed the original flow file, transfer it here
            session.transfer(flowFile, REL_SUCCESS)
        } catch (Exception whatever) {
            print(whatever)
            // Something went wrong, send the original flow file to failure
            session.transfer(flowFile, REL_FAILURE)
        }
        println('it worked')
    }
}
08-07-2018
06:35 PM
Hey @Matt Burgess thanks for such a quick response. I've been messing with your code for a little bit but can't get past the error `MissingPropertyException: No such property: flowFile for class: Script58238`. It still doesn't seem to be getting the flowFile correctly.
08-07-2018
05:52 PM
I'm trying to read a PDF from my flowfile into a Groovy/Java File object. I'm not sure what I'm doing wrong, but when I try to use session.get() I get the error `transfer relationship not specified`.

import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson

try {
    //Get flowfile into File() object
    File file = session.get()
    PDDocument document = PDDocument.load(file)
    PDFTextStripperByArea stripper = new PDFTextStripperByArea()
} catch (Exception whatever) {
    print(whatever)
}
println('it worked')

I've also tried using stdin instead of relying on the session object, but that just hangs. I let it spin for about 5 minutes trying to read an 85 kB PDF file before I gave up.

import org.apache.pdfbox.io.IOUtils
import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripperByArea
import java.awt.Rectangle
import org.apache.pdfbox.pdmodel.PDPage
import com.google.gson.Gson

try {
    //Get flowfile into File() object
    File file = new File()
    OutputStream os = new FileOutputStream(file)
    IOUtils.copy(System.in, os)
    os.close()
    PDDocument document = PDDocument.load(file)
    PDFTextStripperByArea stripper = new PDFTextStripperByArea()
} catch (Exception whatever) {
    print(whatever)
}
println('it worked')

It doesn't really matter which of these works, as long as I can get the FlowFile into a Java File object so I can continue processing. Any help appreciated; I'm extremely frustrated.
Labels: Apache NiFi
07-24-2018
11:05 PM
I need to replace all of the contents of my flowfile with a file on the disk. It seems like FetchFile just adds the contents to the flowfile. Is there a way to drop the contents of my flowfile first?
Labels: Apache NiFi
05-23-2018
03:31 PM
Nice thanks, I figured there had to be a way to tell it that it was a solo node but I just wasn't phrasing it right for google apparently. Though the problem ended up being solved with a simple delete/reinstall.
05-23-2018
03:30 PM
1 Kudo
That's how I ended up solving it, must have been a fluke
05-03-2018
02:58 PM
I'm not doing any tutorials, just trying to get to NiFi at my_server:9090/nifi from my web browser, and I get the error stated above.
05-03-2018
02:58 PM
Thanks, I saw that before I posted, but since he's got more than 1 node in his cluster his error makes sense. I am trying to get 1 node to connect to itself, and it can't in a vanilla Docker install.
05-02-2018
09:28 PM
I'm getting the error "Cluster is still in the process of voting on the appropriate Data Flow." on the sandbox HDF Docker image. I'm not really sure how to troubleshoot this; I'm not familiar with NiFi or HDF at all. Suggestions appreciated.
Labels: Apache NiFi, Cloudera DataFlow (CDF), Docker