Member since: 11-16-2015
Posts: 905
Kudos Received: 665
Solutions: 249

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 430 | 09-30-2025 05:23 AM |
| | 765 | 06-26-2025 01:21 PM |
| | 659 | 06-19-2025 02:48 PM |
| | 847 | 05-30-2025 01:53 PM |
| | 11385 | 02-22-2024 12:38 PM |
04-09-2019
02:54 PM
1 Kudo
You can use UpdateRecord for this, but make sure you have the additional fields in your writer's schema. Alternatively you can use JoltTransformJSON with the following spec:

[
{
"operation": "default",
"spec": {
"attributes": {
"id": "12233",
"map": "Y"
}
}
}
]
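For illustration (this input is made up, not from the original question), with a hypothetical input like:
{
  "name": "example",
  "attributes": { "id": "999" }
}
the default operation only fills in keys that are missing, so the existing "id" is kept and "map" is added:
{
  "name": "example",
  "attributes": { "id": "999", "map": "Y" }
}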
04-09-2019
01:46 PM
You can use UpdateRecord for this: add a user-defined property called "/year" with a Replacement Strategy of "Literal Value" and a value of 2019. Note that your Record Writer's schema should have the "year" field in it.
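As a rough sketch of what the writer's schema needs (the record name and the "id" field are hypothetical, only the "year" field is the point), the Avro schema would include something like:
{
  "type": "record",
  "name": "myRecord",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "year", "type": "int" }
  ]
}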
04-08-2019
06:40 PM
You can also try JoltTransformRecord; using the JOLT DSL you can choose which fields you want from the input (and where to put them in the output). Since it's a record-based processor, you can use an XMLReader and a JSONRecordSetWriter and it will do the conversion for you.
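As a minimal sketch (the field names here are hypothetical, not from the question), a shift spec that keeps two fields, renames one and nests the other, and drops everything else might look like:
[
  {
    "operation": "shift",
    "spec": {
      "firstName": "name",
      "zipCode": "address.zip"
    }
  }
]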
04-08-2019
06:06 PM
As of NiFi 1.9.0 (HDF 3.4), the XMLReader can be configured to infer the schema. If you can't upgrade, you could download NiFi 1.9.0 and run it once to infer the schema and write it to an attribute, then inspect the flow file and copy off the schema for use in your operational NiFi instance. There may also be libraries and/or websites that will infer the Avro schema from the XML file for you.
04-08-2019
01:50 PM
Since you want to change the array of values into key/value pairs, you'll need to put them in an object inside the "variables" array, so I'm guessing you want a single-element array "variables" containing an object with the key/value pairs. If that's correct, you can use JoltTransformJSON with the following spec, which adds a key for each value in the array based on its position:

[
{
"operation": "shift",
"spec": {
"variables": {
"0": "variables[0].username",
"1": "variables[0].active",
"2": "variables[0].temperature",
"3": "variables[0].age"
},
"*": "&"
}
}
]

This gave me the following output:

{
"id" : 123456,
"ip" : "*",
"t" : -12.9,
"T" : -23.8,
"variables" : [ {
"username" : "user1",
"active" : 0,
"temperature" : 12.97,
"age" : 23
} ]
}
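For reference, the input that produces that output (reconstructed from the spec and the output above, not copied from the original question) would look roughly like:
{
  "id": 123456,
  "ip": "*",
  "t": -12.9,
  "T": -23.8,
  "variables": [ "user1", 0, 12.97, 23 ]
}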
04-03-2019
06:27 PM
1 Kudo
I believe the flow file is entering the processor and is just taking a very long time to process. In the meantime it will show up in the connection on the UI (although if you try to remove it while it's being processed, you will get a message that zero flow files were removed). The indicator that the flow file is being processed is the grid of light/dark dots on the right side of the processor; while that is shown, the processor is executing, ostensibly on one or more flow files from the incoming queue.

For your script, I think the reason for the long processing (which I would expect to be followed by errors on the processor and in the log) is that you're reading the entire file into a String and then calling PDDocument.load() on the String, when there is no method for that (it needs a byte[] or InputStream). The very unfortunate part is that Groovy will try to print out the value of your String, and for some unknown reason calling toString() on a PDDocument gives the entire content, which for large PDFs you can imagine is quite cumbersome. Luckily you can skip the String representation altogether, since the ProcessSession API gives you an InputStream and/or OutputStream that you can pass to the load() and save() methods on a PDDocument.

I took the liberty of refactoring your script above; mine's not super sophisticated (especially in terms of error handling) but should give you the gist of the approach:

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.multipdf.Splitter
// NiFi API classes used below, imported explicitly so the script is self-contained
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.OutputStreamCallback
flowFile = session.get()
if(!flowFile) return
def flowFiles = [] as List<FlowFile>
try {
    def document
    // Load the PDF directly from the flow file's input stream (no String representation needed)
    session.read(flowFile, {inputStream ->
        document = PDDocument.load(inputStream)
    } as InputStreamCallback)
    def splitter = new Splitter()
    splitter.setSplitAtPage(2)
    try {
        def forms = splitter.split(document)
        forms.each { form ->
            // Write each split PDF to its own new flow file
            def newFlowFile = session.write(session.create(flowFile), {outputStream ->
                form.save(outputStream)
            } as OutputStreamCallback)
            flowFiles << newFlowFile
            form.close()
        }
    } catch(e) {
        log.error('Error writing splits', e)
        throw e
    } finally {
        document?.close()
    }
    session.transfer(flowFiles, REL_SUCCESS)
} catch(Exception e) {
    log.error('Error processing incoming PDF', e)
    // Remove any flow files created in this session before it commits
    session.remove(flowFiles)
}
// The original flow file has been replaced by its splits (or the error was logged), so remove it
session.remove(flowFile)
04-02-2019
01:29 PM
You have specified the SQL Statement property but haven't supplied any values. I recommend replacing PutSQL with PutDatabaseRecord with a Statement Type of INSERT; this should do what you are trying to do.
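A rough outline of the PutDatabaseRecord configuration (the table name is just a placeholder for yours):
Record Reader: a reader matching your incoming data (JsonTreeReader, CSVReader, etc.)
Statement Type: INSERT
Database Connection Pooling Service: your DBCPConnectionPool
Table Name: my_table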
04-02-2019
01:27 PM
1 Kudo
You can use MergeContent or MergeRecord for this: either can take flow files that each contain a single record and combine them into a flow file containing many Avro records. Then you can use ConvertAvroToParquet or PutParquet.
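As a rough sketch of the MergeRecord side (the record counts are placeholders to tune for your data volumes):
Record Reader: AvroReader
Record Writer: AvroRecordSetWriter
Merge Strategy: Bin-Packing Algorithm
Minimum Number of Records: 1000
Maximum Number of Records: 10000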
03-29-2019
07:20 PM
That's a great idea, thanks! I've been meaning to update it, hopefully sooner than later 🙂
03-26-2019
01:58 PM
Once a flow file has been created in a session, it must be removed or transferred before the session is committed (which happens at the end of ExecuteScript). Since your try is outside the loop that creates new flow files, you'll want to remove all the created ones, namely the flowFiles list. You can do that simply with session.remove(flowFiles) rather than the loop you have in your catch statement.
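A minimal sketch of the pattern (the loop body is hypothetical; the key part is the single remove call in the catch):
flowFile = session.get()
if(!flowFile) return
def flowFiles = []
try {
    3.times {
        // create (and presumably write content to) a new flow file per iteration
        flowFiles << session.create(flowFile)
    }
    session.transfer(flowFiles, REL_SUCCESS)
    session.remove(flowFile)
} catch(Exception e) {
    log.error('Error creating flow files', e)
    // one call removes every flow file created above
    session.remove(flowFiles)
    session.transfer(flowFile, REL_FAILURE)
}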