Member since: 11-16-2015
Posts: 902
Kudos Received: 664
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 231 | 09-30-2025 05:23 AM |
| | 656 | 06-26-2025 01:21 PM |
| | 497 | 06-19-2025 02:48 PM |
| | 748 | 05-30-2025 01:53 PM |
| | 10964 | 02-22-2024 12:38 PM |
05-15-2017 01:40 PM
1 Kudo
I am having trouble importing the "etree" module; I have tried with brew-installed Python 2.7 and Anaconda 2.7 (where I believe the etree submodule is part of "xml", not "lxml"). Do I need any additional configuration?

Looking in the lxml package, I see some native libraries (.so files, e.g.). If lxml is a native library, Jython (the "python" script engine in ExecuteScript) will not be able to load/execute it. All imported modules (and their dependencies) must be pure Python (no native code such as CPython extensions) for Jython to execute the script successfully. Perhaps there is a different library you can use? If you don't have a requirement on Jython/Python, consider using Javascript, Groovy, or Clojure instead; their Module Directory property allows you to use third-party Java libraries, such as NekoHTML, JTidy, or JSoup, to accomplish this conversion (a Groovy/JSoup sketch follows below).
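If you end up using Groovy with JSoup, a minimal sketch might look like this (assuming the JSoup JAR is available via the Module Directory property; treat it as an illustration, not a drop-in solution):

```groovy
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Entities

def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    // Parse the (possibly sloppy) HTML content of the flow file
    def doc = Jsoup.parse(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
    // Have JSoup emit XML-syntax (XHTML-style) output
    doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml)
    doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml)
    outputStream.write(doc.outerHtml().getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
```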
05-13-2017 02:38 AM
In a JOLT spec, if you don't explicitly provide a transformation for a particular field, it will be excluded. So you can include matching rules for the fields you care about (i.e., those that have a certain value), and the rest will be discarded. Check the "Filter data from an Array, based on a leaf level value" example at the JOLT Demo online app.
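For example, a shift spec along these lines (the "entities" and "type" names are hypothetical, mirroring that demo example) keeps only the array entries whose "type" is "alpha":

```json
[
  {
    "operation": "shift",
    "spec": {
      "entities": {
        "*": {
          "type": {
            "alpha": {
              "@2": "entities[]"
            }
          }
        }
      }
    }
  }
]
```

Entries whose "type" has any other value match no rule and are therefore excluded from the output.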
05-13-2017 02:33 AM
As of NiFi 1.2.0, after GetFile you can use the PutDatabaseRecord processor with a CSVReader, giving it a schema if you know the layout; or, if the CSV file has a header row, you can get the column names from it by choosing "Use String Fields From Header" for the Schema Access Strategy property.

Prior to NiFi 1.2.0, after GetFile you probably want a SplitText to get each row into its own flow file. Do you know the number of columns in the CSV file? If so, you can use a regex in ExtractText; assuming there were four columns, you might have a dynamic property called "column" set to something like:

([^,]+),([^,]+),([^,]+),([^,]+)

That should give you attributes "column.1", "column.2", "column.3", and "column.4". Then you can use ReplaceText to generate a SQL statement, perhaps something like:

INSERT INTO myTable VALUES(${column.1},${column.2},${column.3},${column.4})

Then send that to PutSQL.

Another option (prior to NiFi 1.2.0) is to convert the CSV to Avro (see this HCC article), then ConvertAvroToJSON, then ConvertJSONToSQL, then PutSQL.
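To illustrate the ExtractText/ReplaceText approach above with a hypothetical row, if a split flow file contained `1,John,Smith,NY`, ExtractText would produce the attributes below and ReplaceText would then generate the INSERT shown:

```
column.1 = 1
column.2 = John
column.3 = Smith
column.4 = NY

INSERT INTO myTable VALUES(1,John,Smith,NY)
```

Note that for real data you would likely need to quote string columns in the ReplaceText template (e.g. '${column.2}') so the generated statement is valid SQL.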
05-11-2017 03:17 PM
1 Kudo
Since a template is XML, you could use an XSLT to replace the values of those properties. Alternatively, scripting languages such as Python and Groovy can handle XML fairly easily, so you could write a script to replace the values; a sketch follows below. Ideally these properties would support Expression Language; I have written NIFI-3867 to cover this improvement.
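For example, here is a minimal Groovy sketch (the file name, property name, and new value are hypothetical):

```groovy
import groovy.xml.XmlUtil

// NiFi templates store processor properties as <entry><key>...</key><value>...</value></entry>
def root = new XmlParser().parse(new File('my_template.xml'))

// Find the entries for the property of interest and replace their values
root.depthFirst()
    .findAll { it.name() == 'entry' && it.key.text() == 'Remote URL' }
    .each { entry -> if (entry.value) entry.value[0].value = 'http://new-host:8080/contentListener' }

new File('my_template_updated.xml').text = XmlUtil.serialize(root)
```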
05-10-2017 12:52 PM
There are too many IOUtils.toString() calls there; the "text" line should read:

text = IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1)
05-09-2017 03:27 PM
2 Kudos
Try the following for the XPath:

string(/queryResponse/@last)

Also ensure the Destination property is "flowfile-content" and Return Type is "string"; this will ensure the value of the attribute is written as the content of the outgoing flow file.
05-03-2017 01:15 PM
Can you provide some sample input? I tried with a tab-separated file that contained a \n in the column (with the line ending in \n\r), and your script worked fine. I tried replacing the delimiter value with \t instead of an actual tab character, and it seemed to work fine too.
04-24-2017 05:44 PM
2 Kudos
The "Out" number is the (5 minute rolling window) amount of data (count of flow files / size of flow files) that the processor has transferred (not that is queued). Check the Anatomy of a Processor section of the NiFi User's Guide, it has explanations of the statistics and other indicators on a processor.
04-24-2017 04:54 PM
1 Kudo
If the JSON content is not too large to fit in memory, you could use ExecuteScript for this. Groovy has an XmlSlurper that can parse your XML clob (assuming it has been placed in an attribute via EvaluateJsonPath), and a JsonSlurper (and JsonOutput) that can read/write JSON as objects. For example, given the input:

{
"key": "k1",
"clob": "<root><attribute><name>attr1</name><value>Hello</value></attribute><attribute><name>attr2</name><value>World!</value></attribute></root>"
}

You could use the following Groovy script in ExecuteScript:

import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.*
import groovy.json.*
import groovy.util.*
def flowFile = session.get()
if (!flowFile) return
try {
flowFile = session.write(flowFile,
{ inputStream, outputStream ->
def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
// Parse JSON into object
def json = new JsonSlurper().parseText(text)
// Parse XML from clob field into object
def xml = new XmlSlurper().parseText(json.clob)
// Add a field to the JSON for each "attribute" tag in the XML
xml.attribute.each { a ->
json[a.name.toString()] = a.value.toString()
}
// Remove the clob field
json.remove('clob')
// Write the updated JSON object as the flow file content
outputStream.write(JsonOutput.prettyPrint(JsonOutput.toJson(json)).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').tokenize('.')[0]+'_with_clob_fields.json')
session.transfer(flowFile, REL_SUCCESS)
} catch(Exception e) {
log.error('Error extracting XML fields into JSON', e)
session.transfer(flowFile, REL_FAILURE)
}

For the given input, it generates the following output:

{
"attr1": "Hello",
"attr2": "World!",
"key": "k1"
}

The script extracts the XML text from the "clob" field, parses it into an object with XmlSlurper, finds the individual "attribute" tags within, and adds each name/value pair to the original JSON object. For instances where the clob is not too large, it might be helpful to have an "xPath()" or "xmlPath()" function in NiFi Expression Language (like the jsonPath() function added in NIFI-1660). Please feel free to file a Jira case to add this feature.
04-24-2017 04:33 PM
2 Kudos
If a single flow file contains an array and you want to manipulate values within, then @Andy LoPresto's solution is recommended. From your comment on his answer, it appears you want to compute the average across multiple flow files. From a flow perspective, how would you know when you were "done" calculating the average? Will you have a running average calculated from sum-so-far and count-so-far? Or do you want to take X flow files in, calculate the average, then output the X flow files (or perhaps a single one) with the average for those X flow files?

NiFi 1.2.0 (having implemented NIFI-1582) will include the ability to store and calculate state using UpdateAttribute. This can be used to maintain "sum" and "count" attributes, which at any given point would let you calculate the running average.

In the meantime (or alternatively), you could use ExecuteScript or InvokeScriptedProcessor to perform this same function. It would be similar to Andy's approach, but would also store the sum-so-far and count-so-far in the processor's State Map. If you are calculating a running average and want to output each flow file as it comes in (adding a "current average" attribute, for example), you can use ExecuteScript; see the sketch below. If you want to keep the incoming flow files until a total average can be calculated, you'd need InvokeScriptedProcessor.
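Here is a rough sketch of the ExecuteScript approach in Groovy (the "value" attribute name is hypothetical, and state handling is simplified; treat it as an outline rather than production code):

```groovy
import org.apache.nifi.components.state.Scope

def flowFile = session.get()
if (!flowFile) return

// Read sum-so-far and count-so-far from the processor's State Map
def stateManager = context.stateManager
def state = stateManager.getState(Scope.LOCAL).toMap()
def sum = (state['sum'] ?: '0') as double
def count = (state['count'] ?: '0') as long

// Fold this flow file's value into the running totals
sum += (flowFile.getAttribute('value') as double)
count += 1
stateManager.setState([sum: sum as String, count: count as String], Scope.LOCAL)

// Add the current running average as an attribute and pass the flow file along
flowFile = session.putAttribute(flowFile, 'current.average', (sum / count) as String)
session.transfer(flowFile, REL_SUCCESS)
```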