Member since: 11-16-2015
Posts: 905
Kudos Received: 666
Solutions: 249
04-24-2017
05:44 PM
2 Kudos
The "Out" number is the (5 minute rolling window) amount of data (count of flow files / size of flow files) that the processor has transferred (not that is queued). Check the Anatomy of a Processor section of the NiFi User's Guide, it has explanations of the statistics and other indicators on a processor.
04-24-2017
04:54 PM
1 Kudo
If the JSON content is not too large to fit in memory, you could use ExecuteScript for this: Groovy has an XmlSlurper that can parse your XML clob, and a JsonSlurper (and JsonOutput) that can read/write JSON as objects. For example, given the input:
{
  "key": "k1",
  "clob": "<root><attribute><name>attr1</name><value>Hello</value></attribute><attribute><name>attr2</name><value>World!</value></attribute></root>"
}
You could use the following Groovy script in ExecuteScript:
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.*
import groovy.json.*
import groovy.util.*
def flowFile = session.get()
if (!flowFile) return
try {
flowFile = session.write(flowFile,
{ inputStream, outputStream ->
def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
// Parse JSON into object
def json = new JsonSlurper().parseText(text)
// Parse XML from clob field into object
def xml = new XmlSlurper().parseText(json.clob)
// Add a field to the JSON for each "attribute" tag in the XML
xml.attribute.each { a ->
json[a.name.toString()] = a.value.toString()
}
// Remove the clob field
json.remove('clob')
// Write the updated JSON object as the flow file content
outputStream.write(JsonOutput.prettyPrint(JsonOutput.toJson(json)).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').tokenize('.')[0]+'_with_clob_fields.json')
session.transfer(flowFile, REL_SUCCESS)
} catch(Exception e) {
log.error('Error extracting XML fields into JSON', e)
session.transfer(flowFile, REL_FAILURE)
}
For the given input, it generates the following output:
{
"attr1": "Hello",
"attr2": "World!",
"key": "k1"
}
The script extracts the XML text from the "clob" field, parses it into an object with XmlSlurper, finds the individual "attribute" tags within, and adds each name/value pair to the original JSON object. For cases where the clob is not too large, it might be helpful to have an "xPath()" or "xmlPath()" function in NiFi Expression Language (like the jsonPath() function added in NIFI-1660). Please feel free to file a Jira case to add this feature.
04-24-2017
04:33 PM
2 Kudos
If a single flow file contains an array and you want to manipulate values within, then @Andy LoPresto's solution is recommended. From your comment on his answer it appears you want to compute the average across multiple flow files.

From a flow perspective, how would you know when you were "done" calculating the average? Will you have a running average that is calculated from sum-so-far and count-so-far? Or do you want to take X flow files in, calculate the average, then output the X flow files (or perhaps a single one) with the average for those X flow files?

NiFi 1.2.0 (having implemented NIFI-1582) will include the ability to store and calculate state using UpdateAttribute. This can be used to maintain "sum" and "count" attributes, which at any given point would let you calculate the running average. In the meantime (or alternatively), you could use ExecuteScript or InvokeScriptedProcessor to perform this same function. It would be similar to Andy's approach, but would also store the sum-so-far and count-so-far into the processor's State Map. If you are calculating a running average and want to output each flow file as it comes in (adding a "current average" attribute, for example), you can use ExecuteScript, as in the sketch below. If you want to keep the incoming flow files until a total average can be calculated, then you'd need InvokeScriptedProcessor.
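As a rough, untested illustration of the ExecuteScript route, here is a minimal Groovy sketch. It assumes each incoming flow file's content is a single number, and that the processor's State Map is reachable from the script via context.getStateManager(); the attribute and state key names are just placeholders:
import org.apache.commons.io.IOUtils
import java.nio.charset.*
import org.apache.nifi.components.state.Scope
import org.apache.nifi.processor.io.InputStreamCallback
def flowFile = session.get()
if (!flowFile) return
try {
    // Read the numeric value from the flow file content (assumed to be a plain number)
    def value = 0d
    session.read(flowFile, { inputStream ->
        value = Double.parseDouble(IOUtils.toString(inputStream, StandardCharsets.UTF_8).trim())
    } as InputStreamCallback)
    // Fetch sum-so-far and count-so-far from the State Map, update them, and store them back
    def stateManager = context.stateManager
    def oldState = stateManager.getState(Scope.LOCAL)
    def sum = (oldState.get('sum') ?: '0') as Double
    def count = (oldState.get('count') ?: '0') as Long
    sum += value
    count++
    stateManager.setState([sum: sum.toString(), count: count.toString()], Scope.LOCAL)
    // Add the current running average as an attribute and pass the flow file through
    flowFile = session.putAttribute(flowFile, 'running.average', (sum / count).toString())
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Error calculating running average', e)
    session.transfer(flowFile, REL_FAILURE)
}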
04-24-2017
03:26 PM
3 Kudos
Using JoltTransformJSON, you can inject the "key" and "theme" entries from p into the array, and create a top-level array from it. Try the following Shift spec:
{
"operation": "shift",
"spec": {
"s": {
"*": {
"@(2,p)": {
"key": "[#3].key",
"theme": "[#3].theme"
},
"*": "[#2].&"
}
}
}
}
Given your input, it produces the following output:
[ {
"key" : "k1",
"theme" : "default",
"x" : 1,
"y" : "0.1"
}, {
"key" : "k1",
"theme" : "default",
"x" : 2,
"y" : "0.2"
} ]
Now you can use SplitJson (with a JsonPath expression of $) to get the individual records. If you want to keep them as JSON then you're done; if you want to convert to CSV, you'd need EvaluateJsonPath like @Timothy Spann mentioned, then ReplaceText with Expression Language to set the fields, something like "${key}, ${theme}, ${x}, ${y}" (a rough sketch of those settings is below). An alternative to Jolt, which Tim alluded to, is to use EvaluateJsonPath to get the p.* fields into attributes, then SplitJson, then EvaluateJsonPath to get the s.* fields into attributes, then ReplaceText (generating either JSON or CSV as described). In my opinion I'd use the Jolt transform and keep the content as JSON as long as possible, rather than moving it into attributes.
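For the CSV route, an untested sketch of the two processors after SplitJson might look like the following; the dynamic property names are just assumptions matching the JSON above:
EvaluateJsonPath (Destination = flowfile-attribute), with dynamic properties:
  key   = $.key
  theme = $.theme
  x     = $.x
  y     = $.y
ReplaceText:
  Replacement Strategy = Always Replace
  Replacement Value    = ${key}, ${theme}, ${x}, ${y}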
04-21-2017
04:09 PM
You can get the NiFi-only version of HDF at https://hortonworks.com/downloads/#dataflow
04-21-2017
03:58 PM
2 Kudos
Besides the issue in NIFI-2828, there is also a Hive version incompatibility between Apache NiFi and HDP 2.5. Apache NiFi is built with Apache Hive 1.2 and its Apache Hadoop dependencies. However, HDP 2.5 has a "newer" version of Hive, based on 1.2 but with many additions, including a change to the Thrift messages sent between client and server. To get around this, I recommend using a Hortonworks Data Flow (HDF) version (2.1.0.0 or above) that is based on NiFi 1.1.0; HDF is built using HDP dependencies/versions for Hive and its Hadoop dependencies. Alternatively, you can build your own NiFi from source, using the "hortonworks" profile and overriding the hive.version and hive.hadoop.version properties:
mvn clean install -Phortonworks -Dhive.version=1.2.1000.2.5.0.0-1245 -Dhive.hadoop.version=2.7.3.2.5.0.0-1245 -DskipTests
04-20-2017
01:13 PM
3 Kudos
You can use EvaluateJsonPath to extract one particular value of the input JSON into an attribute, then use ReplaceText to create a SQL statement (using NiFi Expression Language and group referencing; see the ReplaceText documentation for more details) that refers to both the extracted attribute and the incoming flow file content. Then you can send that to PutSQL to insert into the database. For example, if you have the following JSON:
{
"id": 100,
"user" : {
"name": "Joe Smith",
"email": "jsmith@mycompany.com",
"age": 42
}
}
And you have a database table called "myTable" with two columns, "id" and "json". Then you could use EvaluateJsonPath to set an attribute "json.id" with a JsonPath expression of "$.id". Then the ReplaceText could replace the whole content with something like the following:
INSERT INTO myTable VALUES (${json.id}, '$1')
I haven't tried this to see if it works as-is, but that is the basic approach. You may have to do something about escaping the JSON or quote characters.
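As an untested sketch of the ReplaceText settings this implies (the Search Value is a "match everything" regex so that $1 captures the entire incoming JSON content):
Search Value         = (?s)(^.*$)
Replacement Value    = INSERT INTO myTable VALUES (${json.id}, '$1')
Replacement Strategy = Regex Replace
Evaluation Mode      = Entire text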
04-10-2017
07:01 PM
I should've asked before: what is your use case where you'd need all the JSON fields as attributes and then convert back to JSON at the end using AttributesToJSON? If you have a JSON transformation to perform, please consider JoltTransformJSON; it is very powerful and can do the transformation(s) inline rather than moving the JSON fields to attributes and back.
04-10-2017
06:59 PM
A bit of a hack is to use ConvertJSONToSQL; this will add attributes such as sql.args.N.value (the value of your field) and sql.args.N.type (the JDBC SQL type of the value). Alternatively, in the Groovy script you can check the type of the variable v and set another attribute corresponding to the key k (such as k.type) that contains a data type identifier.
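As a minimal sketch of that second idea (plain Groovy here, outside NiFi; in the script the same check would populate a "<key>.type" attribute next to each "<key>" attribute, and the sample JSON is just an illustration):
import groovy.json.JsonSlurper

def obj = new JsonSlurper().parseText('{"id": 100, "name": "Joe", "active": true}')
obj.each { k, v ->
    // Use the Groovy/Java class name as a simple data type identifier
    def type = (v == null) ? 'null' : v.getClass().simpleName   // e.g. Integer, String, Boolean
    println "$k = $v ($type)"
}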
04-08-2017
06:43 PM
11 Kudos
I second @Wynner's comment about being cautious. If you determine that you still want all the JSON fields as attributes, you can do it all at once with ExecuteScript. Here is a Groovy script that expects a "flat" JSON file and turns all the fields into attributes:
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.*
def flowFile = session.get()
if (!flowFile) return
def slurper = new groovy.json.JsonSlurper()
def attrs = [:] as Map<String,String>
// Read the flow file content, parse it as JSON, and copy each top-level field into the attribute map
session.read(flowFile,
    { inputStream ->
        def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        def obj = slurper.parseText(text)
        obj.each { k, v ->
            attrs[k] = v.toString()
        }
    } as InputStreamCallback)
flowFile = session.putAllAttributes(flowFile, attrs)
session.transfer(flowFile, REL_SUCCESS)