Member since: 11-16-2015
Posts: 911
Kudos Received: 668
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 701 | 09-30-2025 05:23 AM |
|  | 1073 | 06-26-2025 01:21 PM |
|  | 930 | 06-19-2025 02:48 PM |
|  | 1100 | 05-30-2025 01:53 PM |
|  | 12278 | 02-22-2024 12:38 PM |
04-30-2018
06:24 PM
2 Kudos
I wrote up a quick Chain spec you can use in a JoltTransformJSON processor; that way you can skip the Split/Merge pattern and work on the entire JSON object at once:

```json
[
  {
    "operation": "shift",
    "spec": {
      "Objects": {
        "*": {
          "Item": {
            "Inventory": {
              "Elements": {
                "Element": {
                  "*": {
                    "Height": "[&1].Height",
                    "Weight": "[&1].Weight",
                    "Features": {
                      "Feature": {
                        "*": "[&3].&"
                      }
                    }
                  }
                }
              }
            },
            "Status": {
              "ElementsStatus": {
                "ElementStatus": {
                  "*": {
                    "@(3,Id)": "[&1].Id",
                    "Status": "[&1].Status"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
]
```

Note that this assumes the Element and ElementStatus arrays are parallel, meaning the first object in the Element array corresponds to the first object in the ElementStatus array (i.e. their FeatureId fields match). If that is not true, you'd either need a more complicated JOLT spec or perhaps a scripted solution using ExecuteScript.
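For reference, the left-hand side of the spec implies input shaped roughly like the following. This is a hypothetical reconstruction, not data from the question: all values are made up, and placing Id directly under Status is my best reading of the @(3,Id) lookup (three levels up from each ElementStatus entry):

```json
{
  "Objects": [
    {
      "Item": {
        "Inventory": {
          "Elements": {
            "Element": [
              {
                "Height": 10,
                "Weight": 20,
                "Features": { "Feature": { "FeatureId": "F1" } }
              }
            ]
          }
        },
        "Status": {
          "Id": "ABC123",
          "ElementsStatus": {
            "ElementStatus": [ { "Status": "OK" } ]
          }
        }
      }
    }
  ]
}
```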
04-26-2018
08:02 PM
Your schema says that null values are allowed. If you don't want to allow nulls for particular fields, try a ValidateRecord processor using a schema that does not allow null values for the desired fields. I can't remember whether the "non-null" schema would be set on the Reader or Writer for ValidateRecord, but I believe it is the Reader. In that case, use the current schema (that allows nulls) for the Writer so the valid and invalid records can be output from the processor. Then you can send the "valid" relationship to the Elasticsearch processor, and handle the flowfiles/records on the "invalid" relationship however you choose.
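As a sketch of what the two schemas could look like (the field names here are hypothetical): in Avro, a nullable field is declared as a union with "null", so the strict schema simply drops the union on the fields that must not be null:

```json
{
  "type": "record",
  "name": "Event",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "note", "type": ["null", "string"] }
  ]
}
```

With this as the Reader schema, a record with a null id would route to "invalid", while a null note would still pass; the permissive Writer schema would declare id as ["null", "string"] as well.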
04-26-2018
02:58 PM
You only need one session per execution of the script. Using that session, you can get, create, remove, and transfer as many flow files as you want. If you get or create a flow file from the session, you must transfer or remove it before the end of the script, or else you will get a "Transfer relationship not specified" error. Also, you can only transfer each flow file once; if you attempt to transfer the same flow file more than once, you will get the error you describe above.
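A minimal Groovy sketch of those rules (REL_SUCCESS is the processor's standard relationship; the child flow file is just for illustration):

```groovy
def flowFile = session.get()
if (!flowFile) return                   // nothing in the queue this run

// Anything created from the session must also be transferred or removed
def child = session.create(flowFile)
session.transfer(child, REL_SUCCESS)    // each flow file is transferred exactly once

session.remove(flowFile)                // removing the original also satisfies the rule
```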
04-26-2018
02:46 PM
I can't reproduce this. I used GenerateFlowFile with your input XML (adding two Transactions) -> SplitXML (level 1) and got the same "sub-xml" you did; then I used the same settings for EvaluateXPath, and my content attribute has the correct value of 1. The only way I got it to show "Empty string set" was by using /Transaction/@type as the XPath (note the wrong case, type instead of Type). Is it possible there's a typo or case-sensitivity issue between your input XML and your XPath?
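To illustrate the case-sensitivity point, consider a fragment like this (simplified; the element content doesn't matter here):

```xml
<!-- note the capital T in the attribute name -->
<Transaction Type="1">
  ...
</Transaction>
```

Against this input, /Transaction/@Type evaluates to 1, while /Transaction/@type returns nothing, because XPath attribute names are case-sensitive.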
04-25-2018
07:32 PM
I think it was an error in the blog software; it seems to be fixed now?
04-24-2018
01:06 PM
1 Kudo
PutDatabaseRecord allows you to put all the records from one flow file into a database at once, without requiring you to convert them to SQL first (you can use PutSQL for that approach, but it is less efficient). In your case you just need GetFile -> PutDatabaseRecord. Your CSVReader will have the schema for the data, which tells PutDatabaseRecord the types of the fields. It uses that to set the fields appropriately on the prepared statement and executes the whole flow file as a single batch.
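As a hedged illustration (the column and record names are made up): for a CSV whose header is id,amount,created, the CSVReader schema just needs to describe those columns, and PutDatabaseRecord then matches the record fields to the target table's columns:

```json
{
  "type": "record",
  "name": "Sale",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "amount", "type": "double" },
    { "name": "created", "type": "string" }
  ]
}
```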
04-19-2018
12:37 PM
For GetFile, the structure or format of the file does not matter, but it does appear that the NiFi process cannot read from that folder.
04-19-2018
12:18 PM
1 Kudo
It appears you are trying to get files from the /temp folder on the local filesystem, and that folder is not readable by the OS user running NiFi. Can you verify that the user who owns the NiFi process can read from that folder?
04-17-2018
02:01 PM
I am working on NIFI-4456, which will allow the JSON reader/writer to support the "one JSON per line" format as well as the "JSON array" format for input and output, so you will be able to read in one JSON per line and output a JSON array using ConvertRecord (or any other record-aware processor). In the meantime, you can use the following crude script in an ExecuteGroovyScript processor to process your entire file (avoiding the Split/Merge pattern); it should get you what you want:

```groovy
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inStream, outStream ->
    outStream.write('['.bytes)
    // eachLine passes the 1-based line number as the second closure argument
    inStream.eachLine { line, i ->
        if (i > 1) outStream.write(','.bytes)   // comma before every line after the first
        outStream.write(line.bytes)
    }
    outStream.write(']'.bytes)
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```

The script just adds array brackets around the whole document and separates the lines with commas. I did the crude version because it doesn't need to load the entire input content into memory. If you need more control over the JSON objects, you could iterate over the lines (still with eachLine), use JsonSlurper to deserialize each line into a JSON object, add each object to a list, then use JsonOutput to serialize the whole thing back to a string. However, that involves holding the entire content in memory and could get unwieldy for large input flow files.
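For completeness, here is a sketch of that JsonSlurper/JsonOutput variant. Same caveats as above, plus it buffers the entire array in memory:

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

flowFile = session.write(flowFile, { inStream, outStream ->
    def slurper = new JsonSlurper()
    def objects = []
    inStream.eachLine { line ->
        if (line.trim()) objects << slurper.parseText(line)  // one JSON object per line
    }
    outStream.write(JsonOutput.toJson(objects).bytes)        // serialize the whole array at once
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)
```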
04-12-2018
01:35 PM
As of NiFi 1.5.0 (via NIFI-4684), you can now specify the prefix in ConvertJSONToSQL. The property defaults to "sql" to maintain existing behavior, but can be changed to "hiveql" for use with PutHiveQL.
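Concretely, the prefix determines the names of the per-parameter attributes on the outgoing flow file; for example:

```
sql.args.1.type,    sql.args.1.value    -> consumed by PutSQL    (default prefix "sql")
hiveql.args.1.type, hiveql.args.1.value -> consumed by PutHiveQL (prefix set to "hiveql")
```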