About mburgess

mburgess · ‎02-12-2018

Perhaps we could add "CREATE TABLE" and other DDL commands to ConvertJsonToSQL? The hardest part of the exercise is to infer the types correctly, it's such a difficult problem that there's often a "why bother?" attitude, or the type inference is "good enough" for 80% of the use cases, etc.

mburgess · ‎02-12-2018

NIFI-978 addresses this capability, I have a Pull Request up with the improvement, perhaps it will make it into NiFi 1.6.0.

mburgess · ‎02-12-2018

It is possible with ExecuteScript and Groovy if you don't have a SecurityManager set on the JVM that would prevent Groovy from getting at the ProcessorNode which is a private member variable of the StandardProcessContext. The following script can be used in ExecuteScript to add the parent process group ID as an attribute to an incoming flow file: def flowFile = session.get() if(!flowFile) return processGroupId = context.procNode?.processGroupIdentifier ?: 'unknown' flowFile = session.putAttribute(flowFile, 'processGroupId', processGroupId) session.transfer(flowFile, REL_SUCCESS)

mburgess · ‎02-08-2018

In the article you mention, the ID of the connection is specified as an attribute, then the InvokeHttp processor uses it to create a URL to poll for status. In your case, you'd just need the list of connection IDs, and you can create a flow file for each connection. You can use the "/process-groups/root/connections" REST endpoint to get a list of connections at the root process group. If you want to get connections for child process groups, you'd first need to get the child process groups' IDs and use those in the connections endpoint to get the connections for each process group. That can recurse down through each child process group and may be unwieldy. For this example I'm going to assume you simply want to monitor your queues in the root process group. With InvokeHttp and the aforementioned REST endpoint to get the connections, you'll get a JSON object back with a field called "connections", which is an array of connections. Then you can use SplitJson with a JSONPath of "$.connections", and it will create a flow file for each of the connections. Then you can use EvaluateJsonPath to extract the connection's URL using a JSONPath of "$.uri". Then you can continue using the flow described in the other article, and for each of the flow files, it will retrieve the status for that connection.

mburgess · ‎02-07-2018

Avro Schemas can be confusing the first couple of times you create them 🙂 In your case you could use the following: { "namespace": "nifi", "name": "cesarPipeDelimitedRecord", "type": "record", "fields": [ {"name": "id","type": "string"}, {"name": "sequence","type": "int"}, {"name": "category","type": "int"}, {"name": "text","type": "string"} ] } If you can have missing values, then you can replace the type with a union, for example if "category" can be missing, then its field entry can be {"name": "category","type": ["null","int"]},

mburgess · ‎02-07-2018

You don't have to extract the fields to attributes if you are converting the contents to a different format, instead you can use ConvertRecord with a CSVReader with custom format (a pipe delimiter for instance) and name your fields in the Avro schema. Then in ConvertRecord you can set a JsonRecordSetWriter to convert to JSON. This same approach will work for any supported output format, or you can even write your own ScriptedRecordSetWriter if you need a custom format. If you do need to extract to attributes, you can use ExtractText with a regular expression that matches each field, and you can add user-defined properties to extract the group(s) into their associated attributes (the property name is the field name such as "id" or "sequence", and the value is the grouping expression, perhaps $2, $3, etc.)

mburgess · ‎01-30-2018

What version of NiFi are you using? The timezone parameter was added in NiFi 1.2.0 / HDF 3.0 (NIFI-2908).

mburgess · ‎01-24-2018

So right now it appears you are trying to do validation and extraction at the same time, since you don't want "case 2" to move down the stream. If your new ReplaceText from this comment is more performant than the one from the original question, you can use RouteOnContent first to exclude the files that do not have the required header and footer. Since there will now be two pattern matching processors, you may find that it is less performant, but it's probably worth a try. Another option is ExecuteScript with a fast scripting language like Groovy or Javascript/Nashorn, but the overhead of the interpreted script might be worse than the improvement of looking only for headers/footers rather than a whole regex.

mburgess · ‎01-23-2018

How is your data coming into NiFi? If it is a single flow file with all the rows (such as ExecuteSQL which returns an Avro file with records in it), then you can use SplitAvro and then downstream each flow file can be processed separately, with no looping required. If your input is a text file you can use SplitText, if JSON then SplitJSON, etc. If instead you have a number (say 10), and you need to fetch rows with ids 1-10, you can either use ExecuteSQL and get all rows < 10. If I am misunderstanding your use case and you do need to loop, then after you get your loop variable into an attribute (perhaps with EvaluateJSONPath as you mention), then you can use RouteOnAttribute only to see if it is time to exit the loop (${loopVariable:gt(0)} for example). Otherwise you can use UpdateAttribute to increment or decrement the counter, and send that output back to the beginning of the loop.

mburgess · ‎01-23-2018

You can use either ConvertRecord or ConvertAvroToJSON to convert your incoming Avro data to JSON. If the incoming Avro files do not have a schema embedded in them, then you will have to provide it, either to an AvroReader (for ConvertRecord) or the "Avro schema" property (for ConvertAvroToJSON).

Online	Offline
Last Visited	‎12-03-2025 12:10 PM

Member Since	‎11-16-2015 02:21 PM
Last Visited	‎12-03-2025 12:10 PM
Posts	911
Kudos received	662

Cloudera Community

Re: Compare data within the JSON using NIFI

Re: how to join three csv files like sql on condit...

Re: How to see the Data Provenance and Lineage in ...

Re: Apache NiFi - RouteText has no matches

Re: Nifi Building error when creating a brand new ...

Re: Converting JSON to SQL DDL

Re: Nifi ExecuteSQL prepared statements to prevent...

Re: Get the processor-group name in NIFI flow

Re: NiFi queue monitoring

Re: [Nifi] Converting a delimited FlowFile's conte...

Re: [Nifi] Converting a delimited FlowFile's conte...

Re: Converting Epoch to TimeStamp of specific Time...

Re: how to in improce performance of extract text ...

Re: Is there a way to loop through unknown values(...

Re: converting Avro data format into Json in java