Member since: 11-16-2015
Posts: 905
Kudos Received: 666
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 484 | 09-30-2025 05:23 AM |
| | 809 | 06-26-2025 01:21 PM |
| | 738 | 06-19-2025 02:48 PM |
| | 911 | 05-30-2025 01:53 PM |
| | 11628 | 02-22-2024 12:38 PM |
03-05-2018
02:42 PM
2 Kudos
QueryDatabaseTable does not support incoming connections, so you wouldn't be able to support multiple tables. The "Table Name" property does support NiFi Expression Language, but that is so you can migrate flows from dev to test to production using different table names; each environment would have its own (static) variable set. Instead, you can use GenerateTableFetch: it supports incoming connections, so you can use flow file attributes in the expression(s) for Table Name, Columns to Return, Maximum-value Columns, etc. It works like QueryDatabaseTable, but instead of generating and executing the SQL, it only generates the SQL statements. This allows you to send the statements downstream to something like ExecuteSQL, possibly distributing the flow files among nodes in the cluster (using a Remote Process Group -> Input Port, if you have a cluster rather than a standalone NiFi instance). You can populate your incoming flow files from wherever you get your configuration (ListFile -> FetchFile if it is a file on disk, or ListDatabaseTables if you want to get the list of tables from the database itself).
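As a rough sketch (the property values below are illustrative; ListDatabaseTables writes a db.table.name attribute on each flow file it emits, and last_updated is a hypothetical column name):

ListDatabaseTables -> GenerateTableFetch -> ExecuteSQL

GenerateTableFetch (example settings):
Table Name = ${db.table.name}
Maximum-value Columns = last_updated
Partition Size = 10000

Each flow file coming out of GenerateTableFetch then carries a generated SELECT statement for the table named in the incoming attribute, which ExecuteSQL runs.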
03-02-2018
09:44 PM
The JoltTransformJSON processor accepts NiFi Expression Language in the spec, so you can do something like:
[
{
"operation": "default",
"spec": {
"newfield": "${my.attr}"
}
}
]
And it will add "newfield" to the top-level object, with a string value of whatever is in my.attr. Note that it (currently) has to be enclosed in quotes (and thus must be a string field); otherwise the spec validator will try to parse the EL itself and the processor will be marked invalid. This is a bug that will hopefully be fixed in an upcoming release.
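For example (the attribute value and input content here are made up): if the flow file has an attribute my.attr set to hello and the incoming content is

{
  "id": 1
}

then the output would be

{
  "id": 1,
  "newfield": "hello"
}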
03-01-2018
05:09 PM
1 Kudo
You can use PutDatabaseRecord instead of ReplaceText -> PutSQL; it will take the data itself and generate/execute the necessary SQL for your inserts. It also supports "Rollback on Failure", which should give you the error-handling behavior you're looking for. You'll need to configure a RecordReader, and there isn't currently an XML RecordReader (but you can script one, see this template for an example); however, whatever logic you're using in ReplaceText to generate SQL, you could alternatively have it generate JSON or CSV and then configure a JsonTreeReader or CSVReader. The schema would look like the following (with your field names/types in place of field1/field2 in the example):
{"name": "myRecord", "type": "record", "fields": [
{"name": "field1", "type": ["null", "int"]},
{"name": "field2", "type": ["null", "string"]}
]}
For a comprehensive example, see Andrew Lim's CDC with Apache NiFi series.
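For instance (values made up), a JSON record matching the schema above might look like

{"field1": 1, "field2": "hello"}

and a JsonTreeReader configured with that schema would let PutDatabaseRecord turn each such record into an INSERT.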
03-01-2018
12:00 AM
No wait, that won't work. Hmm, it worked in the Jolt preview. Do you have an array of objects at the top level, vs the single object you have in your example?
02-28-2018
11:58 PM
Sorry about that, I had a copy-paste error: each of those "*...." entries should be followed by a ".&", so "*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4).&"
02-28-2018
09:58 PM
1 Kudo
If you knew the schema of the incoming content, I believe you could use schema aliases in conjunction with ConvertRecord, but since you mention the field names are unknown, I'm guessing you won't know the schema(s) either 🙂 You can do this with the JoltTransformJSON processor, although I don't think it supports arbitrary numbers of periods, as its matching operator is as non-greedy as possible. Here is a spec that works for 1-3 periods:
[
{
"operation": "shift",
"spec": {
"*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4)",
"*.*.*": "&(0,1)_&(0,2)_&(0,3)",
"*.*": "&(0,1)_&(0,2)",
"*": "&"
}
}
]
Note you could continue this pattern for any discrete number of periods. Also note that the above spec works for "flat" JSON files. For nested fields you'd have to go "one level deeper" and apply the same pattern; here's a spec that works for 1-3 periods, 1-2 fields deep:
[
{
"operation": "shift",
"spec": {
"*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4)",
"*.*.*": "&(0,1)_&(0,2)_&(0,3)",
"*.*": "&(0,1)_&(0,2)",
"*": {
"*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4)",
"*.*.*": "&(0,1)_&(0,2)_&(0,3)",
"*.*": "&1.&(0,1)_&(0,2)",
"*": "&"
}
}
}
]
If the incoming JSON is truly "wild west", you could use Groovy in an ExecuteScript processor along with a JsonSlurper (and JsonOutput) to change the keys at arbitrary depths with arbitrary numbers of periods.
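To make the first spec concrete (values are made up), a flat input like

{
  "a.b.c": 5,
  "plain": "x"
}

should come out as

{
  "a_b_c": 5,
  "plain": "x"
}

And if you do go the ExecuteScript route, here is a minimal Groovy sketch (it assumes JSON content and the standard ExecuteScript bindings for session and REL_SUCCESS; treat it as a starting point rather than a hardened implementation):

import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

// Recursively replace periods with underscores in every map key, at any depth
def fixKeys
fixKeys = { obj ->
    if (obj instanceof Map) {
        obj.collectEntries { k, v -> [(k.toString().replace('.', '_')): fixKeys(v)] }
    } else if (obj instanceof List) {
        obj.collect { fixKeys(it) }
    } else {
        obj
    }
}

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def json = new JsonSlurper().parse(inputStream)
    outputStream.write(JsonOutput.toJson(fixKeys(json)).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)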
02-28-2018
04:16 PM
2 Kudos
You can install HDF on an HDP cluster, but be mindful of which services are running on which nodes so you don't run into performance issues. To install HDF on a new HDP cluster, you can use these instructions; for an existing HDP cluster, use this.
02-28-2018
12:50 PM
1 Kudo
The scripting processors (ExecuteScript, e.g.) offer Jython, not Python, as a scripting engine. Jython can't use compiled (CPython) modules, or modules whose dependencies include compiled modules; I suspect cx_Oracle or one of the other modules is (or depends on) a compiled module. Since your script uses "print" rather than the NiFi API (see my ExecuteScript Cookbook for examples of the latter), you could use ExecuteProcess or ExecuteStreamCommand to run your script using your native Python interpreter from the command line; the output will become the content of the flow file, which should work for your use case.
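A rough example with ExecuteProcess (the interpreter location and script path below are placeholders):

Command = /usr/bin/python
Command Arguments = /path/to/your_script.py

When the process exits, whatever the script wrote to standard output becomes the content of the outgoing flow file, which you can then route on as usual.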
02-27-2018
12:34 AM
1 Kudo
Do you know the path of the XML doc? I'm still looking at memory vs. temp disk storage; if you can fill in this blank, I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂
02-26-2018
10:32 PM
What kinds of operations are you trying to perform on the files in the ZIP?