Member since: 11-16-2015
Posts: 905
Kudos Received: 666
Solutions: 249
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 484 | 09-30-2025 05:23 AM |
| | 809 | 06-26-2025 01:21 PM |
| | 738 | 06-19-2025 02:48 PM |
| | 911 | 05-30-2025 01:53 PM |
| | 11628 | 02-22-2024 12:38 PM |
03-05-2018
02:42 PM
2 Kudos
QueryDatabaseTable does not support incoming connections, so you wouldn't be able to support multiple tables. The "Table Name" property does support NiFi Expression Language, but that is so you can migrate flows from dev to test to production using different table names; each environment would have its own (static) variable set. Instead, you can use GenerateTableFetch: it supports incoming connections, so you can use flow file attributes in the expression(s) for Table Name, Columns to Return, Maximum-value Columns, etc. It works like QueryDatabaseTable, but instead of generating and executing the SQL, it only generates the SQL statements. This allows you to send the statements downstream to something like ExecuteSQL, possibly distributing the flow files among nodes in the cluster (using a Remote Process Group -> Input Port, if you have a cluster rather than a standalone NiFi instance). You can populate your incoming flow files from wherever you get your configuration (ListFile -> FetchFile if it is a file on disk, or ListDatabaseTables if you want to get the list of tables from the database itself).
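As a rough sketch (the property values below are illustrative; ListDatabaseTables writes a db.table.name attribute on each flow file it emits, and last_updated is a hypothetical column name):

ListDatabaseTables -> GenerateTableFetch -> ExecuteSQL

GenerateTableFetch (example settings):
Table Name = ${db.table.name}
Maximum-value Columns = last_updated
Partition Size = 10000

Each flow file coming out of GenerateTableFetch then carries a generated SELECT statement for the table named in the incoming attribute, which ExecuteSQL runs.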
03-02-2018
09:44 PM
The JoltTransformJSON processor accepts NiFi Expression Language in the spec, so you can do something like:
[
{
"operation": "default",
"spec": {
"newfield": "${my.attr}"
}
}
]
And it will add "newfield" to the top-level object, with a string value of whatever is in my.attr. Note that it (currently) has to be enclosed in quotes (and thus must be a string field); otherwise the spec validator will try to parse the EL itself and the processor will be marked invalid. This is a bug that will hopefully be fixed in an upcoming release.
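For example (the attribute value and input content here are made up): if the flow file has an attribute my.attr set to hello and the incoming content is

{
  "id": 1
}

then the output would be

{
  "id": 1,
  "newfield": "hello"
}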
03-01-2018
05:09 PM
1 Kudo
You can use PutDatabaseRecord instead of ReplaceText -> PutSQL; it will take the data itself and generate/execute the necessary SQL for your inserts. It also supports "Rollback on Failure", which should give you the error-handling behavior you're looking for. You'll need to configure a RecordReader, and there isn't currently an XML RecordReader (but you can script one, see this template for an example); however, whatever logic you're using in ReplaceText to generate SQL, you could alternatively have it generate JSON or CSV and then configure a JsonTreeReader or CSVReader. The schema would look like the following (with your field names/types in place of field1/field2 in the example):
{"name": "myRecord", "type": "record", "fields": [
{"name": "field1", "type": ["null", "int"]},
{"name": "field2", "type": ["null", "string"]}
]}
For a comprehensive example, see Andrew Lim's CDC with Apache NiFi series.
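For instance (values made up), a JSON record matching the schema above might look like

{"field1": 1, "field2": "hello"}

and a JsonTreeReader configured with that schema would let PutDatabaseRecord turn each such record into an INSERT.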
03-01-2018
12:00 AM
No wait, that won't work. Hmm, it worked in the Jolt preview. Do you have an array of objects at the top level, vs the single object you have in your example?
02-28-2018
11:58 PM
Sorry about that, I had a copy-paste error: each of those "*...." entries should be followed by a ".&", so "*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4).&"
02-28-2018
09:58 PM
1 Kudo
If you knew the schema of the incoming content, I believe you could use schema aliases in conjunction with ConvertRecord, but since you mention the field names are unknown, I'm guessing you won't know the schema(s) either 🙂 You can do this with the JoltTransformJSON processor, although I don't think it supports arbitrary numbers of periods, as its matching operator is as non-greedy as possible. Here is a spec that works for 1-3 periods:
[
{
"operation": "shift",
"spec": {
"*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4)",
"*.*.*": "&(0,1)_&(0,2)_&(0,3)",
"*.*": "&(0,1)_&(0,2)",
"*": "&"
}
}
]
Note you could continue this pattern for any discrete number of periods. Also note that the above spec works for "flat" JSON files. For nested fields you'd have to go "one level deeper" and apply the same pattern; here's a spec that works for 1-3 periods, 1-2 fields deep:
[
{
"operation": "shift",
"spec": {
"*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4)",
"*.*.*": "&(0,1)_&(0,2)_&(0,3)",
"*.*": "&(0,1)_&(0,2)",
"*": {
"*.*.*.*": "&(0,1)_&(0,2)_&(0,3)_&(0,4)",
"*.*.*": "&(0,1)_&(0,2)_&(0,3)",
"*.*": "&1.&(0,1)_&(0,2)",
"*": "&"
}
}
}
]
If the incoming JSON is truly "wild west", you could use Groovy in an ExecuteScript processor along with a JsonSlurper (and JsonOutput) to change the keys at arbitrary depths with arbitrary numbers of periods.
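To make the first spec concrete (values are made up), a flat input like

{
  "a.b.c": 5,
  "plain": "x"
}

should come out as

{
  "a_b_c": 5,
  "plain": "x"
}

And if you do go the ExecuteScript route, here is a minimal Groovy sketch (it assumes JSON content and the standard ExecuteScript bindings for session and REL_SUCCESS; treat it as a starting point rather than a hardened implementation):

import groovy.json.JsonOutput
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (!flowFile) return

// Recursively replace periods with underscores in every map key, at any depth
def fixKeys
fixKeys = { obj ->
    if (obj instanceof Map) {
        obj.collectEntries { k, v -> [(k.toString().replace('.', '_')): fixKeys(v)] }
    } else if (obj instanceof List) {
        obj.collect { fixKeys(it) }
    } else {
        obj
    }
}

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def json = new JsonSlurper().parse(inputStream)
    outputStream.write(JsonOutput.toJson(fixKeys(json)).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)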
02-28-2018
04:16 PM
2 Kudos
You can install HDF on an HDP cluster, but be mindful of which services are running on which nodes so you don't run into performance issues. To install HDF on a new HDP cluster, you can use these instructions; for an existing HDP cluster, use this.
02-28-2018
12:50 PM
1 Kudo
The scripting processors (ExecuteScript, e.g.) offer Jython, not Python, as a scripting engine. Jython can't use compiled (CPython) modules, or modules whose dependencies include compiled modules; I suspect cx_Oracle or one of the other modules is (or depends on) a compiled module. Since your script uses "print" rather than the NiFi API (see my ExecuteScript Cookbook for examples of the latter), you could use ExecuteProcess or ExecuteStreamCommand to run your script using your native Python interpreter from the command line; the output will become the content of the flow file, which should work for your use case.
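A rough example with ExecuteProcess (the interpreter location and script path below are placeholders):

Command = /usr/bin/python
Command Arguments = /path/to/your_script.py

When the process exits, whatever the script wrote to standard output becomes the content of the outgoing flow file, which you can then route on as usual.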
02-27-2018
12:34 AM
1 Kudo
Do you know the path of the XML doc? I'm still looking at memory vs. temp disk storage; if you can fill in this blank, I hope to have an answer for you (in Groovy probably lol) tomorrow 🙂
02-26-2018
10:32 PM
What kinds of operations are you trying to perform on the files in the ZIP?