Created 07-10-2024 04:47 AM
Hi everyone!
I have the following JSON coming from Kafka:
[ {
"before" : {
"old_key1" : old_value1,
"old_key2" : old_value2,
"old_key3" : old_value3,
"old_key4" : old_value4
},
"after" : {
"new_key1" : new_value1,
"new_key2" : new_value2,
"new_key3" : new_value3,
"new_key4" : new_value4
},
"table" : "table_name",
"op" : "u", -- operation ['u-update', 'c-insert', 'd-delete']
"tm" : 1720539441000000000 - timestamp of changes
} ]
I need to execute SQL like this on the database:
update table_name
set new_key1=new_value1,
new_key2=new_value2,
new_key3=new_value3,
new_key4=new_value4
where
old_key1=old_value1 and
old_key2=old_value2 and
old_key3=old_value3 and
old_key4=old_value4;
What's the best way to update rows in a database with before and after values in NiFi? I can't use a Groovy script because our team only knows SQL, and if I write it in Groovy I will have to support it alone. I have tried building the SQL with Jolt, but all values come out without quotes and I can't check their types. I see two options: 1) insert all the data into SQL and build the statement in SQL itself, or 2) create a new processor that can build the SQL (but 🙂 I could also do that in Groovy).
Created on 07-10-2024 06:39 AM - edited 07-10-2024 06:42 AM
Hi @ageMLex,
Which version of NiFi are you using? If you are on NiFi 1.19 or higher, you can take advantage of JSLT transformation using the JSLTTransformJSON processor. JSLT is another JSON transformation language that is very powerful and works much like XQuery does for XML. Unlike Jolt, the result can be non-JSON, such as a string, integer, boolean, etc.
Assuming you have the following JSON:
[
{
"before": {
"old_key1": "old_value1",
"old_key2": "old_value2",
"old_key3": "old_value3",
"old_key4": "old_value4"
},
"after": {
"new_key1": "new_value1",
"new_key2": "new_value2",
"new_key3": "new_value3",
"new_key4": "new_value4"
},
"table": "table_name",
"op": "u",
"tm": 1720539441000000000
}
]
The JSLT script would look like this:
let table = .[0].table
let oldValues = [for (.[0].before) .key+"="+.value ]
let newValues = [for (.[0].after) .key+"="+.value ]
let sqlUpdate = "update "+$table +" set "+ join($newValues,",") + " where " + join($oldValues," and ")
$sqlUpdate
Note: when using the above script in the JSLT processor, try to keep it on one line. For some reason the processor gives a syntax error if the script is broken across multiple lines. I think this is a bug in the processor not handling newlines correctly.
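For reference, this is the same script reflowed onto a single line; only the whitespace changes, nothing else:
let table = .[0].table let oldValues = [for (.[0].before) .key+"="+.value] let newValues = [for (.[0].after) .key+"="+.value] let sqlUpdate = "update "+$table+" set "+join($newValues,",")+" where "+join($oldValues," and ") $sqlUpdate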
This will produce the following output, which you can then use in processors like PutSQL, ExecuteSQL, etc.:
"update table_name set new_key1=new_value1,new_key2=new_value2,new_key3=new_value3,new_key4=new_value4 where old_key1=old_value1 and old_key2=old_value2 and old_key3=old_value3 and old_key4=old_value4"
You can use ExtractText with the following regex pattern to get rid of the double quotation marks at the beginning and the end:
^"(.*)"$
For more info about JSLT, please check this site.
Created 07-10-2024 06:56 AM
Thank you for your reply! The version is 2.0.0-M3:
Tagged nifi-2.0.0-M3-RC1
From f2215c6 on branch UNKNOWN.
I will try your solution tomorrow. Thank you again!
Created 07-11-2024 12:14 AM
@SAMSAL , do you think this way is simpler for production and support than Groovy? As an experiment, I created my solution with Groovy yesterday (four processors: Kafka -> Groovy -> ExtractText -> PutSql). My boss believes that we need only simple NiFi blocks. I have about 120 tables with 2000 columns. I need to convert milliseconds to timestamps, escape quotes, replace nulls with SQL 'NULL', and get a list of primary keys from the database. Can I do this with simple blocks? For me, Groovy is the simplest way. Why does the NiFi community not recommend using Groovy? Maybe I need an API creator script to generate simple block assemblies? What is the best practice in the NiFi world?
Created 07-11-2024 02:57 AM
It all depends on the complexity of the data you are working with. If you are talking about data transformations (converting to timestamps, replacing quotes, etc.), then maybe Groovy is the way to go. JSLT has some functions that can help you accomplish this as well, like string replace and parse-time functions, but I'm not sure that covers everything. I'm also not sure where you got the impression that the NiFi community doesn't recommend using Groovy; if you find an article about that, please share it. I think it's more an issue of your boss not wanting you to do any scripting, to avoid ending up with something that nobody but you can support.
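Just as an illustration (not tested in your flow), here is a minimal JSLT sketch of those functions against the input shape from your earlier post. The changed_at column name and the timestamp format are made up for the example, it assumes new_key2 may be null, and format-time expects seconds, so the nanosecond tm is divided down first:
// escape single quotes, convert tm (nanoseconds) to a timestamp string, turn null into SQL NULL
let ts = format-time(round(.[0].tm / 1000000000), "yyyy-MM-dd HH:mm:ss")
let key1 = "'" + replace(string(.[0].after.new_key1), "'", "''") + "'"
let key2 = if (.[0].after.new_key2 == null) "NULL" else "'" + string(.[0].after.new_key2) + "'"
"update " + .[0].table + " set new_key1=" + $key1 + ", new_key2=" + $key2 + ", changed_at='" + $ts + "'"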
The processor is there for you to use. In fact, there is a dedicated processor for Groovy called ExecuteGroovyScript. I think the ExecuteScript processor might get deprecated since it's redundant. The only warning I can find about this processor is that the script gets compiled for every FlowFile, which can get expensive and impact performance if you have a big script and are working with large data volumes. To avoid running into those scenarios, NiFi provides other alternatives like InvokeScriptedProcessor (also using Groovy) or developing your own custom processor in Java (a .nar), where the code is compiled once and done.
The JSLT processor also recompiles the script, but it uses caching to avoid having to do that every time. As for which performs better, Groovy or JSLT: I'm not sure and I have never tested it, but you can do some stress testing and let us know :).