Support Questions

Find answers, ask questions, and share your expertise

What's the best way to update rows in a database with before and after values by NiFi

avatar
New Contributor

Hi everyone!

I have json from kafka: 

[ {
"before" : {
"old_key1" : old_value1,
"old_key2" : old_value2,
"old_key3" : old_value3,
"old_key4" : old_value4
},
"after" : {
"new_key1" : new_value1,
"new_key2" : new_value2,
"new_key3" : new_value3,
"new_key4" : new_value4
},
"table" : "table_name",
"op" : "u", -- operation ['u-update', 'c-insert', 'd-delete']
"tm" : 1720539441000000000 - timestamp of changes
} ]

I need execute on database sql like:

update table_name
set new_key1=new_value1,
  new_key2=new_value2, 
  new_key3=new_value3,
  new_key4=new_value4
where
  old_key1=old_value1 and 
  old_key2=old_value2 and 
  old_key3=old_value3 and 
  old_key4=old_value4;

 What's the best way to update rows in a database with before and after values by NiFi. I can`t use a groovy script because our team know sql only and if I produce it on groovy, I must support this alone. I have tried create sql by jolt, but all values without quotes and I can`t  test a type.  I think a have two ways: 1) insert all data to sql and create sql by sql 2) Create new Processor with ability of create the sql (But 🙂 I can do it by groovy)

1 ACCEPTED SOLUTION

avatar
Super Guru

It all depends on the complexity of the data you are working with. If you are taking about data transformation (converting to timestamp, replacing quote, ...etc.) then maybe groovy is the way to go. JSLT has some function that can you help you accomplish this as well like string replace and parse-time functons but Im not sure that is everything. Im not sure where did you get the impression that the nifi community doesnt recommend using groovey and if you find an article about that please share. I think its more of an issue with your Boss not wanting you to do any scripting to avoid not having this supported by others than you.

The processor is there for you to use. Actually there is a dedicated processor for groovey called ExecuteGroovyScript. I think the ExecuteScript processor might get deprecated since its redundant. The only issue that  I can find that warns about this processor is the fact the script is getting compiled for every flowfile and that might get expensive and impact the performance if you have a big script and working with large data volume. To avoid running into those scenarios, Nifi provides other alternative like InvokeScriptedProcessor (using groovey as well) or develop your custom processor in java (.nar) where the code is compiled once and done.

The jslt processor also re compiles the script but it uses caching to avoid having to do that every time. In terms which performs better: groovey or jslt? Im not sure and I have never tested but you can do some stress testing and let us know :).

View solution in original post

4 REPLIES 4

avatar
Super Guru

Hi @ageMLex ,

Which version of Nifi are you using. If you are using Nifi version 1.19 or higher then you can take advantage of the jslt transformation using the  JSLTTransformJson processor. Its another json transformation language that is very powerful and work just like xquery in xml. Unlike jolt the result can be none json  like stirng, integer , boolean...etc.

Assuming you have the following json:

 

 

[
	{
		"before": {
			"old_key1": "old_value1",
			"old_key2": "old_value2",
			"old_key3": "old_value3",
			"old_key4": "old_value4"
		},
		"after": {
			"new_key1": "new_value1",
			"new_key2": "new_value2",
			"new_key3": "new_value3",
			"new_key4": "new_value4"
		},
		"table": "table_name",
		"op": "u",
		"tm": 1720539441000000000
	}
]

 

 

The Jslt script would look like this:

This will produce a sting like this when you can later use in ExecuteSQL,PUTSQL ..etc:

 

 

let table = .[0].table 
let oldValues = [for (.[0].before) .key+"="+.value ]
let newValues = [for (.[0].after) .key+"="+.value ]
let sqlUpdate = "update "+$table +" set "+ join($newValues,",") + " where " + join($oldValues," and ")
$sqlUpdate 

 

Note: when using the above scipt in the jslt processor try to have it in one line. For some reason the processor will give syntax error if the script is broken into different lines. I think this is a bug  in the processor not accounting correctly for  newline.

This will produce the following output:

 

 

"update table_name set new_key1=new_value1,new_key2=new_value2,new_key3=new_value3,new_key4=new_value4 where old_key1=old_value1 and old_key2=old_value2 and old_key3=old_value3 and old_key4=old_value4"

 

 

You can use ExtractText with the following pattern to get rid of double quotation in the beginning and the end using regex:

 

 

^"(.*)"$

 

 

 

For more info about jslt please check this site.

avatar
New Contributor

Thank you for your replay! Version is 2.0.0-M3

Tagged nifi-2.0.0-M3-RC1

From f2215c6 on branch UNKNOWN.

I will to try your solution tomorrow. Thank you again!

avatar
New Contributor

@SAMSAL , do you think this way is simpler for production and support than Groovy? As an experiment, I created my solution with Groovy yesterday (four processors: Kafka -> Groovy -> ExtractText -> PutSql). My boss believes that we need only simple NiFi blocks. I have about 120 tables with 2000 columns. I need to convert milliseconds to timestamps, escape quotes, replace nulls with SQL 'NULL', and get a list of primary keys from the database. Can I do this with simple blocks? For me, Groovy is the simplest way. Why does the NiFi community not recommend using Groovy? Maybe I need an API creator script to generate simple block assemblies? What is the best practice in the NiFi world?

avatar
Super Guru

It all depends on the complexity of the data you are working with. If you are taking about data transformation (converting to timestamp, replacing quote, ...etc.) then maybe groovy is the way to go. JSLT has some function that can you help you accomplish this as well like string replace and parse-time functons but Im not sure that is everything. Im not sure where did you get the impression that the nifi community doesnt recommend using groovey and if you find an article about that please share. I think its more of an issue with your Boss not wanting you to do any scripting to avoid not having this supported by others than you.

The processor is there for you to use. Actually there is a dedicated processor for groovey called ExecuteGroovyScript. I think the ExecuteScript processor might get deprecated since its redundant. The only issue that  I can find that warns about this processor is the fact the script is getting compiled for every flowfile and that might get expensive and impact the performance if you have a big script and working with large data volume. To avoid running into those scenarios, Nifi provides other alternative like InvokeScriptedProcessor (using groovey as well) or develop your custom processor in java (.nar) where the code is compiled once and done.

The jslt processor also re compiles the script but it uses caching to avoid having to do that every time. In terms which performs better: groovey or jslt? Im not sure and I have never tested but you can do some stress testing and let us know :).