Insert a new row into a ClickHouse database only if it does not already exist

Hello all, 

I am creating a process group that reads from a Kafka topic and writes the data to a ClickHouse database. I am using the stateless mechanism to ensure that, when there is a problem during execution (NiFi crashes, NiFi is restarted, or ClickHouse returns an error), the Kafka offset is not committed and the processing is retried.

[Attached image: rtambun_0-1708666355928.png]
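
To make the retry behaviour concrete, here is a minimal Python sketch of it. Everything in it is a hypothetical in-memory stand-in (`topic`, `table`, `committed_offset`, `run_once`); it does not use a real Kafka or ClickHouse client and only illustrates why a retry after an uncommitted offset can write the same message twice.

```python
# Hypothetical in-memory stand-ins; no real Kafka or ClickHouse client is used.
topic = [{"id": 1, "value": "a"}]   # plays the role of the Kafka topic
table = []                          # plays the role of the ClickHouse table
committed_offset = 0                # last offset acknowledged to Kafka


def run_once(crash_before_commit: bool) -> None:
    """Read from the committed offset, write the row, then commit the offset."""
    global committed_offset
    for offset in range(committed_offset, len(topic)):
        table.append(dict(topic[offset]))  # plain insert, no duplicate check
        if crash_before_commit:
            return                         # simulated failure: offset stays uncommitted
        committed_offset = offset + 1


run_once(crash_before_commit=True)    # the write succeeded, the commit did not
run_once(crash_before_commit=False)   # the retry re-reads the same message
print(len(table))  # 2
```

The second run starts again from the uncommitted offset, so the row that was already written is inserted a second time.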

Unfortunately, ClickHouse creates a new row for every duplicated message. To avoid duplicates, I would like to check the database first and see whether the message has already been stored before processing it. Has anyone built a similar use case?

1 REPLY

@rtambun,

I am not sure what your data looks like when it comes out of your Kafka cluster, but if each message contains a single row, you could add a LookupRecord processor before saving the data to your database. With LookupRecord you can check, by means of a key, whether those values are already present in your database. If they are, you can send that data to a different flow; otherwise, you save it to your database.
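
The check-then-route idea can be sketched in Python as follows. This is only an illustration of the logic, not NiFi configuration: the key field `message_id`, the `stored_keys` set, and the helper names are hypothetical, and the set stands in for a lookup against ClickHouse. The route names are borrowed from LookupRecord's matched/unmatched routing.

```python
from typing import Callable

# Hypothetical stand-in for the keys already stored in ClickHouse.
stored_keys = {"msg-1"}


def key_exists(key: str) -> bool:
    """Stand-in for a database lookup by key."""
    return key in stored_keys


def route_record(record: dict, key_field: str, exists: Callable[[str], bool]) -> str:
    """Return 'matched' if the record's key is already stored, else 'unmatched'."""
    return "matched" if exists(record[key_field]) else "unmatched"


def handle(record: dict) -> str:
    """Insert the record only when its key was not found; return the route taken."""
    route = route_record(record, "message_id", key_exists)
    if route == "unmatched":
        stored_keys.add(record["message_id"])  # stands in for the INSERT
    return route


print(handle({"message_id": "msg-1"}))  # matched
print(handle({"message_id": "msg-2"}))  # unmatched
print(handle({"message_id": "msg-2"}))  # matched
```

Keep in mind that a lookup followed by an insert is not atomic, so two copies of the same message processed concurrently could both pass the check.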

If your data does not arrive as one record per Kafka message, and you are not processing large volumes of data, you could split it into individual FlowFiles (each containing a single record) and process them further as described above.
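
As a rough illustration of the splitting step, here is a small Python sketch. It assumes the Kafka message payload is a JSON array of records, which may not match your data; the function name `split_records` and the field `message_id` are hypothetical. In NiFi itself, a record-splitting processor such as SplitRecord would play this role.

```python
import json


def split_records(payload: str) -> list[str]:
    """Split a JSON payload into one JSON document per record."""
    records = json.loads(payload)
    if isinstance(records, dict):   # a single record is treated as a batch of one
        records = [records]
    return [json.dumps(record) for record in records]


batch = '[{"message_id": "msg-1"}, {"message_id": "msg-2"}]'
singles = split_records(batch)
print(len(singles))  # 2
```

Each resulting document holds exactly one record, so it can be checked against the database on its own before being inserted.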