<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Insert a new row into ClickHouse database only when it does not exist in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-row-into-clickhouse-database-only-when-it-is/m-p/383791#M245066</link>
    <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I am creating a processor group that reads from a Kafka topic and writes to a ClickHouse database. I am using the stateless execution mechanism to ensure that when there is a problem during execution (NiFi crashes, NiFi is restarted, or the ClickHouse database returns an error), the Kafka offset is not committed and processing is retried.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rtambun_0-1708666355928.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/39765i64EE19B0A4A11DFC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rtambun_0-1708666355928.png" alt="rtambun_0-1708666355928.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, ClickHouse creates a new row for each duplicated message. To avoid duplicates, I would like to check the database first and see whether the message already exists before processing it. Has anyone built a similar use case?&lt;/P&gt;</description>
    <pubDate>Fri, 23 Feb 2024 05:34:04 GMT</pubDate>
    <dc:creator>rtambun</dc:creator>
    <dc:date>2024-02-23T05:34:04Z</dc:date>
    <item>
      <title>Insert a new row into ClickHouse database only when it does not exist</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-row-into-clickhouse-database-only-when-it-is/m-p/383791#M245066</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I am creating a processor group that reads from a Kafka topic and writes to a ClickHouse database. I am using the stateless execution mechanism to ensure that when there is a problem during execution (NiFi crashes, NiFi is restarted, or the ClickHouse database returns an error), the Kafka offset is not committed and processing is retried.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="rtambun_0-1708666355928.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/39765i64EE19B0A4A11DFC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="rtambun_0-1708666355928.png" alt="rtambun_0-1708666355928.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, ClickHouse creates a new row for each duplicated message. To avoid duplicates, I would like to check the database first and see whether the message already exists before processing it. Has anyone built a similar use case?&lt;/P&gt;</description>
      <pubDate>Fri, 23 Feb 2024 05:34:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-row-into-clickhouse-database-only-when-it-is/m-p/383791#M245066</guid>
      <dc:creator>rtambun</dc:creator>
      <dc:date>2024-02-23T05:34:04Z</dc:date>
    </item>
    <item>
      <title>Re: Insert a new row into ClickHouse database only when it does not exist</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-row-into-clickhouse-database-only-when-it-is/m-p/383798#M245073</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/106916"&gt;@rtambun&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;I am not quite sure how your data comes out of your Kafka cluster, but if each message contains a single row, you could add a LookupRecord processor before saving the data to your database. With that LookupRecord you check, using a key, whether those values are already present in your database; if so, you can route that data into a different flow, and otherwise you save it to your database.&lt;BR /&gt;&lt;BR /&gt;If your data does not arrive from Kafka as a single message containing a single record, and you are not processing large volumes of data, you could split it into flow files containing one record each and process them further as described above.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Feb 2024 08:50:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Insert-a-new-row-into-clickhouse-database-only-when-it-is/m-p/383798#M245073</guid>
      <dc:creator>cotopaul</dc:creator>
      <dc:date>2024-02-23T08:50:12Z</dc:date>
    </item>
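    <!-- Editor's note: the thread above is about making the Kafka-to-ClickHouse
         insert idempotent so that redelivered messages (uncommitted offsets after
         a crash or retry) do not create duplicate rows. The sketch below is a
         hypothetical illustration of that idea, not the poster's actual flow: it
         uses Python's stdlib sqlite3 as a local stand-in for ClickHouse, and the
         table name "events" and key "msg_id" are invented for the example. With
         ClickHouse itself, a ReplacingMergeTree table keyed on a unique message ID
         achieves a comparable effect.

         ```python
         # Idempotent insert keyed on a unique per-message ID: replaying the same
         # Kafka message after a retry becomes a no-op instead of a duplicate row.
         import sqlite3

         conn = sqlite3.connect(":memory:")
         conn.execute("CREATE TABLE events (msg_id TEXT PRIMARY KEY, payload TEXT)")

         def insert_once(msg_id, payload):
             # INSERT OR IGNORE skips rows whose msg_id already exists, so a
             # redelivered message does not add a second copy.
             conn.execute(
                 "INSERT OR IGNORE INTO events (msg_id, payload) VALUES (?, ?)",
                 (msg_id, payload),
             )
             conn.commit()

         # Simulate a redelivery after an uncommitted offset: same message twice.
         insert_once("kafka-0-42", "reading=7")
         insert_once("kafka-0-42", "reading=7")
         rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
         print(rows)
         ```

         A check-then-insert in two steps (as proposed with LookupRecord) works too,
         but pushing the uniqueness constraint into the table keeps the dedup
         decision atomic in the database. -->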
  </channel>
</rss>

