<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Apache Nifi - Clickhouse : 650 million records from 9 million in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-Clickhouse-650-million-records-from-9-million/m-p/413502#M254095</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/127211"&gt;@NadirHamburg&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for being part of our Community.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not an expert on ClickHouse, but I've read that something on the database side could be causing the batches to repeat, which would explain that many duplicated records.&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the NiFi side, you can try setting the Batch Size to the same number of records as the incoming FlowFile; that should work for you, although for large tables it can be a problem.&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the ClickHouse side, I found this documentation:&amp;nbsp;&lt;BR /&gt;&lt;A href="https://clickhouse.com/docs/engines/table-engines/mergetree-family" target="_blank"&gt;https://clickhouse.com/docs/engines/table-engines/mergetree-family&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;It covers&amp;nbsp;ReplicatedMergeTree, which should be a good option to avoid duplicates.&amp;nbsp;&lt;BR /&gt;Is your table configured with one of those engines?&lt;BR /&gt;Do you see any errors in the&amp;nbsp;PutDatabaseRecord log? If so, can you share them?&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 04 Feb 2026 14:58:42 GMT</pubDate>
    <dc:creator>vafs</dc:creator>
    <dc:date>2026-02-04T14:58:42Z</dc:date>
    <item>
      <title>Apache Nifi - Clickhouse : 650 million records from 9 million</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-Clickhouse-650-million-records-from-9-million/m-p/409370#M252835</link>
      <description>&lt;P&gt;I need to copy an MS SQL table with about 9 million records to a ClickHouse database.&lt;/P&gt;&lt;P&gt;I set up a QueryDatabaseTable processor to pull the table from the SQL database and a PutDatabaseRecord processor to push the records to the ClickHouse db.&lt;/P&gt;&lt;P&gt;As long as the FlowFile from the QueryDatabaseTable processor has fewer records than the Batch Size setting in PutDatabaseRecord, everything works fine.&lt;/P&gt;&lt;P&gt;But when I have more records, PutDatabaseRecord creates multiple batches.&lt;/P&gt;&lt;P&gt;When I pull 30,000 records from my source table and the Batch Size is set to 10,000, I end up with 60,000 records in the destination.&lt;/P&gt;&lt;P&gt;The debug information from PutDatabaseRecord shows 1 insert and 3 insert batches.&lt;/P&gt;&lt;P&gt;When I pull all 9.5 million records, I end up with 650 million records in the destination.&lt;/P&gt;&lt;P&gt;Any ideas?&lt;/P&gt;</description>
      <pubDate>Thu, 05 Jun 2025 19:22:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-Clickhouse-650-million-records-from-9-million/m-p/409370#M252835</guid>
      <dc:creator>NadirHamburg</dc:creator>
      <dc:date>2025-06-05T19:22:36Z</dc:date>
    </item>
    <item>
      <title>Re: Apache Nifi - Clickhouse : 650 million records from 9 million</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-Clickhouse-650-million-records-from-9-million/m-p/413502#M254095</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/127211"&gt;@NadirHamburg&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for being part of our Community.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not an expert on ClickHouse, but I've read that something on the database side could be causing the batches to repeat, which would explain that many duplicated records.&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the NiFi side, you can try setting the Batch Size to the same number of records as the incoming FlowFile; that should work for you, although for large tables it can be a problem.&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the ClickHouse side, I found this documentation:&amp;nbsp;&lt;BR /&gt;&lt;A href="https://clickhouse.com/docs/engines/table-engines/mergetree-family" target="_blank"&gt;https://clickhouse.com/docs/engines/table-engines/mergetree-family&lt;/A&gt;&amp;nbsp;&lt;BR /&gt;It covers&amp;nbsp;ReplicatedMergeTree, which should be a good option to avoid duplicates.&amp;nbsp;&lt;BR /&gt;Is your table configured with one of those engines?&lt;BR /&gt;Do you see any errors in the&amp;nbsp;PutDatabaseRecord log? If so, can you share them?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Feb 2026 14:58:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Apache-Nifi-Clickhouse-650-million-records-from-9-million/m-p/413502#M254095</guid>
      <dc:creator>vafs</dc:creator>
      <dc:date>2026-02-04T14:58:42Z</dc:date>
    </item>
  </channel>
</rss>