
Apache NiFi - ClickHouse: 650 million records from 9 million


I need to copy an MS SQL table with about 9 million records to a ClickHouse database.

I set up a QueryDatabaseTable processor to pull the table from the SQL database and a PutDatabaseRecord processor to push the records to the ClickHouse DB.

As long as the flowfile from the QueryDatabaseTable processor has fewer records than the Batch Size setting of PutDatabaseRecord, everything works fine.

But when the flowfile has more records, PutDatabaseRecord creates multiple batches.

When I pull 30,000 records from my source table and the Batch Size is set to 10,000, I end up with 60,000 records in the destination.

Looking at the debug information from PutDatabaseRecord, it shows 1 insert and 3 insert batches.

When I pull all 9.5 million records, I end up with 650 million records in the destination.
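A quick way to confirm that these are real duplicate rows (and not just a wrong count) is to group the destination table by its key and look for keys with more than one copy. A minimal sketch, using sqlite3 as a stand-in for ClickHouse; the table and column names are made up for illustration:

```python
# Sketch: detect duplicated rows in a destination table.
# sqlite3 stands in for ClickHouse here; table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER, payload TEXT)")

# Simulate a 2x duplicated load, as in the 30,000 -> 60,000 case.
rows = [(i, f"row-{i}") for i in range(5)]
conn.executemany("INSERT INTO dest VALUES (?, ?)", rows * 2)

# Any key returned here was inserted more than once.
dupes = conn.execute(
    "SELECT id, COUNT(*) AS copies FROM dest "
    "GROUP BY id HAVING copies > 1 ORDER BY id"
).fetchall()
print(dupes)  # every id shows up with 2 copies
```

The same GROUP BY / HAVING query, run in clickhouse-client against the real target table, would show whether every key is duplicated uniformly or only some batches were replayed.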

Any ideas?



Hello @NadirHamburg 

Thanks for being part of our Community. 

I'm not an expert on ClickHouse, but I read that something on the DB side could be causing the batches to repeat, producing that many duplicated records.

From the NiFi side, you can try setting the Batch Size to the same number of records as the flowfile; that should work for you, although I know it can be a problem for big tables.

From ClickHouse, I found this documentation:
https://clickhouse.com/docs/engines/table-engines/mergetree-family
It talks about ReplicatedMergeTree, which should be a good option to avoid duplicates, since it deduplicates identical insert blocks.
Is your table created with those settings?
Do you see any errors in the PutDatabaseRecord log? If so, can you share them?
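For reference, a minimal ReplicatedMergeTree definition could look like the sketch below; the ZooKeeper path, replica macro, and all table/column names are placeholders, not taken from this thread:

```sql
-- Sketch only: a replicated table whose identical insert blocks are deduplicated.
-- Path, macros, and names below are hypothetical examples.
CREATE TABLE my_db.my_table
(
    id      UInt64,
    payload String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/my_table', '{replica}')
ORDER BY id;
```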


Regards,
Andrés Fallas