Support Questions

Find answers, ask questions, and share your expertise

Bulk Load SQL Server data into MySQL apache Nifi

avatar
Contributor

I have billions of rows in SQL Server tables and I'm using Nifi to load those rows into MySQL. I tried using putSQL and PutDatabaseRecord both are satisfying the requirement, however they are taking quite long time to load the into MySQL(100k records per minute, for 1 billion it might be 1000+ minutes) as they are doing it record by record. Do we have any bulk load option to load the AVRO/CSV flowfiles into MySQL in Nifi. Could you please suggest an option.

Here is my Flow :

ListDataBaseTables -> GenerateTableFecth(partition size= 50k records) -> Execute SQL -> ConvertRecord(Avro to CSV) -> PutSQL

ListDataBaseTables -> GenerateTableFecth(partition size= 50k records) -> Execute SQL(Avro) -> PutDatabaseRecord.

9 REPLIES 9

avatar
Rising Star

If possible, I recommend MySQL 'LOAD DATA' command instead of putSQL and PutDatabaseRecord.

NiFi save csv --> MySQL local CSV --> MySQL LOAD DATA ...

MySQL LOAD DATA is dramatically fast !

avatar
Contributor

@youngick kim,

Could you please elaborate your answer(NiFi save csv --> MySQL local CSV). MySQL local CSV, I didn't get this one.

avatar
Rising Star

I mean ...

ListDataBaseTables -> GenerateTableFecth -> Execute SQL -> ConvertRecord(Avro to CSV) -> CSV to MySQL server local file -> ExcecuteProcess(MySQL LOAD DATA... command)

avatar
Contributor

MySQL Server and Nifi Server are not running on the same machine. Did you mean copying the flow files into local server and from there LOAD DATA?

avatar
Rising Star

You can use ExcecuteProcess for SCP or GetSFTP...

avatar
Contributor

If I understood correctly, it's like writing the file to disk and using that file as part of load data infile command? Did my understanding correct?

avatar
Rising Star

Yes, You did ! ^^

avatar
Contributor

Writing flowfiles to disk will be costly operation as the size of the flow files is huge and there are lots of such flow files. Any alternative solution apart from writing contents of flowfiles to disk?

avatar
Rising Star

Writing flowfiles to disk will be costly operation as the size of the flow files is huge and there are lots of such flow files. Any alternative solution apart from writing contents of flowfiles to disk?

I think you should select the better solution(as I know, there is no best) because both of them have the trade-off.

Your questioned solution has no file operation and no writing contents of flowfiles to disk.

But, very slo~~~w !

My solution has costly file operation and writing contents of flowfiles to disk.

But, Fast !