Created 03-21-2018 07:19 AM
I have billions of rows in SQL Server tables and I'm using Nifi to load those rows into MySQL. I tried using putSQL and PutDatabaseRecord both are satisfying the requirement, however they are taking quite long time to load the into MySQL(100k records per minute, for 1 billion it might be 1000+ minutes) as they are doing it record by record. Do we have any bulk load option to load the AVRO/CSV flowfiles into MySQL in Nifi. Could you please suggest an option.
Here is my Flow :
ListDataBaseTables -> GenerateTableFecth(partition size= 50k records) -> Execute SQL -> ConvertRecord(Avro to CSV) -> PutSQL
ListDataBaseTables -> GenerateTableFecth(partition size= 50k records) -> Execute SQL(Avro) -> PutDatabaseRecord.
Created 03-21-2018 08:42 AM
If possible, I recommend MySQL 'LOAD DATA' command instead of putSQL and PutDatabaseRecord.
NiFi save csv --> MySQL local CSV --> MySQL LOAD DATA ...
MySQL LOAD DATA is dramatically fast !
Created 03-21-2018 09:36 AM
Could you please elaborate your answer(NiFi save csv --> MySQL local CSV). MySQL local CSV, I didn't get this one.
Created 03-21-2018 09:55 AM
I mean ...
ListDataBaseTables -> GenerateTableFecth -> Execute SQL -> ConvertRecord(Avro to CSV) -> CSV to MySQL server local file -> ExcecuteProcess(MySQL LOAD DATA... command)
Created 03-21-2018 10:10 AM
MySQL Server and Nifi Server are not running on the same machine. Did you mean copying the flow files into local server and from there LOAD DATA?
Created 03-21-2018 10:30 AM
You can use ExcecuteProcess for SCP or GetSFTP...
Created 03-21-2018 11:18 AM
If I understood correctly, it's like writing the file to disk and using that file as part of load data infile command? Did my understanding correct?
Created 03-21-2018 11:51 AM
Yes, You did ! ^^
Created 03-21-2018 12:05 PM
Writing flowfiles to disk will be costly operation as the size of the flow files is huge and there are lots of such flow files. Any alternative solution apart from writing contents of flowfiles to disk?
Created 03-21-2018 12:47 PM
Writing flowfiles to disk will be costly operation as the size of the flow files is huge and there are lots of such flow files. Any alternative solution apart from writing contents of flowfiles to disk?
I think you should select the better solution(as I know, there is no best) because both of them have the trade-off.
Your questioned solution has no file operation and no writing contents of flowfiles to disk.
But, very slo~~~w !
My solution has costly file operation and writing contents of flowfiles to disk.
But, Fast !