Created 04-21-2017 07:23 PM
Hi,
I would like to extract a big table (MySQL, more than 3 million rows) and write it as a file in HDFS.
What would be the best way to do it?
I tried the following processors:
- ExecuteSQL: memory error
- QueryDatabaseTable: memory error
- GenerateTableFetch: error: failed to invoke @OnScheduled method due to java.lang.RuntimeException
I have 20 GB of memory.
What would be the best way to do it? Can I set parameters so that I generate more than one flow file, then merge them in NiFi before loading to HDFS?
Thank you.
Created 04-21-2017 07:44 PM
What does the configuration of the QueryDatabaseTable processor look like?
Created on 04-21-2017 07:51 PM - edited 08-17-2019 09:26 PM
QueryDatabaseTable is configured like this:
GenerateTableFetch is configured like this:
Created 04-21-2017 08:02 PM
You have allocated 20 GB to the NiFi JVM, correct?
Will you post the memory error, please?
Created on 04-23-2017 01:45 PM - edited 08-17-2019 09:26 PM
Correct.
I am actually testing three approaches, and I want to know which is the best way to do it:
1. For ExecuteSQL: memory error
-----------------------------------------------------
2. For QueryDatabaseTable:
I can extract data, but 1 record = 1 flow file, and it is very slow.
Is it possible to generate a flow file every 1000 records, for example, and then merge these flow files into a single one?
-----------------------------------------------------
3. For GenerateTableFetch: error: failed to invoke @OnScheduled method due to java.lang.RuntimeException
Created 04-23-2017 10:10 PM
For method #2, I would set Max Rows Per Flow File to 1000 and monitor the performance. You might be able to increase it above 1000.
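For reference, a rough sketch of the flow this suggests. The property names are the standard QueryDatabaseTable / MergeContent / PutHDFS properties, but every value below (table name, column, paths, batch sizes) is only an illustrative assumption, not a tested setting:

QueryDatabaseTable
    Database Connection Pooling Service : your MySQL DBCPConnectionPool
    Table Name                          : my_table        (assumed name)
    Maximum-value Columns               : id              (assumed incrementing key, so each run only fetches new rows)
    Max Rows Per Flow File              : 1000            (each outgoing flow file carries at most 1000 Avro records)

MergeContent   (optional, to bundle the 1000-row flow files into larger files before writing to HDFS)
    Merge Format                        : Avro            (assumes a NiFi version whose MergeContent offers the Avro merge format)
    Minimum Number of Entries           : 100             (illustrative; tune for the HDFS file size you want)

PutHDFS
    Directory                           : /data/mysql_export   (hypothetical target path)
    Hadoop Configuration Resources      : paths to your core-site.xml and hdfs-site.xml

Connect them as QueryDatabaseTable -> MergeContent -> PutHDFS on the success relationships. The idea is that each flow file only holds 1000 rows, so the 3-million-row table is streamed through in batches instead of being materialized in memory at once.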