I would like to extract a big table (MySQL, more than 3 million rows) and write it as a file in HDFS.
What would be the best way to do it?
I tried the following processors:
- ExecuteSQL: out-of-memory error
- QueryDatabaseTable: out-of-memory error
- GenerateTableFetch: failed to invoke @OnScheduled method due to java.lang.RuntimeException
I have 20 GB of memory.
Can I set parameters so that the extract is split into several FlowFiles, then merged in NiFi before loading into HDFS?
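For reference, the database processors do expose batching properties that keep the result set from being buffered in memory all at once. A minimal sketch, with property names as they appear in recent NiFi versions and illustrative values:

```
# QueryDatabaseTable / ExecuteSQL properties (values are examples, tune for your table)
Max Rows Per Flow File : 10000   # split the result into many smaller FlowFiles
Output Batch Size      : 10      # transfer FlowFiles downstream incrementally instead of all at the end
Fetch Size             : 10000   # hint to the JDBC driver for how many rows to fetch per round trip
```

Note that the MySQL Connector/J driver ignores an ordinary fetch-size hint and reads the whole result set into memory unless you enable cursor-based fetching (e.g. adding `useCursorFetch=true` to the JDBC URL in the DBCPConnectionPool controller service), so the driver configuration matters as much as the processor properties here.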
What are your JVM memory settings? The default is 512 MB, which will likely result in an OOM with a large query result set. It is best to give NiFi as much memory as possible if you plan to do a lot of in-memory work, such as handling large result sets in this case.
Ah, 512 MB is probably too low for your use case. If you don't have many other services running on the node, I would suggest starting with 80% of the node's memory.
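The heap is set in NiFi's `conf/bootstrap.conf`; the two `java.arg` lines below ship with 512m by default. An example of raising them on a 20 GB node (exact values are illustrative; a restart is required for the change to take effect):

```
# conf/bootstrap.conf — JVM heap settings for NiFi
java.arg.2=-Xms8g
java.arg.3=-Xmx16g
```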