10-21-2014 05:45 AM
We updated Sqoop from CDH 5.0.1 to CDH 5.2, and now every import fails with a "GC overhead limit exceeded" error.
The old version could import over 14 GB of data through a single mapper; the new one fails whenever a mapper receives too many rows. A heap dump showed the heap (-Xmx1700m) completely consumed by more than 3.5 million rows of data.
The connector is MySQL JDBC (Connector/J) version 5.1.33, and the job imports the data as a text file into a Hive table.
Can I avoid this with a setting, or is this a bug that should go to JIRA?
10-22-2014 08:26 PM
This appears to be a regression caused by the fix in SQOOP-1400. Instead of fetching results from MySQL row by row, Sqoop now attempts to load the entire result set into memory.
We worked around it by upgrading to MySQL Connector/J 5.1.33 (which you're already on) and then adding "--fetch-size -2147483648" to our Sqoop command-line options. This restores the old row-by-row behaviour (the odd fetch size is a sentinel value recognised by the MySQL JDBC driver).
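For context, that sentinel is simply Integer.MIN_VALUE: when you call setFetchSize(Integer.MIN_VALUE) on a statement, Connector/J switches to streaming rows one at a time instead of buffering the whole result set. A minimal sketch (the JDBC calls are shown in comments because they need a live MySQL server; the printed value is what Sqoop's --fetch-size flag passes through):

```java
public class FetchSizeSentinel {
    public static void main(String[] args) {
        // With MySQL Connector/J, a statement created as
        //   Statement stmt = conn.createStatement(
        //       ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        //   stmt.setFetchSize(Integer.MIN_VALUE);
        // streams the result set row by row rather than materialising
        // it all in heap memory - which is what avoids the GC overhead error.

        // This is the value you hand to Sqoop via --fetch-size:
        System.out.println(Integer.MIN_VALUE);
    }
}
```

So "--fetch-size -2147483648" is not a magic number invented by Sqoop; it is the driver's documented streaming-mode trigger expressed as a decimal literal.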
10-23-2014 12:33 AM
Thanks for the answer!
I also found the workaround after some time, but you were faster to post it. I'll open a JIRA so it gets fixed in future versions.
07-22-2018 09:59 AM
Append ?dontTrackOpenResources=true&defaultFetchSize=1000&useCursorFetch=true to the MySQL connection string. This works without changing any JVM parameters.
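A minimal sketch of what that connection string looks like (the host and database names here are hypothetical placeholders): useCursorFetch=true makes the driver use a server-side cursor, and defaultFetchSize=1000 makes it pull rows in batches of 1000 instead of loading everything at once.

```java
public class CursorFetchUrl {
    public static void main(String[] args) {
        // "dbhost" and "mydb" are placeholders for your own server/database.
        String url = "jdbc:mysql://dbhost/mydb"
                + "?dontTrackOpenResources=true"  // don't keep finished result sets pinned
                + "&defaultFetchSize=1000"        // stream rows in batches of 1000
                + "&useCursorFetch=true";         // enable server-side cursor fetching
        System.out.println(url);
    }
}
```

In Sqoop this URL would go straight into the --connect option; the batched cursor fetch keeps each mapper's heap usage bounded regardless of how many rows it imports.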