New Contributor
Posts: 6
Registered: 10-21-2014
Accepted Solution

Sqoop GC overhead limit exceeded after CDH5.2 update

Hi,

 

we updated Sqoop from CDH 5.0.1 to CDH 5.2 and now it fails every time with a "GC overhead limit exceeded" error.

The old version was able to import over 14 GB of data through a single mapper; the import now fails whenever a mapper gets too many rows. I checked a heap dump and the memory was completely used up by over 3.5 million rows of data (-Xmx1700m).

The connector is MySQL JDBC (Connector/J) version 5.1.33 and the job imports the data as a text file into a Hive table.

 

Can I avoid this with a setting, or is this a bug that should go to JIRA?

 

Thank you,

Jürgen

New Contributor
Posts: 1
Registered: 10-22-2014

Re: Sqoop GC overhead limit exceeded after CDH5.2 update

This appears to be a regression caused by the fix in SQOOP-1400. Instead of fetching results from MySQL row by row, Sqoop now attempts to load the entire result set into memory.

 

We worked around it by upgrading to MySQL Connector/J 5.1.33 (which you're already on) and then including "--fetch-size -2147483648" in our sqoop command-line options. This restores the old row-by-row behaviour (the unusual fetch size is a sentinel value recognised by the MySQL JDBC driver).
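
For anyone else hitting this, here is a minimal sketch of what such a job could look like with that option. The host, database, credentials and table names below are made-up placeholders, not our actual job:

    # -2147483648 is Integer.MIN_VALUE, which the MySQL driver treats as
    # "stream the result set row by row" instead of buffering it all in memory
    sqoop import \
      --connect jdbc:mysql://db-host:3306/sourcedb \
      --username sqoop_user -P \
      --table orders \
      --fetch-size -2147483648 \
      --hive-import \
      --hive-table orders \
      --as-textfile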

New Contributor
Posts: 6
Registered: 10-21-2014

Re: Sqoop GC overhead limit exceeded after CDH5.2 update

Thanks for the answer!

 

I also found the workaround after some time, but you were faster to post it. I'll open a JIRA for it so that it gets fixed in future versions.

Snd
New Contributor
Posts: 1
Registered: 07-22-2018

Re: Sqoop GC overhead limit exceeded after CDH5.2 update

Use the ?dontTrackOpenResources=true&defaultFetchSize=1000&useCursorFetch=true properties in the MySQL connection string. It works without changing any JVM parameters.
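
In case it helps others, a rough sketch of how that could look in a Sqoop invocation (host, database, credentials and table are made-up placeholders; the URL needs quoting because of the & characters):

    # useCursorFetch with defaultFetchSize makes the driver pull rows from a
    # server-side cursor in batches of 1000 instead of loading everything at once
    sqoop import \
      --connect "jdbc:mysql://db-host:3306/sourcedb?dontTrackOpenResources=true&defaultFetchSize=1000&useCursorFetch=true" \
      --username sqoop_user -P \
      --table orders \
      --hive-import \
      --as-textfile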

New Contributor
Posts: 1
Registered: 07-22-2018

Re: Sqoop GC overhead limit exceeded after CDH5.2 update

Thanks a lot! It worked for me as well.