Sqoop virtual memory error

Hi. I am having a "what the heck" moment. Could someone please explain the theory behind this? I have always presumed that Sqoop, unlike other MR processes that might require the entire dataset to be in memory, should never have an OOM issue. After all, it uses its memory as a buffer, copying the data from the DB to the staging area in HDFS and, when complete, moving it from staging to --target-dir.


So, we were moving a fairly large DB (500 GB), but our client would only allow us to use 1 mapper (don't ask why... gulp). About 90 minutes into the process, it terminated with:


Container is running beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.7 GB of 2.1 GB virtual memory used. Killing container.


This is really confusing me.  I suppose I can solve the problem by

a) increasing the vmem/pmem ratio (yarn.nodemanager.vmem-pmem-ratio = xyz)


b) disabling the check altogether (yarn.nodemanager.vmem-check-enabled = false).
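To see how the numbers in that message relate to the two settings, here is a simplified sketch of a NodeManager-style per-container check. It assumes the virtual limit is the physical limit multiplied by yarn.nodemanager.vmem-pmem-ratio (default 2.1, which is how a 1 GB container gets a 2.1 GB virtual ceiling) and treats the physical and virtual checks as independent switches (yarn.nodemanager.pmem-check-enabled and yarn.nodemanager.vmem-check-enabled). The function name and the 1.02 GB physical figure are hypothetical: the log rounds usage, so "1.0 of 1 GB" can still be over the limit. The real NodeManager also looks at process age and sampling, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class NodeManagerConfig:
    # Simplified stand-ins for yarn-site.xml settings (defaults shown).
    vmem_pmem_ratio: float = 2.1      # yarn.nodemanager.vmem-pmem-ratio
    vmem_check_enabled: bool = True   # yarn.nodemanager.vmem-check-enabled
    pmem_check_enabled: bool = True   # yarn.nodemanager.pmem-check-enabled


def exceeded_limits(pmem_limit_gb, pmem_used_gb, vmem_used_gb, cfg):
    """Hypothetical helper: return the sorted list of enabled checks a
    container fails under this simplified model."""
    # The virtual ceiling is derived from the physical allocation.
    vmem_limit_gb = pmem_limit_gb * cfg.vmem_pmem_ratio
    failed = []
    if cfg.pmem_check_enabled and pmem_used_gb > pmem_limit_gb:
        failed.append("PHYSICAL")
    if cfg.vmem_check_enabled and vmem_used_gb > vmem_limit_gb:
        failed.append("VIRTUAL")
    return sorted(failed)


# Figures from the error message; 1.02 is a hypothetical value that the
# log would round to "1.0".
usage = dict(pmem_limit_gb=1.0, pmem_used_gb=1.02, vmem_used_gb=2.7)

print(exceeded_limits(**usage, cfg=NodeManagerConfig()))
# ['PHYSICAL', 'VIRTUAL']

# Option (a): raise the ratio (4.0 is an arbitrary example value).
print(exceeded_limits(**usage, cfg=NodeManagerConfig(vmem_pmem_ratio=4.0)))
# ['PHYSICAL']

# Option (b): turn the virtual check off.
print(exceeded_limits(**usage, cfg=NodeManagerConfig(vmem_check_enabled=False)))
# ['PHYSICAL']
```

In this model both (a) and (b) only relax the virtual check, so a container that is genuinely over its physical allocation is still killed, which matches the 'PHYSICAL' wording in the message.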


But WHY is this error coming up?  


Thanks in advance and cheers.

Re: Sqoop virtual memory error

One possibility could be the fetch size (combined with some unexpectedly wide rows). Does lowering the result fetch size help?

--fetch-size <n>    Number of entries to read from database at once.
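The reasoning behind the fetch-size suggestion is that the client holds roughly one fetch's worth of rows at a time, so the buffer grows with fetch size × row width rather than with table size. The sketch below uses Python's sqlite3 cursor.fetchmany() as a stand-in for a JDBC fetch size; that substitution is an assumption for illustration only (Sqoop hands the value to the JDBC driver, and drivers differ in how strictly they honor it). The table, the 64 KiB row width and the batch sizes are hypothetical.

```python
import sqlite3


def peak_batch_bytes(conn, query, fetch_size):
    """Stream a query in batches of `fetch_size` rows and return
    (largest payload bytes held in one batch, total rows read)."""
    cur = conn.cursor()
    cur.execute(query)
    peak = 0
    total_rows = 0
    while True:
        batch = cur.fetchmany(fetch_size)
        if not batch:
            break
        # Only the current batch is referenced; earlier ones can be freed.
        batch_bytes = sum(len(payload) for (payload,) in batch)
        peak = max(peak, batch_bytes)
        total_rows += len(batch)
        # A real import would write the batch out to HDFS here.
    return peak, total_rows


# Hypothetical table: 1,000 "wide" rows of 64 KiB each (62.5 MiB in total).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (payload BLOB)")
row = b"x" * 64 * 1024
conn.executemany("INSERT INTO t VALUES (?)", [(row,) for _ in range(1000)])

for size in (500, 50):
    peak, rows = peak_batch_bytes(conn, "SELECT payload FROM t", size)
    print(f"fetch size {size:>3}: peak batch ~{peak // 1024} KiB over {rows} rows")
# fetch size 500: peak batch ~32000 KiB over 1000 rows
# fetch size  50: peak batch ~3200 KiB over 1000 rows
```

In this model the peak depends only on fetch size and row width, not on the total row count, which is why table size alone should not matter while a large fetch size over unexpectedly wide rows can.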

Also, do you always see it fail with the YARN memory kill (due to pmem exhaustion), or do you also occasionally observe an actual java.lang.OutOfMemoryError? If it is always the former, another suspect would be off-heap memory used by the JDBC driver, although I've not come across such a problem.
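The distinction drawn in that question can be sketched numerically: a java.lang.OutOfMemoryError comes from the Java heap (capped by -Xmx) running out, whereas YARN kills the container when the whole process's physical memory (heap plus off-heap use such as native buffers, thread stacks and JVM metadata) passes the container limit. This is a deliberately simplified model; the class, function and all figures (a 1024 MB container, an 820 MB heap cap, the heap and off-heap amounts) are hypothetical, and real JVM memory accounting has more parts.

```python
from dataclasses import dataclass


@dataclass
class MapTaskMemory:
    # All values in MB; hypothetical numbers for illustration only.
    container_limit: int  # physical memory YARN allows the container
    heap_max: int         # the JVM's -Xmx
    heap_demand: int      # heap the task tries to use (e.g. buffered rows)
    off_heap: int         # native/direct memory, stacks, metadata, etc.


def failure_mode(m):
    """Return which failure this simplified model predicts, if any."""
    if m.heap_demand > m.heap_max:
        # The heap fills before the process hits the container limit.
        return "java.lang.OutOfMemoryError"
    if m.heap_demand + m.off_heap > m.container_limit:
        # The heap is fine, but the whole process is over the limit.
        return "YARN kill (PHYSICAL)"
    return "ok"


print(failure_mode(MapTaskMemory(1024, 820, 900, 100)))  # java.lang.OutOfMemoryError
print(failure_mode(MapTaskMemory(1024, 820, 700, 400)))  # YARN kill (PHYSICAL)
print(failure_mode(MapTaskMemory(1024, 820, 700, 200)))  # ok
```

In the second case the heap never fills, so no OutOfMemoryError shows up in the task log, yet the container is still killed; that is the pattern the reply treats as pointing toward off-heap use by the driver.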