
[Sqoop1] OOM on offloading from Oracle




Hi, I need to sqoop about 700 tables from 2 Oracle instances and I am using a custom query to extract them.

To speed up the process a bit more, I set

--fetch-size 2000000

on Sqoop.


I have a file with one table per line, plus some arguments and the query. I built a shell script that uses GNU Parallel to run several offloads at the same time. It works, but I don't understand why I need to tune the heap size of the processes; otherwise they fail with OOM.
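Because GNU Parallel starts several Sqoop client JVMs at once, every heap increase is multiplied by the number of concurrent jobs. The sketch below is a hypothetical sizing helper, not part of the original script: the node size, the reserved memory and the fixed per-JVM non-heap overhead are all assumed values chosen for illustration.

```python
# Hypothetical sizing helper: how many Sqoop client JVMs can GNU Parallel
# run at once on one node without overcommitting memory?
# All numbers are illustrative assumptions, not measured values.


def max_parallel_jobs(node_mem_mib: int, reserved_mib: int, heap_per_job_mib: int,
                      overhead_per_job_mib: int = 256) -> int:
    """Return how many concurrent jobs fit in the node's usable memory.

    Each job is assumed to cost its -Xmx heap plus a fixed non-heap
    overhead (metaspace, thread stacks, native buffers).
    """
    usable = node_mem_mib - reserved_mib
    per_job = heap_per_job_mib + overhead_per_job_mib
    if usable <= 0 or per_job <= 0:
        return 0
    return usable // per_job


if __name__ == "__main__":
    # Assumed: 64 GiB node, 8 GiB kept for the OS and other services, 4 GiB heap per job.
    print(max_parallel_jobs(64 * 1024, 8 * 1024, 4 * 1024))
```

The result is the kind of number you would pass to GNU Parallel's `-j` option, so that raising the per-job heap automatically lowers the job count instead of exhausting the node.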


I understand that Sqoop uses the HDFS client to write data to HDFS, and since I force Sqoop to fetch 2 million records at a time, I need to tune the process so it has room for all of them.
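The reasoning above can be made concrete with a back-of-the-envelope estimate: if one fetch of `--fetch-size 2000000` rows is held in client memory at once, the memory for that fetch is roughly the fetch size times the in-memory size of one row. This is a simplified sketch under that assumption; `row_bytes` is a guess you would have to measure. Some Oracle JDBC driver versions are reported to size their fetch buffers from the declared column widths (for example a VARCHAR2(4000) column) rather than from the actual data, so a generous `row_bytes` is safer.

```python
# Rough estimate of how much client memory one JDBC fetch can occupy.
# Assumptions: the driver holds a whole fetch (fetch_size rows) in memory at once,
# and row_bytes is the in-memory size of one row. Both are simplifications.

GIB = 1024 ** 3


def fetch_buffer_bytes(fetch_size: int, row_bytes: int) -> int:
    """Bytes needed to buffer one full fetch of fetch_size rows."""
    return fetch_size * row_bytes


if __name__ == "__main__":
    # Same fetch size as in the post, three hypothetical row widths.
    for row_bytes in (100, 500, 2000):
        need = fetch_buffer_bytes(2_000_000, row_bytes)
        print(f"{row_bytes:>5} B/row -> {need / GIB:.2f} GiB per fetch")
```

Even a modest 500-byte row already puts about 1 GB in a single fetch, before the JVM and the HDFS client take their share of the heap.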

So I tune HDFS client via


 inside the script, and the Sqoop process's heapsize via and

in the sqoop import command.


My question is: why do some tables complete and others don't? Why can't it just slow down to keep pace?
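One plausible explanation, assuming the fetch buffer dominates heap usage: with a fixed fetch size the rows buffered per fetch are capped at min(fetch_size, table_rows), so the peak memory differs per table depending on its row count and row width. The sketch below illustrates that; the table names, row counts, row widths, the 2 GiB heap and the 50% headroom are all invented for the example.

```python
# Hypothetical illustration: with a fixed fetch size, the rows buffered per fetch
# are capped at min(fetch_size, table_rows), so the memory needed differs per table.
# Table names, row counts and row widths below are invented for the example.

from dataclasses import dataclass


@dataclass
class Table:
    name: str
    rows: int
    row_bytes: int  # assumed average in-memory size of one row


def peak_fetch_bytes(table: Table, fetch_size: int) -> int:
    """Bytes held for the largest single fetch of this table."""
    return min(fetch_size, table.rows) * table.row_bytes


def fits(table: Table, fetch_size: int, heap_bytes: int, headroom: float = 0.5) -> bool:
    """True if one fetch uses at most `headroom` of the heap (the rest is left for the JVM/HDFS client)."""
    return peak_fetch_bytes(table, fetch_size) <= heap_bytes * headroom


if __name__ == "__main__":
    heap = 2 * 1024 ** 3  # assumed 2 GiB heap
    tables = [
        Table("SMALL_NARROW", 300_000, 200),
        Table("BIG_NARROW", 50_000_000, 200),
        Table("BIG_WIDE", 50_000_000, 4_000),
    ]
    for t in tables:
        print(t.name, "fits" if fits(t, 2_000_000, heap) else "likely OOM")
```

Under these assumptions a table's growth in rows raises its peak only until it reaches the fetch size; beyond that, row width is what decides whether a single fetch fits in the heap.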

I don't like this approach because as soon as a table grows larger, Sqoop will fail. I can't go to production with something that I know in advance will break in the future.
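One possible alternative to a single fixed fetch size, sketched here as an assumption rather than a known fix: derive the fetch size per table from a memory budget and an estimated row width, so wider tables get smaller fetches. The helper name, the 512 MiB budget, the floor and the cap are hypothetical values.

```python
# One possible alternative to a fixed --fetch-size: derive it per table from a
# memory budget and an estimated row width. The budget, the floor, the cap and
# the row widths are assumptions you would have to measure for your own tables.


def fetch_size_for(row_bytes: int, budget_bytes: int,
                   floor: int = 1_000, cap: int = 2_000_000) -> int:
    """Largest fetch size whose buffer stays within budget_bytes, clamped to [floor, cap]."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    size = budget_bytes // row_bytes
    return max(floor, min(cap, size))


if __name__ == "__main__":
    budget = 512 * 1024 ** 2  # assumed 512 MiB reserved for one fetch
    for row_bytes in (200, 4_000, 32_000):
        print(row_bytes, fetch_size_for(row_bytes, budget))
```

The computed value would go into the `--fetch-size` argument of each per-table command that the GNU Parallel script builds. The row width could come, for example, from Oracle's optimizer statistics (the `AVG_ROW_LEN` column of `ALL_TABLES`), bearing in mind that a custom query may select a different set of columns. Note that the floor can exceed the budget for extremely wide rows, so those tables still need a larger heap.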

