Hi, I need to sqoop about 700 tables from two Oracle instances, and I am using a custom query to extract each one.
To speed up the process a bit more, I set
I have a file with one table per line, plus some arguments and the query. I built a shell script that uses GNU Parallel to run several offloads at the same time. It works correctly; however, I don't understand why I need to tune the heap size of the processes, since otherwise they fail with an OOM error.
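For readers who want to reproduce the setup, here is a minimal sketch of the same idea in Python, with `concurrent.futures.ThreadPoolExecutor` standing in for GNU Parallel's `-j` job limit. The file format (`TABLE|EXTRA_ARGS|QUERY`), the function names, and the JDBC URL are hypothetical. The sketch also assumes the Sqoop launcher goes through the `hadoop` script, so `HADOOP_CLIENT_OPTS` sets the heap of each Sqoop client JVM. If the imports run as MapReduce jobs, the map tasks' heap is configured separately (e.g. via `mapreduce.map.java.opts`).

```python
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def parse_line(line):
    """Split one line of the (hypothetical) table file.

    Assumed format: TABLE|EXTRA_ARGS|QUERY, for example
    SALES|-m 1 --username scott|SELECT * FROM SALES WHERE $CONDITIONS
    The query is last, so it may itself contain '|'.
    """
    table, extra, query = line.rstrip("\n").split("|", 2)
    return table.strip(), shlex.split(extra), query.strip()


def build_command(table, extra, query, connect_url, base_dir):
    """Build the argv for one free-form-query Sqoop import."""
    # Passed as a list (no shell), so $CONDITIONS reaches Sqoop literally.
    return [
        "sqoop", "import",
        "--connect", connect_url,
        "--query", query,
        "--target-dir", f"{base_dir}/{table}",
        *extra,
    ]


def run_all(lines, connect_url, base_dir, jobs=4, heap="-Xmx4g"):
    """Run every import with at most `jobs` processes at once (like `parallel -j`)."""
    # Every concurrent Sqoop client is its own JVM with its own heap.
    env = dict(os.environ, HADOOP_CLIENT_OPTS=heap)
    commands = [build_command(*parse_line(ln), connect_url, base_dir)
                for ln in lines if ln.strip()]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, env=env).returncode, commands))
```

Usage would look like `run_all(open("tables.txt"), "jdbc:oracle:thin:@//db1.example.com:1521/ORCL", "/data/raw", jobs=8)`, where every name is a placeholder. Each line's extra arguments carry credentials and options such as `-m 1`.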
I understand that Sqoop uses the HDFS client to write data to HDFS, and since I force Sqoop to fetch 2 million records at a time, I need to give the process enough heap to hold them all.
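The reasoning about memory can be made concrete with a back-of-envelope estimate. The sketch below assumes one fetched batch must sit in the heap at once. The average row size and the `overhead` multiplier (for JVM object headers, string encoding and driver buffers) are illustrative guesses, not measurements. It also shows why running several processes in parallel multiplies the total memory needed on the machine.

```python
GIB = 1024 ** 3


def batch_gib(rows_per_fetch, avg_row_bytes, overhead=2.0):
    """Rough heap needed to hold one fetched batch, in GiB.

    `overhead` is an assumed multiplier for JVM/driver overhead; it is a
    guess for illustration, not a measured value.
    """
    return rows_per_fetch * avg_row_bytes * overhead / GIB


def node_gib(parallel_jobs, heap_per_job_gib):
    """Total heap on one machine: each concurrent process is a separate JVM."""
    return parallel_jobs * heap_per_job_gib


# 2 million rows of ~500 bytes with the assumed 2x overhead: about 1.86 GiB
print(round(batch_gib(2_000_000, 500), 2))
# 8 parallel jobs with a 4 GiB heap each need 32 GiB in total
print(node_gib(8, 4))
```

Under these assumptions the batch term scales linearly with rows per fetch, so halving the fetch size roughly halves the heap one batch needs. Wide tables (large average row size) push it up just as quickly.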