What is a reasonable value for "--fetch-size" for Sqoop'ing Data from Oracle?
Labels: Apache Sqoop
Created on ‎11-20-2017 09:34 AM - edited ‎09-16-2022 05:32 AM
Hi Everyone,
I have a requirement to do full table loads for ~60 tables from an Oracle database, and I have a shell script that runs a Sqoop import for each of those tables. The full run takes a long time because some of the tables are huge, so I started tuning the Sqoop job for each of them.
While doing that, I stumbled upon the "--fetch-size" option and have some questions about it:
- Does it change "oracle.row.fetch.size" for the JDBC connection?
- Is there a maximum limit for this parameter?
- Does it impact resources on the source DB side or on the Hadoop side?
- Are there any guidelines for finding an optimum value for this parameter?
Thanks & Regards,
Mohit Garg
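For context on what the parameter controls: in JDBC terms, a fetch size is the number of rows the driver retrieves from the server per network round trip, so for a full-table scan a larger value means fewer round trips. A back-of-the-envelope sketch in plain shell (the row count is a hypothetical example, not from the thread):

```shell
# Rough estimate: round trips needed to stream ROWS rows at a given fetch size.
ROWS=10000000        # hypothetical table size

for FETCH in 1000 10000 50000; do
  # Ceiling division: (ROWS + FETCH - 1) / FETCH
  TRIPS=$(( (ROWS + FETCH - 1) / FETCH ))
  echo "fetch-size=${FETCH} -> ~${TRIPS} round trips"
done
```

This is only the round-trip side of the tradeoff; a larger fetch size also means more rows buffered in memory per trip on both the driver and the DB session.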
Created ‎04-10-2019 02:34 AM
Though I don't know exactly how it works under the hood, I can confirm that it takes effect on the source DB side: Sqoop definitely does not pull everything from the database first and then chop it up before writing to Hadoop, so the fetch size governs how rows come back over the JDBC connection.
If you are looking for the optimum, expect some trial and error. As a starting point, my understanding is that the default value is 1000, and that trying 10000 is a reasonable first step toward better performance.
- Dennis Jaheruddin
If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.
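To make the suggestion concrete, here is a sketch of a Sqoop import with "--fetch-size" set to 10000. The connection string, credentials, table, mapper count, and target directory are all hypothetical placeholders; only "--fetch-size" and the value 10000 come from the advice above. The script prints the command rather than running it, so it is safe to execute as-is:

```shell
# Sketch of a Sqoop import with an explicit fetch size.
# All connection details below are hypothetical placeholders.
FETCH_SIZE=10000                      # Sqoop's default is 1000

SQOOP_CMD="sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/mohit/ora.pwd \
  --table MY_SCHEMA.MY_TABLE \
  --fetch-size ${FETCH_SIZE} \
  --num-mappers 4 \
  --target-dir /data/my_table"

# Print the command instead of running it, so the sketch is safe to try.
echo "${SQOOP_CMD}"
```

In a per-table loop like the one described in the question, FETCH_SIZE could be raised table by table while watching both job runtime and memory pressure on the Oracle session.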
