Support Questions

What is a reasonable value for "--fetch-size" for Sqoop'ing Data from Oracle?

Rising Star

Hi Everyone,

 

I need to do full table loads of ~60 tables from an Oracle database, and I have a shell script that runs Sqoop for each of them. Loading everything takes a long time because some of the tables are huge, so I have started tuning the Sqoop job for each table.
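For context, the script loops over the table list and runs something like the following for each table; the connect string, credentials, paths, and table names here are placeholders rather than my real values:

#!/bin/bash
# Placeholder table list; in reality there are ~60 tables.
for TABLE in CUSTOMERS ORDERS LINE_ITEMS
do
  sqoop import \
    --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
    --username myuser \
    --password-file /user/myuser/.oracle_pw \
    --table "$TABLE" \
    --target-dir "/data/oracle/$TABLE" \
    --num-mappers 4
done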

 

In the process, I stumbled upon the "--fetch-size" option, and I have some questions about it:

 

1. Does it change the "oracle.row.fetch.size" setting for the JDBC connection?
2. Is there a maximum limit for this parameter?
3. Does it impact resources on the source DB side or on the Hadoop side?
4. Are there any guidelines for finding an optimum value for this parameter?

 

Thanks & Regards,

Mohit Garg

1 REPLY


Though I don't know exactly how it works under the hood, I can confirm that it acts on the source DB side: Sqoop definitely does not pull everything from the database first and then chop it up before writing to Hadoop.

 

If you are looking for the optimum, you will likely need some trial and error. As a starting point, I understand the default value is 1000, and you may want to try 10000 as a first step towards better performance.
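For example, you can simply add the flag to one of your imports and compare run times before and after (the connect string, credentials, and table name below are placeholders):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser \
  --password-file /user/myuser/.oracle_pw \
  --table ORDERS \
  --target-dir /data/oracle/ORDERS \
  --fetch-size 10000

If 10000 improves on the default, you can keep increasing it until the gains flatten out; just keep in mind that a larger fetch size means the JDBC driver buffers more rows in memory on the mapper side.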


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved'; if it is valuable for future readers, please give it 'kudos'.