What is a reasonable value for "--fetch-size" for Sqoop'ing Data from Oracle?
Labels: Apache Sqoop
Created on ‎11-20-2017 09:34 AM - edited ‎09-16-2022 05:32 AM
Hi Everyone,
I have a requirement to do full table loads for ~60 tables from an Oracle database, and I have a shell script that runs a Sqoop import for each of those tables. The full run takes a long time because some of the tables are huge, so I started tuning the Sqoop job for each of them.
While doing that, I stumbled upon the "--fetch-size" option and have some questions about it:
- Does it change "oracle.row.fetch.size" for the JDBC connection?
- Is there a maximum limit for this parameter?
- Does it impact resources on the source DB side or on the Hadoop side?
- Are there any guidelines for finding an optimum value for this parameter?
Thanks & Regards,
Mohit Garg
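For context on what the parameter controls: in JDBC terms, a fetch size is the number of rows the driver retrieves from the server per network round trip, so for a full-table scan a larger value means fewer round trips. A back-of-the-envelope sketch in plain shell (the row count is a hypothetical example, not from the thread):

```shell
# Rough estimate: round trips needed to stream ROWS rows at a given fetch size.
ROWS=10000000        # hypothetical table size

for FETCH in 1000 10000 50000; do
  # Ceiling division: (ROWS + FETCH - 1) / FETCH
  TRIPS=$(( (ROWS + FETCH - 1) / FETCH ))
  echo "fetch-size=${FETCH} -> ~${TRIPS} round trips"
done
```

This is only the round-trip side of the tradeoff; a larger fetch size also means more rows buffered in memory per trip on both the driver and the DB session.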
Created ‎04-10-2019 02:34 AM
Though I don't know exactly how it works under the hood, I can confirm that it takes effect on the source DB side: Sqoop definitely does not pull everything from the database first and then chop it up before writing to Hadoop, so the fetch size governs how rows come back over the JDBC connection.
If you are looking for the optimum, expect some trial and error. As a starting point, my understanding is that the default value is 1000, and that trying 10000 is a reasonable first step toward better performance.
- Dennis Jaheruddin
If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.
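To make the suggestion concrete, here is a sketch of a Sqoop import with "--fetch-size" set to 10000. The connection string, credentials, table, mapper count, and target directory are all hypothetical placeholders; only "--fetch-size" and the value 10000 come from the advice above. The script prints the command rather than running it, so it is safe to execute as-is:

```shell
# Sketch of a Sqoop import with an explicit fetch size.
# All connection details below are hypothetical placeholders.
FETCH_SIZE=10000                      # Sqoop's default is 1000

SQOOP_CMD="sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/mohit/ora.pwd \
  --table MY_SCHEMA.MY_TABLE \
  --fetch-size ${FETCH_SIZE} \
  --num-mappers 4 \
  --target-dir /data/my_table"

# Print the command instead of running it, so the sketch is safe to try.
echo "${SQOOP_CMD}"
```

In a per-table loop like the one described in the question, FETCH_SIZE could be raised table by table while watching both job runtime and memory pressure on the Oracle session.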
