11-20-2017 09:34 AM
I need to do full loads of ~60 tables from an Oracle database, and I have a shell script that runs a Sqoop import for each of those tables. Loading all of them takes a long time because some tables are huge, so I have started tuning the Sqoop job for each table individually.
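A per-table driver like that can keep each table's tuning knobs in one place. The sketch below is in Python rather than shell. The connection string, user, password file, table names, target directories and the per-table mapper/fetch-size values are all hypothetical placeholders. It only builds and prints the `sqoop import` commands (a dry run) and does not execute them.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical connection details; replace with your own.
CONNECT = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB"
USERNAME = "etl_user"
PASSWORD_FILE = "/user/etl_user/.oracle.password"


@dataclass
class TableJob:
    name: str
    mappers: int = 4                  # >1 mapper needs a primary key or --split-by
    fetch_size: Optional[int] = None  # None -> leave Sqoop's default in place


def build_command(job: TableJob) -> List[str]:
    """Build the argument list for one full-table sqoop import."""
    cmd = [
        "sqoop", "import",
        "--connect", CONNECT,
        "--username", USERNAME,
        "--password-file", PASSWORD_FILE,
        "--table", job.name,
        "--target-dir", f"/data/raw/{job.name.lower()}",  # hypothetical layout
        "--num-mappers", str(job.mappers),
    ]
    if job.fetch_size is not None:
        cmd += ["--fetch-size", str(job.fetch_size)]
    return cmd


# Hypothetical per-table settings: small tables keep defaults, big ones get tuned.
JOBS = [
    TableJob("CUSTOMERS"),
    TableJob("ORDERS", mappers=8, fetch_size=10000),
]

if __name__ == "__main__":
    for job in JOBS:
        # Dry run: print a shell-quoted command instead of running it.
        print(shlex.join(build_command(job)))
```

Keeping the settings as data rather than as ~60 copies of a command line makes it easy to change one table's fetch size and re-run just that table.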
While tuning, I came across the "--fetch-size" option and have a few questions about it:
Does anyone know if it changes the "oracle.row.fetch.size" for the JDBC connection?
Is there a maximum limit for this parameter?
Does it affect resources on the source DB, on the Hadoop side, or both?
Are there any guidelines for finding an optimal value for this parameter?
Thanks & Regards,
04-10-2019 02:34 AM
Though I don't know exactly how it works under the hood, I can confirm that it takes effect on the source DB side: the fetch size is the number of rows read from the database at a time. Sqoop definitely does NOT pull everything from the DB in one go and then chop it up before writing to Hadoop.
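To make the "source side" point concrete: with a fetch size of N, the client asks the database for the next N rows each time its local buffer runs out. It therefore holds roughly one batch in memory at a time, and the number of network round trips is roughly rows / N. The Python sketch below only models that behaviour. `SimulatedCursor` and `stream_rows` are hypothetical stand-ins for a server-side cursor and a driver's buffering, not Sqoop or Oracle JDBC code.

```python
from itertools import islice
from typing import Iterator


class SimulatedCursor:
    """Stands in for a server-side cursor; each fetch() counts as one round trip."""

    def __init__(self, total_rows: int):
        self._rows = iter(range(total_rows))
        self.round_trips = 0

    def fetch(self, n: int) -> list:
        self.round_trips += 1
        return list(islice(self._rows, n))


def stream_rows(cursor: SimulatedCursor, fetch_size: int) -> Iterator[int]:
    """Yield rows one at a time, refilling a buffer of at most fetch_size rows."""
    if fetch_size < 1:
        raise ValueError("fetch_size must be positive")
    while True:
        batch = cursor.fetch(fetch_size)  # one request to the "database"
        yield from batch
        if len(batch) < fetch_size:       # short (or empty) batch: result set exhausted
            break


if __name__ == "__main__":
    for size in (10, 1000, 10000):
        cur = SimulatedCursor(25_500)
        rows = sum(1 for _ in stream_rows(cur, size))
        print(f"fetch_size={size}: rows={rows}, round_trips={cur.round_trips}")
```

The trade-off it illustrates: a larger fetch size means fewer round trips to the database but a bigger per-batch buffer in each mapper's memory, which is one reason the "best" value depends on row width and the environment.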
Finding the optimum will likely take some trial and error. As a starting point, my understanding is that the default value is 1000, and that 10000 is worth trying as a first step towards better performance.
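That trial-and-error loop can be made systematic with a small timing harness. The sketch assumes you can re-run the same import against a representative table. `run_import` is a hypothetical callable you would supply, for example one that runs a `sqoop import ... --fetch-size N` command and waits for it to finish (each run needs a fresh or deleted target directory). The harness only times each candidate and keeps the fastest. The 1000 and 10000 candidates follow the suggestion above. Any other values you add are your own choice.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def time_fetch_sizes(
    run_import: Callable[[int], None],
    candidates: Iterable[int],
    repeats: int = 2,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, Dict[int, float]]:
    """Run run_import(fetch_size) for each candidate; return (fastest, timings).

    Each timing is the minimum of `repeats` runs, to damp noise from other
    load on the database and the cluster.
    """
    timings: Dict[int, float] = {}
    for size in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = clock()
            run_import(size)
            best = min(best, clock() - start)
        timings[size] = best
    if not timings:
        raise ValueError("no candidate fetch sizes given")
    fastest = min(timings, key=timings.get)
    return fastest, timings
```

Taking the minimum of a few runs rather than a single run helps separate the effect of the fetch size from background load on the database or cluster. The `clock` parameter exists so the harness can be exercised with a fake clock.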