Support Questions

cardozogp · ‎12-24-2021

Hello everybody,

Recently my DBA team called me informing that Sqoop was sending queries to the database without parameter binding.

Looking in the Sqoop documentation I got confirmation of the use of Prepared Statement by Sqoop only in the export command, with --batch parameter. With import command there is not this option to use.

Could you help me identify how I could (if possible) use Sqoop's parameters to make it use binding variables in import querys?

Any thoughts will be appreciated.

Thanks!

PS: Above is the description of the --batch parameter.

--batch Use batch mode for underlying statement execution.

mszurap · ‎01-04-2022

Hi @cardozogp ,

With sqoop import (DB -> HDFS) sqoop will submit just a couple of (or in range of max 100s) of queries to the database:

- 1 query to get the table schema

- a couple of queries to determine the split ranges (see "split-by")

- and every mapper starts 1 select statement with their own split key range. The mappers from then on just call "getNext" on the result sets to fetch the next batch of results from the DB.

With sqoop export (HDFS->DB) however the mappers are calling separate insert statements for every "batch" (~100 s of records), so there may be millions of insert statements submitted, in that case this indeed has an impact.

As there are really just a few queries in play with sqoop import, the usage of prepared statements does not have a significant benefit. Let us know if your DBA team has some other insights which we may not be aware of.

Best regards

Miklos

View solution in original post

mszurap · ‎01-04-2022