Created 12-24-2021 06:33 AM
Hello everybody,
Recently my DBA team called me informing that Sqoop was sending queries to the database without parameter binding.
Looking in the Sqoop documentation I got confirmation of the use of Prepared Statement by Sqoop only in the export command, with --batch parameter. With import command there is not this option to use.
Could you help me identify how I could (if possible) use Sqoop's parameters to make it use binding variables in import querys?
Any thoughts will be appreciated.
Thanks!
PS: Above is the description of the --batch parameter.
--batch Use batch mode for underlying statement execution.
Created 01-04-2022 08:43 AM
Hi @cardozogp ,
With sqoop import (DB -> HDFS) sqoop will submit just a couple of (or in range of max 100s) of queries to the database:
- 1 query to get the table schema
- a couple of queries to determine the split ranges (see "split-by")
- and every mapper starts 1 select statement with their own split key range. The mappers from then on just call "getNext" on the result sets to fetch the next batch of results from the DB.
With sqoop export (HDFS->DB) however the mappers are calling separate insert statements for every "batch" (~100 s of records), so there may be millions of insert statements submitted, in that case this indeed has an impact.
As there are really just a few queries in play with sqoop import, the usage of prepared statements does not have a significant benefit. Let us know if your DBA team has some other insights which we may not be aware of.
Best regards
Miklos
Created 01-04-2022 08:43 AM
Hi @cardozogp ,
With sqoop import (DB -> HDFS) sqoop will submit just a couple of (or in range of max 100s) of queries to the database:
- 1 query to get the table schema
- a couple of queries to determine the split ranges (see "split-by")
- and every mapper starts 1 select statement with their own split key range. The mappers from then on just call "getNext" on the result sets to fetch the next batch of results from the DB.
With sqoop export (HDFS->DB) however the mappers are calling separate insert statements for every "batch" (~100 s of records), so there may be millions of insert statements submitted, in that case this indeed has an impact.
As there are really just a few queries in play with sqoop import, the usage of prepared statements does not have a significant benefit. Let us know if your DBA team has some other insights which we may not be aware of.
Best regards
Miklos
Created 01-07-2022 09:37 AM
@cardozogp Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
Regards,
Diana Torres,