Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Bind in Sqoop querys on import command

avatar
Explorer

Hello everybody,

 

Recently my DBA team called me informing that Sqoop was sending queries to the database without parameter binding.

 

Looking in the Sqoop documentation I got confirmation of the use of Prepared Statement by Sqoop only in the export command, with --batch parameter. With import command there is not this option to use.

 

Could you help me identify how I could (if possible) use Sqoop's parameters to make it use binding variables in import querys?

 

Any thoughts will be appreciated.

Thanks!

 

PS: Above is the description of the --batch parameter.

--batch Use batch mode for underlying statement execution.

1 ACCEPTED SOLUTION

avatar

Hi @cardozogp ,

With sqoop import (DB -> HDFS) sqoop will submit just a couple of (or in range of max 100s) of queries to the database:

- 1 query to get the table schema

- a couple of queries to determine the split ranges (see "split-by")

- and every mapper starts 1 select statement with their own split key range. The mappers from then on just call "getNext" on the result sets to fetch the next batch of results from the DB.

 

With sqoop export (HDFS->DB) however the mappers are calling separate insert statements for every "batch" (~100 s of records), so there may be millions of insert statements submitted, in that case this indeed has an impact.

 

As there are really just a few queries in play with sqoop import, the usage of prepared statements does not have a significant benefit. Let us know if your DBA team has some other insights which we may not be aware of.

 

Best regards

 Miklos

View solution in original post

2 REPLIES 2

avatar

Hi @cardozogp ,

With sqoop import (DB -> HDFS) sqoop will submit just a couple of (or in range of max 100s) of queries to the database:

- 1 query to get the table schema

- a couple of queries to determine the split ranges (see "split-by")

- and every mapper starts 1 select statement with their own split key range. The mappers from then on just call "getNext" on the result sets to fetch the next batch of results from the DB.

 

With sqoop export (HDFS->DB) however the mappers are calling separate insert statements for every "batch" (~100 s of records), so there may be millions of insert statements submitted, in that case this indeed has an impact.

 

As there are really just a few queries in play with sqoop import, the usage of prepared statements does not have a significant benefit. Let us know if your DBA team has some other insights which we may not be aware of.

 

Best regards

 Miklos

avatar
Community Manager

@cardozogp Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks


Regards,

Diana Torres,
Community Moderator


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community: