Hi Experts, I am sqooping out a Hive table with 20-30 columns and approx. 100 million records to Oracle.
sqoop export -D mapred.job.queue.name=<yarn queue name> -Dsqoop.export.records.per.statement=100000 --connect jdbc:oracle:thin:<db server> --username xxxx -P --table employee --hcatalog-database default --hcatalog-table employee --num-mappers 10 --batch

The source Hive table is unpartitioned and has 200 part files, 1.1 GB of data in total. The destination Oracle table has an identity defined on the PK column, with a cache of 100000.
It takes approx. 5-6 minutes to sqoop out the data completely with the default fetch size of 1000. Any suggestions on how I can improve performance further? Is there any way I can change the fetch size?
Thanks in advance.
1. Is the table in question a text-based table, or is it of a different format?
2. Have you tried increasing the number of mappers?
Also, since you are loading the data into Oracle, have you tried enabling --direct? It uses the Data Connector for Oracle and Hadoop, which does fast imports and exports.
If you have not tried these, I would suggest giving these options a try.
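As a rough sketch (the placeholders for the queue name and DB server are from your original command and still need to be filled in; --direct assumes the Data Connector for Oracle and Hadoop is installed on the cluster, and the mapper count is only an illustrative starting point):

```shell
# Sketch only, not a tested command. Placeholders (<yarn queue name>,
# <db server>) must be replaced, and --direct requires the Oracle
# connector to be available on the cluster.
sqoop export \
  -D mapred.job.queue.name=<yarn queue name> \
  --connect jdbc:oracle:thin:<db server> \
  --username xxxx -P \
  --table employee \
  --hcatalog-database default \
  --hcatalog-table employee \
  --num-mappers 20 \
  --direct
```

You would still want to tune --num-mappers against what the Oracle side can absorb.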
Thanks for your reply...
1. The table is in ORC format.
2. Yes, I tried increasing the number of mappers. Too many mappers cause locking on the Oracle side, with too many insert operations hitting the same block.
I tried using -Dsqoop.export.records.per.statement and -Dsqoop.export.statements.per.transaction with --batch and --direct, but they don't seem to help much, and --fetch-size didn't work with export (it is an import-side argument).
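For reference, this is the shape of the batched variant I tried. The specific numbers here are illustrative assumptions, not tuned values; the idea is that smaller per-statement batches combined with several statements per transaction can reduce contention on a single Oracle block while still keeping commits infrequent:

```shell
# Sketch under the same placeholder assumptions as the original command;
# the batch sizes below are illustrative, not benchmarked values.
sqoop export \
  -D mapred.job.queue.name=<yarn queue name> \
  -Dsqoop.export.records.per.statement=10000 \
  -Dsqoop.export.statements.per.transaction=10 \
  --connect jdbc:oracle:thin:<db server> \
  --username xxxx -P \
  --table employee \
  --hcatalog-database default \
  --hcatalog-table employee \
  --num-mappers 10 \
  --batch
```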