Created on 10-18-2013 06:06 AM - edited 09-16-2022 01:49 AM
Hi, I have tried importing data from Oracle and MySQL into Hive and HDFS using Sqoop. I want to import several selected tables, and only selected columns from those tables.
I did not find anything about this. A post on Stack Overflow says it cannot be done: http://stackoverflow.com/questions/17194232/sqoop-import-multiple-tables
I want to confirm this. Please let me know if you have any pointers.
I am not restricted to Sqoop; I am open to any technology that meets my requirement.
My requirement is to import data from multiple tables in an Oracle DB into a NoSQL DB like Hive, run queries in Hive, and generate reports quickly. In Oracle, some reports take 30-40 hours.
Created 10-21-2013 10:17 AM
Hi Bas, this sounds like a good scenario in which to use the import-all-tables tool (http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_literal_sqoop_import_all_tables_literal) along with the --exclude-tables <tables> parameter, which takes a comma-separated list of tables to exclude from the import process.
Regards, Kathleen
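As a rough illustration of the suggestion above, the following sketch assembles an import-all-tables command with --exclude-tables. The host, service name, user name, and the excluded table names (AUDIT_LOG, TEMP_STAGE) are made-up placeholders, not values from this thread. The script only prints the command so it can be reviewed before running it.

```shell
#!/usr/bin/env bash
# Sketch only: connection string, user and excluded tables are hypothetical.
CONNECT="jdbc:oracle:thin:@dbhost:1521/ORCL"
DB_USER="${DB_USER:-report_user}"
EXCLUDE="AUDIT_LOG,TEMP_STAGE"   # comma-separated, no spaces

build_import_all() {
  # Print the command instead of executing it.
  # -P prompts for the password at run time; -m 1 uses a single mapper.
  echo sqoop import-all-tables \
    --connect "$CONNECT" \
    --username "$DB_USER" -P \
    --exclude-tables "$EXCLUDE" \
    --hive-import \
    -m 1
}

build_import_all
```

Copying the printed line into a terminal (or replacing echo with a direct call) would run the import; every table not listed in EXCLUDE is imported with all of its columns.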
Created on 10-26-2013 12:37 AM - edited 10-26-2013 12:39 AM
Hi Kathleen, thanks for the reply. When importing a single table with the command below, I can import only the required columns.
sqoop-import --connect jdbc:oracle:thin:@****************:1521/** --username ** --password ** --table REGACCOUNT --columns ACCOUNTNUMBER,ISREGISTERED,COMPANYNAME,DOMAINNAME -m 1
As you suggested, using the --exclude-tables parameter I can import multiple tables, which is a very helpful pointer, thank you. Going further, is there any option to select only the required columns from those tables? Or would you suggest importing all the required tables into Hive and then changing our Hive queries to generate the reports (since there are large columns, like id in the insurance DB, that are not useful)?
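One possible workaround, sketched here under assumptions: as far as I know, the Sqoop user guide lists importing all columns of each table as a condition for import-all-tables, so a per-table column list would mean running the single-table import once per table. The loop below does that with the same --table and --columns options used in the command above. REGACCOUNT and its columns come from this post; the POLICY table, its columns, the host, and the user name are invented for illustration. By default the script prints the commands (DRY_RUN=1) instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch only: connection string and user are hypothetical placeholders.
CONNECT="jdbc:oracle:thin:@dbhost:1521/ORCL"
DB_USER="${DB_USER:-report_user}"

# One entry per table, in the form TABLE:COL1,COL2,...
# REGACCOUNT is taken from the post; POLICY is an invented example.
TABLE_COLUMNS=(
  "REGACCOUNT:ACCOUNTNUMBER,ISREGISTERED,COMPANYNAME,DOMAINNAME"
  "POLICY:POLICYNUMBER,STATUS"
)

run() {
  # Print the command when DRY_RUN is 1 (the default), otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

import_selected() {
  local entry table columns
  for entry in "${TABLE_COLUMNS[@]}"; do
    table="${entry%%:*}"    # text before the first colon
    columns="${entry#*:}"   # text after the first colon
    run sqoop import \
      --connect "$CONNECT" \
      --username "$DB_USER" -P \
      --table "$table" \
      --columns "$columns" \
      --hive-import \
      -m 1
  done
}

import_selected
```

Running the script with DRY_RUN=0 would execute one import per entry. The alternative raised in the question, importing whole tables and selecting columns in the Hive queries, avoids maintaining this list at the cost of storing the unused columns.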