Created 06-04-2018 12:40 PM
Hi all,
I want to import specific tables from MsSQL to HIVE database. Although I have tried --exclude-tables arguments for excluding some table while importing.
But my scenario is a little bit different.
For Example, If I have 100 tables and I want to import only 98 tables then I could exclude those 2 tables using --exclude-tables argument. But what if I want to import only 2 tables among those 100 tables.
I tried to give multiple tables names in --table argument.
My observation:
import-all-tables --connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name> --username xxxxx --password xxxx --table mssql_table1,mssql_table2 --hive-import --hive-database <hive_database_name> --fields-terminated-by "," -m 1
Does anyone have an idea?
Regards,
Jay.
Created 06-04-2018 01:04 PM
One way of doing is using --exclude-tables argument and mentioning all the tables with comma separated values.
(or)
Writing shell script to to have the required two tables and iterating sqoop job on these tables.
Created 06-04-2018 01:04 PM
One way of doing is using --exclude-tables argument and mentioning all the tables with comma separated values.
(or)
Writing shell script to to have the required two tables and iterating sqoop job on these tables.
Created 06-04-2018 02:02 PM
Hey @Shu, I have gone through both links which you have given.
This link this for "how to import data from multiple source thru Sqoop?"
In this link, there is no idea about import only specific tables using Sqoop. They created script for importing tables and aslo mentioning those 100 tables names. i.e., table1, table2,table3,table4,table5,table6,... show on.
But I don't want to mention those all tables names.
Regards,
Jay.
Created 06-05-2018 01:22 AM
Let me explain how to do your case using Shell script
1.Using whileloop:
Create a input file with all the required tables i.e
bash$ vi required_tables.txt
mssql_table1
mssql_table2
Then the shell script will read the above file data line by line and executes sqoop import for each table
bash$ vi sqoop_import_while.sh
while read line; do tableName=`echo $line | cut -d'.' -f2` sqoop import --connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name> --username xxxxx - -password xxxx --table $tableName --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1 done </home/required_table.txt(give your required_tables.txt file path)
Now the script reads each line from required_table.txt file and assigns the value to tableName variable then we are using tableName variable in our sqoop import staetment.
Once the import is finished for the first table then script reads the next line and performs import again for the second table.
2.Using Forloop:
bash$ vi sqoop_import_for.sh declare -a req_tables=("mssql_table1" "mssql_table2") for t in "${req_tables[@]}" do sqoop import --connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name> --username xxxxx --password xxxx --table $t --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1 done
By using this script we are not reading the table names from the file instead we are defining an array variable and iterating through all the array elements and performing sqoop import.
We have defined req_tables array and given all the required tables then performing for loop each table in req_tables array and passing the name to --table argument in sqoop import.
give the permissions to the shell script file then execute the script by using ./sqoop_import_for.sh (or) ./sqoop_import_while.sh
You can choose either of the above ways to import only the required tables using sqoop.
Let us know if you are facing issues ..!!
Created 07-06-2018 07:59 AM
Created 06-04-2018 01:11 PM
When you want to import ONLY a subset of the database you don't use "import-all-tables"
$ sqoop import --connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name> --username foo --password foo --table mssql_table1,mssql_table2 --hive-import
Please try the above method and revert
Created 06-04-2018 01:44 PM
Hi, @Geoffrey Shelton Okot, thanks
Yes , I tried --exclude-tables argument to exclude some of the tables. But what if I want to import only 2 tables out of 100 tables from MsSQL to Hive database. Do I need to mention all 98 tables names in --exclude-tables argument?
Created 06-22-2018 11:56 AM
I tried above solution, but it is also throwing below error;
18/06/22 11:53:59 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '<ms_SQL_tbalename>,<ms_SQl_Tablename>'.
Regards,
Jay.
Created 06-04-2018 01:48 PM
Why go the lengthy tedious way to exclude 98 tables when you can just mention the 2 tables you want imported?
Created 06-04-2018 01:55 PM
@Geoffrey Shelton Okot, thank you,
But 98 is just a number. Let's think about bigger number and change scenario.
What if MsSQL database has 500 tables and I want to import only 100 tables among them? Do I need to mention those all 100 tables in --table argument?
Regards,
Jay.