Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

*Closed* : How to import mutilple tables (not all tables) in Sqoop

avatar
Contributor

Hi all,

I want to import specific tables from MsSQL to HIVE database. Although I have tried --exclude-tables arguments for excluding some table while importing.

But my scenario is a little bit different.

For Example, If I have 100 tables and I want to import only 98 tables then I could exclude those 2 tables using --exclude-tables argument. But what if I want to import only 2 tables among those 100 tables.

I tried to give multiple tables names in --table argument.

My observation:

import-all-tables
--connect
jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name>
--username
xxxxx
--password
xxxx
--table
mssql_table1,mssql_table2
--hive-import
--hive-database
<hive_database_name>
--fields-terminated-by
","
-m
1

Does anyone have an idea?

Regards,

Jay.

1 ACCEPTED SOLUTION

avatar
Master Guru
@JAy PaTel

One way of doing is using --exclude-tables argument and mentioning all the tables with comma separated values.

(or)

Writing shell script to to have the required two tables and iterating sqoop job on these tables.

Please refer to this and this links for more details.

View solution in original post

12 REPLIES 12

avatar
Master Guru
@JAy PaTel

One way of doing is using --exclude-tables argument and mentioning all the tables with comma separated values.

(or)

Writing shell script to to have the required two tables and iterating sqoop job on these tables.

Please refer to this and this links for more details.

avatar
Contributor

Hey @Shu, I have gone through both links which you have given.

This link this for "how to import data from multiple source thru Sqoop?"

In this link, there is no idea about import only specific tables using Sqoop. They created script for importing tables and aslo mentioning those 100 tables names. i.e., table1, table2,table3,table4,table5,table6,... show on.
But I don't want to mention those all tables names.

Regards,

Jay.

avatar
Master Guru
@JAy PaTel

Let me explain how to do your case using Shell script

1.Using whileloop:

Create a input file with all the required tables i.e

bash$ vi required_tables.txt
mssql_table1
mssql_table2

Then the shell script will read the above file data line by line and executes sqoop import for each table

bash$ vi sqoop_import_while.sh
while read line; do tableName=`echo $line | cut -d'.' -f2` sqoop import --connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name> --username xxxxx - -password xxxx --table $tableName --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1 done </home/required_table.txt(give your required_tables.txt file path)

Now the script reads each line from required_table.txt file and assigns the value to tableName variable then we are using tableName variable in our sqoop import staetment.

Once the import is finished for the first table then script reads the next line and performs import again for the second table.

2.Using Forloop:

bash$ vi sqoop_import_for.sh
declare -a req_tables=("mssql_table1" "mssql_table2")
for t in "${req_tables[@]}"
do
	sqoop import --connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name> --username xxxxx --password xxxx --table $t --hive-import --hive-database <hive_database_name> --fields-terminated-by ',' -m 1
done

By using this script we are not reading the table names from the file instead we are defining an array variable and iterating through all the array elements and performing sqoop import.

We have defined req_tables array and given all the required tables then performing for loop each table in req_tables array and passing the name to --table argument in sqoop import.

give the permissions to the shell script file then execute the script by using ./sqoop_import_for.sh (or) ./sqoop_import_while.sh

You can choose either of the above ways to import only the required tables using sqoop.

Let us know if you are facing issues ..!!

avatar
Contributor
@Shu

Thank a ton !!!

I would like to suggest to use While Loop.

Regards,

Jay.

avatar
Master Mentor

@JAy PaTel

When you want to import ONLY a subset of the database you don't use "import-all-tables"

 $ sqoop import 
--connect jdbc:sqlserver://<HOST>:<port>;databasename=<mssql_database_name>
--username foo 
--password  foo
--table mssql_table1,mssql_table2 
--hive-import

Please try the above method and revert

avatar
Contributor

Hi, @Geoffrey Shelton Okot, thanks

Yes , I tried --exclude-tables argument to exclude some of the tables. But what if I want to import only 2 tables out of 100 tables from MsSQL to Hive database. Do I need to mention all 98 tables names in --exclude-tables argument?

avatar
Contributor

hi @Geoffrey Shelton Okot

I tried above solution, but it is also throwing below error;

18/06/22 11:53:59 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name '<ms_SQL_tbalename>,<ms_SQl_Tablename>'.

Regards,

Jay.

avatar
Master Mentor

@JAy PaTel

Why go the lengthy tedious way to exclude 98 tables when you can just mention the 2 tables you want imported?

avatar
Contributor

@Geoffrey Shelton Okot, thank you,

But 98 is just a number. Let's think about bigger number and change scenario.

What if MsSQL database has 500 tables and I want to import only 100 tables among them? Do I need to mention those all 100 tables in --table argument?

Regards,
Jay.