Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Sqoop import all tables

avatar
Expert Contributor

Hello,

I would like to run a command like:

sqoop import-all-tables --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://localhost/testdb --username username --password password -- --schema the_schema --hive-database TestDB --hive-import

However, the "-- --schema the_schema" is not being picked up. Instead, all the tables are assuming that "testdb" is the schema instead of "the_schema".

Is there some way to correct this? we would like to the import-all-tables to work as we have many tables that maybe changing throughout the life of the database and hadoop.

Thanks,

Marcia

1 ACCEPTED SOLUTION

avatar
Rising Star

-D mapreduce.map.memory.mb=2048-D mapreduce.map.java.opts=-Xmx1024m . Change the memory acccordingly based on your cluster and try it.

View solution in original post

24 REPLIES 24

avatar
Rising Star

few lines prior to this?

avatar
Rising Star

Can you check what it he contents of PROJECT_ID, looks like the issue is on that. What type of field is that

avatar
Expert Contributor

project_id is INTEGER

avatar
Rising Star

That is something related to DB2 you need to resolve.

-206 object-name IS NOT VALID IN THE CONTEXT WHERE IT IS USED

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/codes/src/tpc/n206.html

To isolate the problem just get that single table and see how it goes.

avatar
Expert Contributor

I think the problem is regarding importalltables and schema...

I wish to import all the tables within one schema... is this possible using import-all-tables? Or must I go table by table??

avatar
Rising Star

As I mentioned --schema is not supported by import-all-tables. Looks like it takes the db name as the schema name by default. or if you have write access to db2, create a schema with same name and create aliases/synonyms or all tables in that schema. should work.

avatar
Expert Contributor

Hello, this sounds like an excellent idea!

Please let me know how I should do this...

avatar
Expert Contributor

So my new schema should be called: TESTDB .

And all the tables in SCHEMA should have aliases within the new schema, TESTDB?

Also, which do I use? Alias / Synonym?

Does the username, SCHEMA, do anything with the settings as well?

I'm thinking of doing the following in DB2:

  1. create schema TESTDB:
  2. create alias TESTDB.RC_SUM for originalSchema.RC_SUM;

avatar
Contributor

I have found a solution to this provided by another user here: https://community.hortonworks.com/questions/20719/sqoop-to-sql-server-with-integrated-security.html

Basically if you switch to the jtds driver which you can download here: http://jtds.sourceforge.net/

Per Rajendra Manjunath

"

Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).

So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.

Sample Command I tried on the server as follows,

sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \

--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \

--verbose --target-dir /tmp/33 -m 1 -- --schema dbo

"

Here are some examples that worked for me:

# List databases

sqoop list-databases --connect "jdbc:jtds:myactivedirectorydomain.com" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P

# List tables

sqoop list-tables --connect "jdbc:jtds:myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username jmiller.admin -P

# Pull data example

sqoop import --table TABLENAMEHERE --connect "jdbc:jtds:myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P --fields-terminated-by '\001' --target-dir /user/XXXXX/20170313 -m 1 -- --schema dbo

Note* In the above example you need to change the username to your username and database name in the list-tables or pull to the one you need (note the AD account you use will require access to the data).

avatar
Expert Contributor

Thanks, but I'm using DB2.

Also, I can import one table just fine...

I just cannot import all tables...