
Sqoop to SQL Server with Integrated Security


I need to use sqoop on linux to pull data from SQL Server running with integrated security. Can anyone confirm that they have made this work with HDP 2.3.4 and share the steps?

1 ACCEPTED SOLUTION

Expert Contributor

@Justin Miller I faced the same problem too, and finally resolved it with the procedure below; I hope it helps you.

The Sqoop import from SQL Server to HDFS worked once I passed the Windows credentials (username and password) explicitly as parameters on the JDBC connection string. Integrated security is not supported by the SQL Server driver right now because of the Kerberos authentication: the delegated tokens are distributed over the cluster while the MR job runs, so the mappers lose the security context.

So we need to pass the Windows username and password with integrated security disabled to import the data. Since the standard SQL Server driver does not support this, I used jtds.jar and its driver class to pull the data into the Hadoop lake.

Here is a sample command I ran on the server:

sqoop import --table Table1 \
  --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username XXXXX --password 'XXXXXXX' \
  --verbose --target-dir /tmp/33 -m 1 -- --schema dbo
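
One setup step the command assumes: the jTDS jar has to be on Sqoop's classpath, or the driver class will not load. A minimal sketch, assuming the default HDP 2.x Sqoop client layout (the jar version and path are illustrative and may differ on your cluster):

# Unpack the jTDS distribution downloaded from http://jtds.sourceforge.net/
unzip jtds-1.3.1-dist.zip jtds-1.3.1.jar

# Put the jar where the Sqoop client looks for JDBC drivers (assumed HDP path)
cp jtds-1.3.1.jar /usr/hdp/current/sqoop-client/lib/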


19 REPLIES

Expert Contributor

Integrated authentication does not work with SQL Server even in a secure cluster with AD integration, because the YARN containers will not have the user's security context (even in secure Windows clusters, the impersonation level does not support this).

You can use a password alias (which makes use of the Hadoop credential provider support) to use SQL Server authentication without exposing the password.
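
For reference, a minimal sketch of that approach, assuming a Sqoop version with --password-alias support (1.4.6+, which HDP 2.3.4 ships); the alias name, jceks path, and connection details are placeholders:

# Store the SQL Server password in a Hadoop credential store (prompts for it interactively)
hadoop credential create sqlserver.password.alias \
  -provider jceks://hdfs/user/sqoop/sqlserver.password.jceks

# Reference the alias instead of putting the password on the command line
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/sqoop/sqlserver.password.jceks \
  --connect "jdbc:sqlserver://<Hostname>:<Port>;databaseName=<DB>" \
  --username <SqlUser> \
  --password-alias sqlserver.password.alias \
  --table Table1 --target-dir /tmp/table1 -m 1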

New Contributor

I am interested too; we have been advised to use either the Progress DataDirect driver or the jTDS driver.

Contributor

I spent months on this and was told by Cloudera that it is just not possible. The Kerberos tokens are lost when the mappers spawn (as YARN transitions the job to its internal security subsystem).

I think this is something that would have to be added to Sqoop itself; much like HBase MapReduce jobs have to pass the job configuration to the mappers in code, Sqoop would have to do the same.

I spent weeks trying different things, running network traces (all my servers are tied to AD with Centrify, etc.), watched the token drop, and finally found a paper written by Yahoo about YARN that explains the internal token subsystem it uses (after using Kerberos to verify a user externally, it moves to a different token-based subsystem for performance).

My company would be willing to pay some $$ to get this working. All our data is being held hostage in SQL Server (from a company we are merging with).


Contributor

Greetings Rajendra,

You have saved my life; your solution works!!! We had spent two years on this thinking there was no way to get Sqoop working. I want to buy you a drink!!

Expert Contributor

@Justin Miller Thank you!!!

This Hortonworks community is for mutual knowledge exchange and problem solving; we should appreciate everyone for the time they spend answering our questions.

Explorer

It works, thank you!!

Contributor

I have found a solution to this, provided by another user here: https://community.hortonworks.com/questions/20719/sqoop-to-sql-server-with-integrated-security.html

Basically, you switch to the jTDS driver, which you can download here: http://jtds.sourceforge.net/

Per Rajendra Manjunath

"

The Sqoop import from SQL Server to HDFS worked once I passed the Windows credentials (username and password) explicitly as parameters on the JDBC connection string. Integrated security is not supported by the SQL Server driver right now because of the Kerberos authentication: the delegated tokens are distributed over the cluster while the MR job runs, so the mappers lose the security context.

So we need to pass the Windows username and password with integrated security disabled to import the data. Since the standard SQL Server driver does not support this, I used jtds.jar and its driver class to pull the data into the Hadoop lake.

Here is a sample command I ran on the server:

sqoop import --table Table1 \
  --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username XXXXX --password 'XXXXXXX' \
  --verbose --target-dir /tmp/33 -m 1 -- --schema dbo

"

Here are some examples that worked for me:

# List databases
sqoop list-databases \
  --connect "jdbc:jtds:sqlserver://myactivedirectorydomain.com" \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username XXXXX -P

# List tables
sqoop list-tables \
  --connect "jdbc:jtds:sqlserver://myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username jmiller.admin -P

# Pull data example
sqoop import --table TABLENAMEHERE \
  --connect "jdbc:jtds:sqlserver://myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username XXXXX -P \
  --fields-terminated-by '\001' \
  --target-dir /user/XXXXX/20170313 -m 1 -- --schema dbo

Note: in the examples above, change the username to your own, and set the database name in the list-tables or import command to the one you need (the AD account you use will require access to the data).
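
If you script these imports, the interactive -P prompt gets in the way. A hedged alternative sketch using Sqoop's --password-file option (the HDFS path here is illustrative, and the file must be locked down since it holds the password):

# Write the password to a restricted HDFS file (echo -n avoids a trailing newline)
echo -n 'XXXXXXX' | hdfs dfs -put - /user/XXXXX/.sqlserver.password
hdfs dfs -chmod 400 /user/XXXXX/.sqlserver.password

sqoop import --table TABLENAMEHERE \
  --connect "jdbc:jtds:sqlserver://myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username XXXXX \
  --password-file /user/XXXXX/.sqlserver.password \
  --fields-terminated-by '\001' \
  --target-dir /user/XXXXX/20170313 -m 1 -- --schema dbo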

Explorer

Yes, this is possible.

You need to kinit with the username that has been granted access to the SQL Server DB and tables. Integrated security passes your credentials to the SQL Server using Kerberos.

"jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;"

This worked for me.
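
For context, here is roughly how that connection string would be used end to end. This is only a sketch: the principal and table names are placeholders, and it assumes the Microsoft mssql-jdbc driver (with JavaKerberos support) is on Sqoop's classpath:

# Get a Kerberos ticket for the AD account that has access to the database
kinit jmiller@EXAMPLE.COM    # hypothetical principal

sqoop import \
  --connect "jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;" \
  --table TABLENAMEHERE \
  --target-dir /tmp/TABLENAMEHERE -m 1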

New Contributor

@Onedile wrote:

Yes, this is possible.

You need to kinit with the username that has been granted access to the SQL Server DB and tables. Integrated security passes your credentials to the SQL Server using Kerberos.

"jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;"

This worked for me.


It doesn't work. We still hit the issue with the latest MSSQL JDBC driver, as the Kerberos tokens are lost when the mappers spawn (YARN transitions the job to its internal security subsystem):

21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1616335290043_2743822
21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for c795701: HDFS_DELEGATION_TOKEN owner=c795701@XX.XXXX.XXXXXXX.COM, renewer=yarn, realUser=, issueDate=1622314832608, maxDate=1622919632608, sequenceNumber=29194128, masterKeyId=1856)]

21/05/29 19:01:15 INFO mapreduce.Job: Task Id : attempt_1616335290043_2743822_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:53879236-81e7-4fc6-88b9-c7118c02e7be

Caused by: java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

Use the jTDS driver as recommended here.