Support Questions
Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

Sqoop to SQL Server with Integrated Security

I need to use sqoop on linux to pull data from SQL Server running with integrated security. Can anyone confirm that they have made this work with HDP 2.3.4 and share the steps?

1 ACCEPTED SOLUTION

Expert Contributor

@Justin MillerI too faced the same problem, finally i resolved the issue with the below mention procedure,it may help you i hope.

Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).

So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.

Sample Command I tried on the server as follows,

sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \

--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \

--verbose --target-dir /tmp/33 -m 1 -- --schema dbo

View solution in original post

19 REPLIES 19

@awhitter

See this https://alexeikh.wordpress.com/2012/05/03/using-sqoop-for-moving-data-between-hadoop-and-sql-server/

You can run sqoop in linux and read sql server to get data into Hadoop

Aslo,

https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html

For example, to connect to a SQLServer database, first download the driver from microsoft.com and install it in your Sqoop lib path.

Then run Sqoop. For example:

$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect <connect-string> ...

When connecting to a database using JDBC, you can optionally specify extra JDBC parameters via a property file using the option --connection-param-file. The contents of this file are parsed as standard Java properties and passed into the driver while creating a connection.

[Note>Note
The parameters specified via the optional property file are only applicable to JDBC connections. Any fastpath connectors that use connections other than JDBC will ignore these parameters.

Thanks Neeraj - how does this help with integrated security?

New Contributor

It does not address the integrated authentication problem. Has anyone tried this one instead?

https://datamountaineer.com/2016/01/15/spark-jdbc-sql-server-kerberos/

Mentor

@awhitter as far as I know this is not possible. You will have to use SQL Authentication. If security is a concern, you can store the password in a encrypted file and call the file in the Sqoop -import command. If you are unable to change the authentication settings on SQL Server due to corporate policies, you may need to dump the tables as files first and import them directly into Hive.

I am also waiting the same problem with jbdc42.jar and below is the error stack -

16/03/17 15:51:39 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication.
java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication.
  at org.apache.sqoop.manager.CatalogQueryManager.listTables(CatalogQueryManager.java:118)
  at org.apache.sqoop.tool.ListTablesTool.run(ListTablesTool.java:49)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: This driver is not configured for integrated authentication.
  at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:2338)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:1929)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:41)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:1917)
  at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1061)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:833)
  at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:716)
  at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:841)
  at java.sql.DriverManager.getConnection(DriverManager.java:571)
  at java.sql.DriverManager.getConnection(DriverManager.java:233)
  at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:883)
  at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
  at org.apache.sqoop.manager.CatalogQueryManager.listTables(CatalogQueryManager.java:102)
  ... 7 more

Can some one please help ?

Sqoop currently does not support SSO. I got an individual use created and now it works. Thanks

I am running into the same problem. Tried by passing the driver --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" along with the sqoop command and get this error.

Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:

Can anyone help, if this is resolved.

I am running into the same problem. Tried by passing the driver --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" along with the sqoop command and get this error.

Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:

Can anyone help, if this is resolved.

Rising Star

Integrated authentication does not work with SQLServer even in a secure cluster with AD integration as the containers will not have the context (even in secure windows clusters, the impersonation level does not support this)

You can use the password alias (which makes use of the Hadoop credentials support) to use SQL server authentication without exposing the passwords.

New Contributor

I am interested too, we have been advised to use either Progress DataDirect driver or the JTDC Driver

Explorer

I spent months on this and was told by Cloudera that it is just not possible. The kerberos tokens are lost when the mappers spawn (as the YARN transitions the job to its internal security subsystem).

I think this is something that would have to be added to Sqoop itself, much like HBase mapreduce jobs have to pass job config to mappers in the code Sqoop able to do the same.

I spent weeks trying different things, doing network traces (all my servers are tied to AD with Centrify, ext), watched the token drop, and finally found a paper written by Yahoo about YARN that explains the internal token subsystem it uses (after using Kerberos to externally verify someone it moves to different token based subsystem for performance).

My company would be willing to pay some $$ to get this working. All our data is being held hostage in SQL Server (from a company we are merging with).

Expert Contributor

@Justin MillerI too faced the same problem, finally i resolved the issue with the below mention procedure,it may help you i hope.

Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).

So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.

Sample Command I tried on the server as follows,

sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \

--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \

--verbose --target-dir /tmp/33 -m 1 -- --schema dbo

Explorer

Greetings Rajendra,

You have saved my life, your solution works!!! We have spent two years on this thinking there was no way to get Sqoop working, I want to buy you a drink!!

Expert Contributor

@Justin MillerThank you..!!!

This Hortonwork community blog is for mutual knowledge exchange for solving problems, should appreciate all for their time in answering our question.

Explorer

It works, Thank you!!

Explorer

I have found a solution to this provided by another user here: https://community.hortonworks.com/questions/20719/sqoop-to-sql-server-with-integrated-security.html

Basically if you switch to the jtds driver which you can download here: http://jtds.sourceforge.net/

Per Rajendra Manjunath

"

Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).

So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.

Sample Command I tried on the server as follows,

sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \

--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \

--verbose --target-dir /tmp/33 -m 1 -- --schema dbo

"

Here are some examples that worked for me:

# List databases

sqoop list-databases --connect "jdbc:jtds:myactivedirectorydomain.com" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P

# List tables

sqoop list-tables --connect "jdbc:jtds:myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username jmiller.admin -P

# Pull data example

sqoop import --table TABLENAMEHERE --connect "jdbc:jtds:myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P --fields-terminated-by '\001' --target-dir /user/XXXXX/20170313 -m 1 -- --schema dbo

Note* In the above example you need to change the username to your username and database name in the list-tables or pull to the one you need (note the AD account you use will require access to the data).

Explorer

Yes this is possible. 

 

You need to kinit with the username that has been granted access to the SQL server DB and tables.  integrated security passes your credentials to the SQL server using kerberos

 

"jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;"

 

This worked for me. 

New Contributor

@Onedile wrote:

Yes this is possible. 

 

You need to kinit with the username that has been granted access to the SQL server DB and tables.  integrated security passes your credentials to the SQL server using kerberos

 

"jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;"

 

This worked for me. 


It doesn't work, it's still facing issue with the latest MSSQL JDBC driver as the Kerberos tokens are lost when the mappers spawn (as the YARN transitions the job to its internal security subsystem)

21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1616335290043_2743822
21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for c795701: HDFS_DELEGATION_TOKEN owner=c795701@XX.XXXX.XXXXXXX.COM, renewer=yarn, realUser=, issueDate=1622314832608, maxDate=1622919632608, sequenceNumber=29194128, masterKeyId=1856)]

21/05/29 19:01:15 INFO mapreduce.Job: Task Id : attempt_1616335290043_2743822_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:53879236-81e7-4fc6-88b9-c7118c02e7be

Caused by: java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

 Use the jtds driver as recommended here