Created 03-02-2016 09:19 AM
I need to use sqoop on linux to pull data from SQL Server running with integrated security. Can anyone confirm that they have made this work with HDP 2.3.4 and share the steps?
Created 02-10-2017 07:38 AM
@Justin MillerI too faced the same problem, finally i resolved the issue with the below mention procedure,it may help you i hope.
Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).
So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.
Sample Command I tried on the server as follows,
sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \
--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \
--verbose --target-dir /tmp/33 -m 1 -- --schema dbo
Created 09-02-2016 01:47 PM
Integrated authentication does not work with SQLServer even in a secure cluster with AD integration as the containers will not have the context (even in secure windows clusters, the impersonation level does not support this)
You can use the password alias (which makes use of the Hadoop credentials support) to use SQL server authentication without exposing the passwords.
Created 09-12-2016 01:28 PM
I am interested too, we have been advised to use either Progress DataDirect driver or the JTDC Driver
Created 02-09-2017 06:19 PM
I spent months on this and was told by Cloudera that it is just not possible. The kerberos tokens are lost when the mappers spawn (as the YARN transitions the job to its internal security subsystem).
I think this is something that would have to be added to Sqoop itself, much like HBase mapreduce jobs have to pass job config to mappers in the code Sqoop able to do the same.
I spent weeks trying different things, doing network traces (all my servers are tied to AD with Centrify, ext), watched the token drop, and finally found a paper written by Yahoo about YARN that explains the internal token subsystem it uses (after using Kerberos to externally verify someone it moves to different token based subsystem for performance).
My company would be willing to pay some $$ to get this working. All our data is being held hostage in SQL Server (from a company we are merging with).
Created 02-10-2017 07:38 AM
@Justin MillerI too faced the same problem, finally i resolved the issue with the below mention procedure,it may help you i hope.
Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).
So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.
Sample Command I tried on the server as follows,
sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \
--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \
--verbose --target-dir /tmp/33 -m 1 -- --schema dbo
Created 03-13-2017 07:04 PM
Greetings Rajendra,
You have saved my life, your solution works!!! We have spent two years on this thinking there was no way to get Sqoop working, I want to buy you a drink!!
Created 03-14-2017 05:08 AM
@Justin MillerThank you..!!!
This Hortonwork community blog is for mutual knowledge exchange for solving problems, should appreciate all for their time in answering our question.
Created 06-01-2017 07:54 PM
It works, Thank you!!
Created 03-14-2017 03:54 AM
I have found a solution to this provided by another user here: https://community.hortonworks.com/questions/20719/sqoop-to-sql-server-with-integrated-security.html
Basically if you switch to the jtds driver which you can download here: http://jtds.sourceforge.net/
"
Sqoop SQL Server data import to HDFS worked with manual parametric the authentication(using windows credential) with added parameter on the SQL Server JDBC driver, as integrated security is not supported by the SQL driver as of now due to the Kerberos authentication(Delegated tokens distributed over cluster while running MR job).
So we need to pass the windows authentication with password and with the integrated security disabled mode to import the data to the system. As normal SQL server driver does not support, so I had used the jtds.jar and the different driver class to pull the data to the Hadoop Lake.
Sample Command I tried on the server as follows,
sqoop import --table Table1 --connect "jdbc:jtds:sqlserver://<Hostname>:<Port>;useNTLMv2=true;domain=<WindowsDomainName>;databaseName=XXXXXXXXXXXXX" \
--connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX --password 'XXXXXXX' \
--verbose --target-dir /tmp/33 -m 1 -- --schema dbo
"
Here are some examples that worked for me:
# List databases
sqoop list-databases --connect "jdbc:jtds:myactivedirectorydomain.com" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P
# List tables
sqoop list-tables --connect "jdbc:jtds:myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username jmiller.admin -P
# Pull data example
sqoop import --table TABLENAMEHERE --connect "jdbc:jtds:myactivedirectorydomain.com;databaseName=DATABASENAMEHERE" --connection-manager org.apache.sqoop.manager.SQLServerManager --driver net.sourceforge.jtds.jdbc.Driver --username XXXXX -P --fields-terminated-by '\001' --target-dir /user/XXXXX/20170313 -m 1 -- --schema dbo
Note* In the above example you need to change the username to your username and database name in the list-tables or pull to the one you need (note the AD account you use will require access to the data).
Created 05-13-2020 11:56 PM
Yes this is possible.
You need to kinit with the username that has been granted access to the SQL server DB and tables. integrated security passes your credentials to the SQL server using kerberos
"jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;"
This worked for me.
Created 05-29-2021 02:37 PM
@Onedile wrote:Yes this is possible.
You need to kinit with the username that has been granted access to the SQL server DB and tables. integrated security passes your credentials to the SQL server using kerberos
"jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos;"
This worked for me.
It doesn't work, it's still facing issue with the latest MSSQL JDBC driver as the Kerberos tokens are lost when the mappers spawn (as the YARN transitions the job to its internal security subsystem)
21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1616335290043_2743822
21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for c795701: HDFS_DELEGATION_TOKEN owner=c795701@XX.XXXX.XXXXXXX.COM, renewer=yarn, realUser=, issueDate=1622314832608, maxDate=1622919632608, sequenceNumber=29194128, masterKeyId=1856)]
21/05/29 19:01:15 INFO mapreduce.Job: Task Id : attempt_1616335290043_2743822_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:53879236-81e7-4fc6-88b9-c7118c02e7be
Caused by: java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Use the jtds driver as recommended here