- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Using SSL for Sqoop Import
- Labels:
-
Apache Sqoop
Created 06-10-2019 07:38 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Am trying to import data from an exadata instance. as the data contains sensitive information, i need to secure in-flight data transfer.
is there any way, by which i can use SSL with sqoop? or what could be other solution?
Created 06-10-2019 08:17 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes you can use SSL/TLS with sqoop but you have to do a couple of configurations and configure the keystore see Sqoop 2 shell support for TLS/SSL
With Sqoop 1.4.5 you can use Hadoop credential provider API The CredentialProvider API in Hadoop allows for the separation of applications and how they store their required passwords/secrets see below example
# Encrypting SQOOP password
Generating the jceks file, the password should be the database
$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/sugata/mysql.testDB.password.jceks Enter password: Enter password again: mysql.testDB.alias has been successfully created. org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
Validating the creation
$ hdfs dfs -ls /user/sugata Found 1 items -rwx------ 3 sheltong hdfs 503 2019-06-09 01:40 /user/sugata/mysql.testDB.password.jceks
Running the sqoop with the jceks alias
$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/sugata/mysql.testDB.password.jceks --driver com.mysql.jdbc.Driver --connect jdbc:mysql://hadoop.node1.com:3306/test --username sugata --password-alias mysql.testDB.alias --table "customer" --target-dir /user/sugata/test
Success output
Warning: /usr/hdp/2.6.2.0-205/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 18/09/02 02:08:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.2.0-205 18/09/02 02:08:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 18/09/02 02:08:06 INFO manager.SqlManager: Using default fetchSize of 1000 18/09/02 02:08:06 INFO tool.CodeGenTool: Beginning code generation 18/09/02 02:08:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customer AS t WHERE 1=0 18/09/02 02:08:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customer AS t WHERE 1=0 18/09/02 02:08:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.2.0-205/hadoop-mapreduce Note: /tmp/sqoop-sheltong/compile/32c3e11ab1e1878e6ca7638a96feb30b/customer.java uses or overrides a deprecated API. Physical memory (bytes) snapshot=669270016 Virtual memory (bytes) snapshot=18275794944 Total committed heap usage (bytes)=331350016 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=243892 18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Transferred 238.1758 KB in 218.8164 seconds (1.0885 KB/sec) 18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Retrieved 2170 records.
Sqoop import in hdfs
$ hdfs dfs -ls /user/sugata/test Found 5 items -rw-r--r-- 3 sugata hdfs 0 2019-06-09 02:11 /user/sugata/test/_SUCCESS -rw-r--r-- 3 sugata hdfs 60298 2019-06-09 02:10 /user/sugata/test/part-m-00000 -rw-r--r-- 3 sugata hdfs 60894 2019-06-09 02:10 /user/sugata/test/part-m-00001 -rw-r--r-- 3 sugata hdfs 62050 2019-06-09 02:11 /user/sugata/test/part-m-00002 -rw-r--r-- 3 sugata hdfs 60650 2019-06-09 02:11 /user/sugata/test/part-m-00003
So you have the solutions
Created 06-10-2019 10:51 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Geoffrey, thanks for your thoughts.
i have implemented the jceks for encrypting passwords while sending the import request.
however, i want the rsponse, i.e the data to come through a secure channel from source to hadoop. like in http(s) we implement certificate based authentication, is anything similar there for sqoop?
note : am not having sqoop2
Created 06-10-2019 11:50 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If already have an SSL cert file, then you can generate you own JKS file and import your cert into your jks.
Run the following commands?
Assumption
Alias = sugatajks
SSL Cert= exadata.crt
keytool -genkey -alias sugatajks -keystore exadata.jks -storepass {hidden_passwd}
keytool -delete -alias sugatajks -keystore exadata.jks -storepass {hidden_passwd}
keytool -import -alias sugatajks -file /etc/pki/CA/certs/exadata.crt -keypass {hidden_passwd} -keystore exadata.jks -storepass {hidden_passwd}Now run the import command
$ sqoop import --connect --connect "jdbc:mysql://hadoop.node1.com:3306/test
username={hidden_username};password={hidden_passwd};encrypt=true;trustServerCertificate=false;trustStore=/exadata.jks" + {options}e.g --table customer --fields-terminated-by , --escaped-by \\ --enclosed-by '"' --compress -m 1 --target-dir /user/sugata/ --append --hive-drop-import-delims -- --schema exadat --table-hints NOLOCK
Hope that gives you the idea.
Created 06-10-2019 01:52 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
tried that..but getting below error, when am trying to put more parameters in connect option :
ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:326)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: oracle.net.ns.NetException: Invalid connection string format, a valid format is: "host:port:sid"
at oracle.net.resolver.AddrResolution.resolveSimple(AddrResolution.java:498)
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:437)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:693)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:251)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1140)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:340)
Created 06-10-2019 02:26 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Based on the error you posted it looks like you are using Oracle DB and Oracle JDBC Driver instead of MySQL.
ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid" java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid" at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
.
So you might want to try something like following. Please make sure that the Oracle JDBC URL is in correct format as following:
Example URL
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=serverne)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=servicename)))
Example Sqoop command
# sqoop import -Dmapreduce.map.java.opts="-Djavax.net.debug=all -Djavax.net.ssl.keyStore='/PATH/TO/YOUR_TRUST_STORE.jks' -Djavax.net.ssl.keyStorePassword=ZZZZZZZZ -Djavax.net.ssl.keyStoreType=JKS" --connect "jdbc:oracle:thin:@ORACLE_DB_HOST:1521:TESTDB" --username "AAAAAAAAAA" --password "BBBBBBBBB" --table "customer" --verbose
.
In order to get more information regarding "Oracle JDBC Connectivity Over SSL using Thin Driver" Please refer to the following link: https://www.oracle.com/technetwork/topics/wp-oracle-jdbc-thin-ssl-130128.pdf