Created 06-10-2019 07:38 AM
Am trying to import data from an exadata instance. as the data contains sensitive information, i need to secure in-flight data transfer.
is there any way, by which i can use SSL with sqoop? or what could be other solution?
Created 06-10-2019 08:17 AM
Yes you can use SSL/TLS with sqoop but you have to do a couple of configurations and configure the keystore see Sqoop 2 shell support for TLS/SSL
With Sqoop 1.4.5 you can use Hadoop credential provider API The CredentialProvider API in Hadoop allows for the separation of applications and how they store their required passwords/secrets see below example
# Encrypting SQOOP password
Generating the jceks file, the password should be the database
$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/sugata/mysql.testDB.password.jceks Enter password: Enter password again: mysql.testDB.alias has been successfully created. org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
Validating the creation
$ hdfs dfs -ls /user/sugata Found 1 items -rwx------ 3 sheltong hdfs 503 2019-06-09 01:40 /user/sugata/mysql.testDB.password.jceks
Running the sqoop with the jceks alias
$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/sugata/mysql.testDB.password.jceks --driver com.mysql.jdbc.Driver --connect jdbc:mysql://hadoop.node1.com:3306/test --username sugata --password-alias mysql.testDB.alias --table "customer" --target-dir /user/sugata/test
Success output
Warning: /usr/hdp/2.6.2.0-205/accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 18/09/02 02:08:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.2.0-205 18/09/02 02:08:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 18/09/02 02:08:06 INFO manager.SqlManager: Using default fetchSize of 1000 18/09/02 02:08:06 INFO tool.CodeGenTool: Beginning code generation 18/09/02 02:08:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customer AS t WHERE 1=0 18/09/02 02:08:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customer AS t WHERE 1=0 18/09/02 02:08:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.2.0-205/hadoop-mapreduce Note: /tmp/sqoop-sheltong/compile/32c3e11ab1e1878e6ca7638a96feb30b/customer.java uses or overrides a deprecated API. Physical memory (bytes) snapshot=669270016 Virtual memory (bytes) snapshot=18275794944 Total committed heap usage (bytes)=331350016 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=243892 18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Transferred 238.1758 KB in 218.8164 seconds (1.0885 KB/sec) 18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Retrieved 2170 records.
Sqoop import in hdfs
$ hdfs dfs -ls /user/sugata/test Found 5 items -rw-r--r-- 3 sugata hdfs 0 2019-06-09 02:11 /user/sugata/test/_SUCCESS -rw-r--r-- 3 sugata hdfs 60298 2019-06-09 02:10 /user/sugata/test/part-m-00000 -rw-r--r-- 3 sugata hdfs 60894 2019-06-09 02:10 /user/sugata/test/part-m-00001 -rw-r--r-- 3 sugata hdfs 62050 2019-06-09 02:11 /user/sugata/test/part-m-00002 -rw-r--r-- 3 sugata hdfs 60650 2019-06-09 02:11 /user/sugata/test/part-m-00003
So you have the solutions
Created 06-10-2019 10:51 AM
Hi Geoffrey, thanks for your thoughts.
i have implemented the jceks for encrypting passwords while sending the import request.
however, i want the rsponse, i.e the data to come through a secure channel from source to hadoop. like in http(s) we implement certificate based authentication, is anything similar there for sqoop?
note : am not having sqoop2
Created 06-10-2019 11:50 AM
If already have an SSL cert file, then you can generate you own JKS file and import your cert into your jks.
Run the following commands?
Assumption
Alias = sugatajks
SSL Cert= exadata.crt
keytool -genkey -alias sugatajks -keystore exadata.jks -storepass {hidden_passwd} keytool -delete -alias sugatajks -keystore exadata.jks -storepass {hidden_passwd} keytool -import -alias sugatajks -file /etc/pki/CA/certs/exadata.crt -keypass {hidden_passwd} -keystore exadata.jks -storepass {hidden_passwd}
Now run the import command
$ sqoop import --connect --connect "jdbc:mysql://hadoop.node1.com:3306/test username={hidden_username};password={hidden_passwd};encrypt=true;trustServerCertificate=false;trustStore=/exadata.jks" + {options}
e.g --table customer --fields-terminated-by , --escaped-by \\ --enclosed-by '"' --compress -m 1 --target-dir /user/sugata/ --append --hive-drop-import-delims -- --schema exadat --table-hints NOLOCK
Hope that gives you the idea.
Created 06-10-2019 01:52 PM
tried that..but getting below error, when am trying to put more parameters in connect option :
ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:326)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.main(Sqoop.java:243)
Caused by: oracle.net.ns.NetException: Invalid connection string format, a valid format is: "host:port:sid"
at oracle.net.resolver.AddrResolution.resolveSimple(AddrResolution.java:498)
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:437)
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:693)
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:251)
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1140)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:340)
Created 06-10-2019 02:26 PM
Based on the error you posted it looks like you are using Oracle DB and Oracle JDBC Driver instead of MySQL.
ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid" java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid" at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)
.
So you might want to try something like following. Please make sure that the Oracle JDBC URL is in correct format as following:
Example URL
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=serverne)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=servicename)))
Example Sqoop command
# sqoop import -Dmapreduce.map.java.opts="-Djavax.net.debug=all -Djavax.net.ssl.keyStore='/PATH/TO/YOUR_TRUST_STORE.jks' -Djavax.net.ssl.keyStorePassword=ZZZZZZZZ -Djavax.net.ssl.keyStoreType=JKS" --connect "jdbc:oracle:thin:@ORACLE_DB_HOST:1521:TESTDB" --username "AAAAAAAAAA" --password "BBBBBBBBB" --table "customer" --verbose
.
In order to get more information regarding "Oracle JDBC Connectivity Over SSL using Thin Driver" Please refer to the following link: https://www.oracle.com/technetwork/topics/wp-oracle-jdbc-thin-ssl-130128.pdf