Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Using SSL for Sqoop Import

avatar
Explorer

Am trying to import data from an exadata instance. as the data contains sensitive information, i need to secure in-flight data transfer.

is there any way, by which i can use SSL with sqoop? or what could be other solution?

5 REPLIES 5

avatar
Master Mentor

@sugata kar

Yes you can use SSL/TLS with sqoop but you have to do a couple of configurations and configure the keystore see Sqoop 2 shell support for TLS/SSL


With Sqoop 1.4.5 you can use Hadoop credential provider API The CredentialProvider API in Hadoop allows for the separation of applications and how they store their required passwords/secrets see below example

# Encrypting SQOOP password

Generating the jceks file, the password should be the database

$ hadoop credential create mysql.testDB.alias -provider jceks://hdfs/user/sugata/mysql.testDB.password.jceks
Enter password:
Enter password again:
mysql.testDB.alias has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

Validating the creation

$ hdfs dfs -ls /user/sugata
Found 1 items
-rwx------   3 sheltong hdfs        503 2019-06-09 01:40 /user/sugata/mysql.testDB.password.jceks

Running the sqoop with the jceks alias

$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/sugata/mysql.testDB.password.jceks --driver com.mysql.jdbc.Driver --connect jdbc:mysql://hadoop.node1.com:3306/test --username sugata --password-alias mysql.testDB.alias --table "customer" --target-dir /user/sugata/test

Success output

Warning: /usr/hdp/2.6.2.0-205/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/09/02 02:08:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.2.0-205
18/09/02 02:08:06 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
18/09/02 02:08:06 INFO manager.SqlManager: Using default fetchSize of 1000
18/09/02 02:08:06 INFO tool.CodeGenTool: Beginning code generation
18/09/02 02:08:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customer AS t WHERE 1=0
18/09/02 02:08:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM customer AS t WHERE 1=0
18/09/02 02:08:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.6.2.0-205/hadoop-mapreduce
Note: /tmp/sqoop-sheltong/compile/32c3e11ab1e1878e6ca7638a96feb30b/customer.java uses or overrides a deprecated API.

                Physical memory (bytes) snapshot=669270016
                Virtual memory (bytes) snapshot=18275794944
                Total committed heap usage (bytes)=331350016
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=243892
18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Transferred 238.1758 KB in 218.8164 seconds (1.0885 KB/sec)
18/09/02 02:11:48 INFO mapreduce.ImportJobBase: Retrieved 2170 records.


Sqoop import in hdfs

$ hdfs dfs -ls /user/sugata/test
Found 5 items
-rw-r--r--   3 sugata hdfs          0 2019-06-09 02:11 /user/sugata/test/_SUCCESS
-rw-r--r--   3 sugata hdfs      60298 2019-06-09 02:10 /user/sugata/test/part-m-00000
-rw-r--r--   3 sugata hdfs      60894 2019-06-09 02:10 /user/sugata/test/part-m-00001
-rw-r--r--   3 sugata hdfs      62050 2019-06-09 02:11 /user/sugata/test/part-m-00002
-rw-r--r--   3 sugata hdfs      60650 2019-06-09 02:11 /user/sugata/test/part-m-00003


So you have the solutions

avatar
Explorer

Hi Geoffrey, thanks for your thoughts.

i have implemented the jceks for encrypting passwords while sending the import request.

however, i want the rsponse, i.e the data to come through a secure channel from source to hadoop. like in http(s) we implement certificate based authentication, is anything similar there for sqoop?

note : am not having sqoop2

avatar
Master Mentor

@sugata kar

If already have an SSL cert file, then you can generate you own JKS file and import your cert into your jks.

Run the following commands?

Assumption

Alias = sugatajks

SSL Cert= exadata.crt

keytool -genkey -alias sugatajks -keystore exadata.jks -storepass {hidden_passwd}
keytool -delete -alias sugatajks -keystore exadata.jks -storepass {hidden_passwd}
keytool -import -alias sugatajks -file /etc/pki/CA/certs/exadata.crt -keypass {hidden_passwd} -keystore exadata.jks -storepass {hidden_passwd}

Now run the import command


$ sqoop import --connect --connect "jdbc:mysql://hadoop.node1.com:3306/test
username={hidden_username};password={hidden_passwd};encrypt=true;trustServerCertificate=false;trustStore=/exadata.jks" + {options}

e.g --table customer --fields-terminated-by , --escaped-by \\ --enclosed-by '"' --compress -m 1 --target-dir /user/sugata/ --append --hive-drop-import-delims -- --schema exadat --table-hints NOLOCK


Hope that gives you the idea.

avatar
Explorer

tried that..but getting below error, when am trying to put more parameters in connect option :


ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"

java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"

at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)

at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:553)

at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:254)

at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)

at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:528)

at java.sql.DriverManager.getConnection(DriverManager.java:664)

at java.sql.DriverManager.getConnection(DriverManager.java:270)

at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:326)

at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)

at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)

at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)

at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)

at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)

at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)

at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)

at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1853)

at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1653)

at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)

at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:488)

at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:615)

at org.apache.sqoop.Sqoop.run(Sqoop.java:147)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)

at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:225)

at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)

at org.apache.sqoop.Sqoop.main(Sqoop.java:243)

Caused by: oracle.net.ns.NetException: Invalid connection string format, a valid format is: "host:port:sid"

at oracle.net.resolver.AddrResolution.resolveSimple(AddrResolution.java:498)

at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:437)

at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:693)

at oracle.net.ns.NSProtocol.connect(NSProtocol.java:251)

at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1140)

at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:340)


avatar
Master Mentor

@sugata kar

Based on the error you posted it looks like you are using Oracle DB and Oracle JDBC Driver instead of MySQL.

ERROR manager.SqlManager: Error executing statement: java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
java.sql.SQLRecoverableException: IO Error: Invalid connection string format, a valid format is: "host:port:sid"
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:489)


.


So you might want to try something like following. Please make sure that the Oracle JDBC URL is in correct format as following:

Example URL

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=serverne)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=servicename)))


Example Sqoop command

# sqoop import  -Dmapreduce.map.java.opts="-Djavax.net.debug=all -Djavax.net.ssl.keyStore='/PATH/TO/YOUR_TRUST_STORE.jks' -Djavax.net.ssl.keyStorePassword=ZZZZZZZZ  -Djavax.net.ssl.keyStoreType=JKS"  --connect "jdbc:oracle:thin:@ORACLE_DB_HOST:1521:TESTDB" --username "AAAAAAAAAA" --password "BBBBBBBBB" --table "customer" --verbose  

.

In order to get more information regarding "Oracle JDBC Connectivity Over SSL using Thin Driver" Please refer to the following link: https://www.oracle.com/technetwork/topics/wp-oracle-jdbc-thin-ssl-130128.pdf