
Import data from Oracle using Spark with Oracle wallet

Contributor

Hi,

We are trying to import data from a remote Oracle DB configured with an SSO wallet using Apache Spark. We are able to configure the wallet and import the data successfully using spark-submit in local[*] mode. Below is the command we used:

spark-submit --class com.example.test.TestMainClass \
--driver-class-path /path/to/oraclepki.jar:/path/to/osdt_cert.jar:/path/to/osdt_core.jar:/path/to/ojdbc8.jar \
--jars /path/to/oraclepki.jar,/path/to/ojdbc8.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
--driver-java-options="-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO"  \
--master local[*] --driver-memory 12G --conf spark.driver.maxResultSize=3g --executor-cores 4 --executor-memory 12G --num-executors 20  /path/to/application.jar

But when we specify --master yarn, we get the error below:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, example.host.net, executor 20): java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:801)
        at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:782)
        at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:39)

...
...
Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
        at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:344)
...
...
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
        at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)

...
...
Caused by: java.io.FileNotFoundException: /path/to/cwallet.sso (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)


Is this because the wallet file location is not available on the worker nodes? If that is the case, how do we specify one? We do not have direct access to those nodes, and using an HDFS path for the wallet is not working either.

How does Spark handle such scenarios? Do we need to go to every node and copy the wallet file to a specific directory?
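From the documentation, our understanding is that spark-submit's --files option ships a local file into each container's working directory in yarn mode, so we were expecting something along these lines to work (paths are placeholders; see the update below for what we actually observed):

spark-submit --class com.example.test.TestMainClass \
--master yarn \
--jars /path/to/oraclepki.jar,/path/to/ojdbc8.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
--files /path/to/cwallet.sso \
/path/to/application.jar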

I am very new to Spark, so I would like to know if there is anything wrong here.

UPDATE

We are able to run the job in Spark local mode, but when using --master yarn it throws the following exception:

 java.sql.SQLException: PKI classes not found. To use 'connect /' functionality, oraclepki.jar must be in the classpath: java.lang.NoClassDefFoundError: oracle/security/pki/OracleWallet
        at oracle.jdbc.driver.PhysicalConnection.getSecretStoreCredentials(PhysicalConnection.java:3058)
        at oracle.jdbc.driver.PhysicalConnection.parseUrl(PhysicalConnection.java:2823)


We have tried using the --jars parameter, and it looks like Spark copies the files to an HDFS staging path, as seen below:

 INFO Client: Uploading resource file:/path/to/jars/oraclepki.jar -> hdfs://xxxx/user/xxx/.sparkStaging/application_1545206151380_18813/oraclepki.jar

What are we missing here? How do I make sure that the executor classpath contains the required jars?
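We also wondered whether the executor classpath needs to point at the uploaded jars explicitly. From what we have read, something like the following might be required (the ./ prefix assumes the jars land in each container's working directory; we have not verified this):

spark-submit --class com.example.test.TestMainClass \
--master yarn \
--jars /path/to/oraclepki.jar,/path/to/ojdbc8.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
--conf spark.executor.extraClassPath=./oraclepki.jar:./osdt_cert.jar:./osdt_core.jar \
--conf spark.driver.extraClassPath=./oraclepki.jar:./osdt_cert.jar:./osdt_core.jar \
/path/to/application.jar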

Currently Oracle only allows us to create the wallet file at a local path. Do we need to copy the SSO wallet file to every Spark worker node? If so, is there any way to do this other than manually copying it onto every worker node?

We are running out of options here; any help would be highly appreciated.

Please advise.

2 REPLIES

Master Mentor

@Amardeep Sarkar

Have you tried copying the wallet to HDFS?
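Something along these lines (adjust the paths to your environment):

hdfs dfs -mkdir -p /user/<your-user>/wallet
hdfs dfs -put /path/to/cwallet.sso /user/<your-user>/wallet/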

Contributor

@Geoffery Shelton Okot, apologies for the late reply. We are new to Spark, and it took us some time to run a few test cases before posting back to this forum.

We tried copying the wallet file to an HDFS path and it did not work. Below is the exception received:

Caused by: java.io.FileNotFoundException: hdfs:/user/example/.sparkStaging/application_1553474902547_6762/cwallet.sso (No such file or directory)
  at java.io.FileInputStream.open0(Native Method)

And here is the code sample we are using to create the DB connection using JDBC

spark.read.format("jdbc")
  .option("url", JDBCURL)
  .option("user", "DB_USER")
  .option("oracle.net.wallet_location",
    "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=hdfs://user/example/.sparkStaging/application_1553474902547_6762/)))")
...

The value inside the "DIRECTORY=hdfs://user/example/.sparkStaging/application_1553474902547_6762/" block is expected to be a local path; it cannot recognize the "hdfs://" protocol and throws the error even though the file is there.

Alternative approaches:

Since the HDFS path did not work, we tried the following two approaches.


1) Run Spark in local mode: For this we set --master local[*], and below is how we specified the wallet directory location:

option("oracle.net.wallet_location","(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/local/path/to/wallet_dir/)))")

"/local/path/to/wallet_dir/" indicates the directory with the wallet file and everything works fine.

2) Run Spark in yarn mode: This time we set --master yarn and used the same wallet directory path as above, but we got the following exception:

Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
        at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:344)
        at oracle.net.nt.CustomSSLSocketFactory.getSSLContext(CustomSSLSocketFactory.java:305)
        ... 30 more
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
        at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)
        at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:322)
        ... 35 more
Caused by: java.io.FileNotFoundException: /local/path/to/cwallet.sso (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)

It looks like in yarn mode, whenever Spark tries to establish the connection from an executor node, it fails because the wallet directory is not available on those nodes. Copying the wallet directory to all the worker nodes does work, but due to official/corporate policy we were told to find a different solution that does not involve copying the file to all nodes.

In order to figure out a solution without copying the wallet file, we did the following.

In local mode, if we specify the wallet file under the --files parameter and try to use the path returned by the following command:

scala> SparkFiles.getRootDirectory()
res0: String = /tmp/spark-a396c1f4-ddad-4da7-a2f4-6f8c279b3a7b/userFiles-744cd2cb-23d4-410c-8085-dee3207749ce

The file is available under the /tmp path and the connection can be created. But in yarn mode the same is not true, and it shows no files under that path.
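For reference, this is roughly how we resolve the distributed copy (assuming cwallet.sso was passed via --files; in yarn mode the driver and each executor have their own local copy):

import org.apache.spark.SparkFiles

val walletDir = SparkFiles.getRootDirectory()   // local staging directory for --files
val walletPath = SparkFiles.get("cwallet.sso")  // local path of the distributed file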

So is there anything we are missing here? Is it at all possible to get the files onto all worker nodes without copying them manually? Why can't we see the files under the SparkFiles.getRootDirectory() path in yarn mode? Does it only reflect the driver's location?


Kindly advise.