We are trying to import data from a remote Oracle DB configured with an SSO wallet using Apache Spark. We are able to configure the wallet and import the data successfully using spark-submit in local[*] mode. Below is the command we used:
spark-submit --class com.example.test.TestMainClass \
  --driver-class-path /path/to/oraclepki.jar:/path/to/osdt_cert.jar:/path/to/osdt_core.jar:/path/to/ojdbc8.jar \
  --jars /path/to/oraclepki.jar,/path/to/ojdbc8.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
  --driver-java-options="-Djavax.net.ssl.trustStore=/path/to/cwallet.sso -Djavax.net.ssl.trustStoreType=SSO" \
  --master local[*] --driver-memory 12G --conf spark.driver.maxResultSize=3g \
  --executor-cores 4 --executor-memory 12G --num-executors 20 \
  /path/to/application.jar
But when we specify --master yarn, we get the error below:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, example.host.net, executor 20): java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
    at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:801)
    at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:782)
    at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:39)
    ...
Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
    at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:344)
    ...
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
    at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)
    ...
Caused by: java.io.FileNotFoundException: /path/to/cwallet.sso (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
Is this because the wallet file location is not available on the worker nodes? If so, how do we specify one? We do not have direct access to those nodes, and using an HDFS path for the wallet is not working either.
How does Spark handle such scenarios? Do we need to go to every node and copy the wallet file to a specific directory?
I am very new to Spark, so I would like to know if there is anything wrong here.
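From what we have read so far (unverified on our side), spark-submit has a --files option that ships local files to each YARN container. A sketch of the submit command we are considering, with placeholder paths:

```shell
# Sketch only -- paths and the main class are placeholders, and we have not
# confirmed this works with the Oracle SSO wallet on our cluster.
spark-submit --class com.example.test.TestMainClass \
  --master yarn \
  --jars /path/to/oraclepki.jar,/path/to/ojdbc8.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
  --files /path/to/cwallet.sso \
  /path/to/application.jar
```

Our understanding is that YARN would then localize cwallet.sso into each container's working directory, but we have not been able to verify this.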
We are able to run the job in Spark local mode, but when using --master yarn it throws the following exception:
java.sql.SQLException: PKI classes not found. To use 'connect /' functionality, oraclepki.jar must be in the classpath: java.lang.NoClassDefFoundError: oracle/security/pki/OracleWallet
    at oracle.jdbc.driver.PhysicalConnection.getSecretStoreCredentials(PhysicalConnection.java:3058)
    at oracle.jdbc.driver.PhysicalConnection.parseUrl(PhysicalConnection.java:2823)
We have tried the --jars parameter, and it looks like Spark is copying the files to an HDFS path, as seen below:
INFO Client: Uploading resource file:/path/to/jars/oraclepki.jar -> hdfs://xxxx/user/xxx/.sparkStaging/application_1545206151380_18813/oraclepki.jar
What are we missing here? How do we make sure that the worker classpath contains the required jars?
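For context, here is how we currently understand the executor classpath can be set at submit time (jar names are placeholders, and we are not sure this is the right approach):

```shell
# Sketch: ship the Oracle jars with --jars and also list them on the
# executor classpath. The spark.executor.extraClassPath entries are resolved
# on each worker, so the bare names refer to the files --jars localizes
# into the container's working directory.
spark-submit --master yarn \
  --jars /path/to/oraclepki.jar,/path/to/ojdbc8.jar,/path/to/osdt_cert.jar,/path/to/osdt_core.jar \
  --conf spark.executor.extraClassPath=oraclepki.jar:ojdbc8.jar:osdt_cert.jar:osdt_core.jar \
  /path/to/application.jar
```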
Currently Oracle only allows us to create the wallet file in a local path. Do we need to copy the SSO wallet file to every Spark worker node? If so, is there any way other than manually going to every worker node and copying it over?
We are running out of options here; any help would be highly appreciated.
@Geoffery Shelton Okot, apologies for the late reply. We are new to Spark, and it took us some time to run a few test cases before posting any comment in this forum.
We have tried copying the wallet file to an HDFS path and it did not work. Below is the exception received:
Caused by: java.io.FileNotFoundException: hdfs:/user/example/.sparkStaging/application_1553474902547_6762/cwallet.sso (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
And here is the code sample we are using to create the DB connection with JDBC:
spark.read.format("jdbc")
  .option("url", JDBCURL)
  .option("user", "DB_USER")
  .option("oracle.net.wallet_location",
    "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=hdfs://user/example/.sparkStaging/application_1553474902547_6762/)))")
  ...
The value inside the "DIRECTORY=hdfs://user/example/.sparkStaging/application_1553474902547_6762/" block is expected to be a local path; the driver cannot recognize the "hdfs://" protocol and throws the error even though the file is there.
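One idea we have not yet tried end to end: ship the wallet with --files and point DIRECTORY at the container's current working directory instead of an hdfs:// URL. All names below are placeholders:

```shell
# Sketch: --files should localize cwallet.sso into each YARN container's
# working directory, so (we think) the JDBC option could reference it
# relatively in the application code, e.g.
#   .option("oracle.net.wallet_location",
#           "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=.)))")
spark-submit --master yarn \
  --files /local/path/to/wallet_dir/cwallet.sso \
  /path/to/application.jar
```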
As an alternative approach, we did the following:
1) Run Spark in local mode: for this we set --master local[*] and specified the wallet directory location as "/local/path/to/wallet_dir/" (the directory containing the wallet file), and everything works fine.
2) Run Spark in yarn mode: this time we set --master yarn and used the same wallet directory path as above, but we got the following exception:
Caused by: oracle.net.ns.NetException: Unable to initialize ssl context.
    at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:344)
    at oracle.net.nt.CustomSSLSocketFactory.getSSLContext(CustomSSLSocketFactory.java:305)
    ... 30 more
Caused by: oracle.net.ns.NetException: Unable to initialize the key store.
    at oracle.net.nt.CustomSSLSocketFactory.getKeyManagerArray(CustomSSLSocketFactory.java:617)
    at oracle.net.nt.CustomSSLSocketFactory.createSSLContext(CustomSSLSocketFactory.java:322)
    ... 35 more
Caused by: java.io.FileNotFoundException: /local/path/to/cwallet.sso (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
It looks like in yarn mode, whenever Spark tries to establish the connection from an executor node, it fails because the wallet directory is not available on those nodes. We considered copying the wallet directory to all the worker nodes, and that works, but due to corporate policy we were told to find a different solution that does not involve copying the file to every node.
To figure out a solution without copying the wallet file, we did the following.
In local mode, if we specify the wallet file under the --files parameter and use the path returned by the following command:
scala> SparkFiles.getRootDirectory()
res0: String = /tmp/spark-a396c1f4-ddad-4da7-a2f4-6f8c279b3a7b/userFiles-744cd2cb-23d4-410c-8085-dee3207749ce
the file is available under that /tmp path and the connection can be created. But in yarn mode the same is not true, and it shows no files under that path.
So is there anything we are missing here? Is it possible at all to get the files onto all worker nodes without copying them? Why can't we see the files under the "SparkFiles.getRootDirectory()" path in yarn mode? Does it only reflect the driver's location?
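For reference, our current (unconfirmed) understanding of the two modes, which may explain the difference we are seeing; paths are placeholders:

```shell
# local[*]: --files copies the file into the driver's temp directory, which
# is exactly what SparkFiles.getRootDirectory() returns, so the wallet is
# found there.
spark-submit --master local[*] \
  --files /local/path/to/cwallet.sso \
  /path/to/application.jar

# yarn: resources are localized separately into each container's working
# directory. SparkFiles.getRootDirectory() called on the driver reflects
# only the driver's own directory, not the executors'; on an executor the
# file would (we believe) be resolved via SparkFiles.get("cwallet.sso") or
# the container working directory instead.
spark-submit --master yarn \
  --files /local/path/to/cwallet.sso \
  /path/to/application.jar
```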