New Contributor
Posts: 1
Registered: ‎07-18-2016

Oozie Spark Action for Spark JDBC to Oracle

I need your help setting up an Oozie Spark action for a Spark JDBC program.
 
Requirement - Read Oracle and Hive tables and write the transformed data to an Oracle database using Spark JDBC.
 
Manual execution -
spark-submit --master yarn --class com.spark.sql.jdbc.SparkDFtoOracle --jars /home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path /home/user/oracle/spark-jdbc/lib/ojdbc7.jar /home/user/oracle/spark-jdbc/lib/testOracleJdbcDF.jar
 
The manual execution works fine.
 
Oozie job.properties file
 
nameNode=hdfs://<host>:8020
jobTracker=<host>:8021
master=yarn-cluster
queueName=default
examplesRoot=spark-jdbc
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/
 
Oozie workflow.xml
 
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkJDBC'>
    <start to='spark-oracle' />
 
    <action name='spark-oracle'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-WriteOracle</name>
            <class>com.spark.sql.jdbc.SparkDFtoOracle</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar</jar>
            <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
 
    <kill name="fail">
        <message>Workflow failed, error
            message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
 
I have tried the following combinations in workflow.xml to set the Spark options, but none of them works.
 
1. <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml --executor-memory 1G --driver-memory 1G --executor-cores 4 --num-executors 2 --jars hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
2. <spark-opts>--conf files=hdfs://server/user/oracle/hive/hive-site.xml jars=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar driver-class-path=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
3. <spark-opts>--files=hdfs://server/user/oracle/hive/hive-site.xml --jars=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
4. <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml --jars hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
5. <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar,${nameNode}/user/${wf:user()}/${examplesRoot}/lib/ojdbc7.jar</jar>
   <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
 
6. <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar</jar>
   <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
   <arg>--jars</arg>
   <arg>hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</arg>
   <arg>--driver-class-path</arg>
   <arg>hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</arg>
      
 
Oozie execution -
oozie job -oozie http://xyz.client.com:11000/oozie --config /home/myuserid/project_folder/job.properties -run
 
None of the above options works for the Spark action. The following errors are commonly thrown on each execution:
 
JA018 – File file:/data/01/yarn/…………./container_12344/driver-class-path hdfs://server//home/user/oracle/spark-jdbc/lib/ojdbc7.jar does not exist (in fact this file exists on HDFS)
JA018 – File file:/data/01/yarn/…………./container_12344/"--files does not exist
JA018 – Invalid path name: Path part /…./testOracleJdbcDF.jar,/…/ojdbc7.jar from URI hdfs://server/user/lib/testOracleJdbcDF.jar,hdfs://server/user/lib/ojdbc7.jar is not a valid filename.
 
 
Software versions:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
Spark 1.5.0-cdh5.5.1
Scala 2.10.4
Java 1.7.0
Oracle JDBC driver - ojdbc7.jar
Oozie 4.1.0

 

Could you please suggest the right way to set up the Oozie workflow.xml for Spark JDBC to Oracle using the ojdbc7.jar driver? I appreciate your quick help.

Explorer
Posts: 21
Registered: ‎09-09-2015

Re: Oozie Spark Action for Spark JDBC to Oracle


I know you are using Spark 1.5. In Spark 1.3, when this kind of issue came up, I would simply keep the required jars under the workflow's lib folder and run the workflow. The <jar> tag may not be required for them.

You can try this once.
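For reference, a minimal sketch of that approach (the class name, paths and hive-site.xml location are copied from the original post; the lib/ layout shown in the comment is an assumption):

<!-- Assumed HDFS layout, matching job.properties above:
     /user/<user>/spark-jdbc/workflow.xml
     /user/<user>/spark-jdbc/lib/testOracleJdbcDF.jar
     /user/<user>/spark-jdbc/lib/ojdbc7.jar
-->
<action name='spark-oracle'>
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <name>Spark-WriteOracle</name>
        <class>com.spark.sql.jdbc.SparkDFtoOracle</class>
        <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar</jar>
        <!-- Assumption: ojdbc7.jar sitting in lib/ is shipped with the action and put on the
             classpath by Oozie, so it is not listed in --jars or --driver-class-path here -->
        <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
    </spark>
    <ok to="end" />
    <error to="fail" />
</action>

With oozie.use.system.libpath=true already set in job.properties, the Spark share lib is still picked up as before; only the application-specific jars (testOracleJdbcDF.jar and ojdbc7.jar) need to be copied into the workflow's lib/ directory.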