
Oozie Spark Action for Spark JDBC to Oracle

New Contributor

I need your help setting up an Oozie Spark action for a Spark JDBC program.
 
Requirement – Read Oracle and Hive tables and write the transformed data to an Oracle database using Spark JDBC.
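 
For reference, the job does roughly the following (a simplified sketch only; the JDBC URL, credentials, and table names below are placeholders rather than the real ones):
 
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SparkDFtoOracle {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkDFtoOracle"))
    // HiveContext needs hive-site.xml on the classpath to reach the metastore
    val sqlContext = new HiveContext(sc)

    // Placeholder connection details
    val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
    val props = new Properties()
    props.setProperty("user", "app_user")
    props.setProperty("password", "app_password")
    props.setProperty("driver", "oracle.jdbc.OracleDriver") // class comes from ojdbc7.jar

    // Read one Oracle table and one Hive table
    val oracleDF = sqlContext.read.jdbc(jdbcUrl, "SRC_ORACLE_TABLE", props)
    val hiveDF   = sqlContext.table("default.src_hive_table")

    // Transform and write the result back to Oracle over JDBC
    val result = oracleDF.join(hiveDF, "id")
    result.write.mode("append").jdbc(jdbcUrl, "TGT_ORACLE_TABLE", props)

    sc.stop()
  }
}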
 
Manual execution -
spark-submit --master yarn --class com.spark.sql.jdbc.SparkDFtoOracle --jars /home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path /home/user/oracle/spark-jdbc/lib/ojdbc7.jar /home/user/oracle/spark-jdbc/lib/testOracleJdbcDF.jar
 
The manual execution works fine.
 
Oozie job.properties file
 
nameNode=hdfs://<host>:8020
jobTracker=<host>:8021
master=yarn-cluster
queueName=default
examplesRoot=spark-jdbc
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/
 
Oozie workflow.xml
 
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkJDBC'>
    <start to='spark-oracle' />
 
    <action name='spark-oracle'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/spark"/>
            </prepare>
            <master>${master}</master>
            <name>Spark-WriteOracle</name>
            <class>com.spark.sql.jdbc.SparkDFtoOracle</class>
            <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar</jar>
            <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
 
    <kill name="fail">
        <message>Workflow failed, error
            message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
 
I have tried the following combinations in workflow.xml to set the Spark options, but none of them work:
 
1. <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml --executor-memory 1G --driver-memory 1G --executor-cores 4 --num-executors 2 --jars hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
2. <spark-opts>--conf files=hdfs://server/user/oracle/hive/hive-site.xml jars=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar driver-class-path=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
3. <spark-opts>--files=hdfs://server/user/oracle/hive/hive-site.xml --jars=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path=hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
4. <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml --jars hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar --driver-class-path hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</spark-opts>
 
5. <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar, ${nameNode}/user/${wf:user()}/${examplesRoot}/lib/ojdbc7.jar </jar>
   <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
 
6. <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar</jar>
   <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
   <arg>--jars</arg>
   <arg>hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</arg>
   <arg>--driver-class-path</arg>
   <arg>hdfs://server/home/user/oracle/spark-jdbc/lib/ojdbc7.jar</arg>
      
 
Oozie execution –
oozie job -oozie http://xyz.client.com:11000/oozie -config /home/myuserid/project_folder/job.properties -run
 
None of the above options work with the Spark action. The following errors are commonly thrown on each execution:
 
JA018 – File file:/data/01/yarn/…………./container_12344/driver-class-path hdfs://server//home/user/oracle/spark-jdbc/lib/ojdbc7.jar does not exist (this file actually exists on HDFS)
JA018 – File file:/data/01/yarn/…………./container_12344/"--files does not exist
JA018 – Invalid path name Path part /…./ testOracleJdbcDF.jar,/…/ojdbc7.jar from URI hdfs://server/user/lib/testOracleJdbcDF.jar,hdfs://server/user/lib/ojdbc7.jar is not valid filename.
 
 
Software versions:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
Spark 1.5.0-cdh5.5.1
Scala 2.10.4
Java 1.7.0
Oracle JDBC driver: ojdbc7.jar
Oozie 4.1.0

 

Could you please suggest the right way to set up the Oozie workflow.xml file for Spark JDBC to Oracle using the ojdbc7.jar driver? I appreciate your quick help.

1 REPLY

Re: Oozie Spark Action for Spark JDBC to Oracle

Explorer

I know you are using Spark 1.5. In Spark 1.3, when this kind of issue came up, I would simply keep the required JARs under the workflow's lib folder and run the workflow; the jar tag may not even need to list them.

You can give this a try.
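 
Roughly, that would mean uploading both JARs next to the workflow and trimming the action down to something like this (a sketch only, based on your job.properties paths; I have not tested it on your cluster):
 
<!-- Assumed HDFS layout under ${examplesRoot} (here, /user/<you>/spark-jdbc):
       workflow.xml
       lib/testOracleJdbcDF.jar
       lib/ojdbc7.jar
     Oozie adds everything in lib/ to the action's classpath and distributed cache,
     so the explicit --jars / --driver-class-path options should not be needed. -->
<action name='spark-oracle'>
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>${master}</master>
        <name>Spark-WriteOracle</name>
        <class>com.spark.sql.jdbc.SparkDFtoOracle</class>
        <jar>${nameNode}/user/${wf:user()}/${examplesRoot}/lib/testOracleJdbcDF.jar</jar>
        <spark-opts>--files hdfs://server/user/oracle/hive/hive-site.xml</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>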
