How to run SparkSQL (that accesses Hive) jobs with Oozie?


Hi all,

Over the last few days I wrote a SparkSQL (v1.3.0) job that accesses Hive tables.

On my CDH 5.4.7 cluster, the job runs perfectly when I submit it from the command line with spark-submit.

 

However, I need to run this job through Oozie 4.1, so I created an Oozie workflow with Hue. Here is my Spark action:

 

<action name="spark-339a">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name></name>
                <value></value>
            </property>
        </configuration>
        <master>yarn-client</master>
        <mode>client</mode>
        <name>MySpark</name>
        <class>com.pinto.Application</class>
        <jar>/var/lib/hue/dwh-cloudera-etl.jar</jar>
        <arg></arg>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>

Now, when I run the Oozie workflow, I get this error:

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, org/apache/hadoop/hive/conf/HiveConf
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf

I've read a lot of posts about this issue, with several suggested solutions. Apparently the best one is to set the following workflow variable:

 

 

oozie.action.sharelib.for.spark=spark,hcatalog
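
For readers following along, here is a minimal sketch of one way this setting can be expressed: as a property inside the Spark action's <configuration> element. The placement is an assumption made for illustration; the same key and value can instead go into the job properties submitted with the workflow.

```xml
<configuration>
    <!-- Assumed placement: action-level configuration of the Spark action. -->
    <!-- Asks Oozie to add both the spark and hcatalog sharelib directories to the launcher classpath. -->
    <property>
        <name>oozie.action.sharelib.for.spark</name>
        <value>spark,hcatalog</value>
    </property>
</configuration>
```

This block would take the place of the empty <property> entry in the action shown earlier.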

But now I get a new exception:

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
...
Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/RecognitionException
    at java.lang.Class.forName0(Native Method)

 

Can someone suggest the best way to add the missing Hive jars to the CLASSPATH?

 

Thank you in advance!

 

Michele