Oozie workflow not finding main class

Explorer

Hello Community,

 

I am working through this tutorial https://www.cloudera.com/tutorials/setting-up-a-spark-development-environment-with-java.html and trying to submit the application through an Oozie workflow. In Hue I go to Query > Scheduler > Workflow, drop a Java program into the actions, upload the jar file, and add the main class, which is Hortonworks.SparkTutorial.Main. When I run the Oozie workflow job I keep getting the error: "Caused by: java.lang.ClassNotFoundException: Class Hortonwork.SparkTutorial.Main not found". I'm using IntelliJ for this project, so when I hold down Ctrl and hover over Main it takes me to the MANIFEST.MF file, which says Main-Class: Hortonworks.SparkTutorial.Main, so I believe I have the main class definition right. I cannot figure out why it says it can't find my class.

 

 



Hi @jarededrake , the "ClassNotFoundException: Class Hortonwork.SparkTutorial.Main not found" suggests that the Java program's main class package name might have a typo in your workflow definition: Hortonwork should be Hortonworks. Can you check that?
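
For reference, here is a minimal sketch of where that class name ends up in the workflow.xml that Hue generates for a Java action (the action name, transitions, and jar path below are only placeholders):

<action name="java-action">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- must match the package and class name inside the jar exactly -->
        <main-class>Hortonworks.SparkTutorial.Main</main-class>
        <!-- placeholder path to the uploaded application jar -->
        <file>/path/to/SparkTutorial.jar</file>
    </java>
    <ok to="End"/>
    <error to="Kill"/>
</action>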

Explorer

@mszurap You were right, I didn't have the "s" at the end 😐. However, I am still getting the problem in Oozie. I've added a screenshot so you can see the structure of my program, and you can see in the MANIFEST.MF file it has my main class as

Hortonworks.SparkTutorial.Main

(screenshot attached: Capture9.PNG)


I see. Have you verified that the built jar contains this package structure and these class names? Can you also show where the jar is uploaded and how it is referenced in the Oozie workflow?

Thanks, Miklos

Explorer

Yes, I can. I will get those screenshots to you.

Explorer

So this is interesting. I run "mvn package" to package my application into a jar, and I get two different jars. One is SparkTutorial.jar; when I look at the contents of that jar file I only see my dependencies, but I do not see my main class.

 

I run: jar tf C:\Users\drakej\Desktop\SparkTutorial\out\artifacts\SparkTutorial_jar\SparkTutorial.jar

 

A sample of the output is below; it only lists my dependencies.

 

org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyRequestProto$Builder.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyRequestProto.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyRequestProtoOrBuilder.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyResponseProto$1.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyResponseProto$Builder.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyResponseProto.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos$TransitionToStandbyResponseProtoOrBuilder.class
org/apache/hadoop/ha/proto/HAServiceProtocolProtos.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$1.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveRequestProto$1.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveRequestProto$Builder.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveRequestProto.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveRequestProtoOrBuilder.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveResponseProto$1.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveResponseProto$Builder.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveResponseProto.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$CedeActiveResponseProtoOrBuilder.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$GracefulFailoverRequestProto$1.class
org/apache/hadoop/ha/proto/ZKFCProtocolProtos$GracefulFailoverRequestProto$Builder.class

 

The second jar file that is produced is SparkTutorial-1.0-SNAPSHOT.jar. I run

 

jar tf C:\Users\drakej\Desktop\SparkTutorial\target\SparkTutorial-1.0-SNAPSHOT.jar

That one has my class listed. However, when I run this jar file in Oozie I get the error:

 

org.apache.oozie.action.hadoop.JavaMainException: java.lang.NoClassDefFoundError: org/apache/spark/SparkConf

So one jar has my dependencies and the other jar has my class, but they are not together.
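
I'm guessing that if I wanted a single jar containing both my class and the dependencies, I would need something like the Maven Shade plugin in my pom.xml. A rough sketch (the plugin version here is just an example):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- keeps the Main-Class entry in the merged manifest -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>Hortonworks.SparkTutorial.Main</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>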

Explorer

Forgot to add the contents of 

 

jar tf C:\Users\drakej\Desktop\SparkTutorial\target\SparkTutorial-1.0-SNAPSHOT.jar

 

META-INF/
META-INF/MANIFEST.MF
Hortonworks/
Hortonworks/SparkTutorial/
code.txt
Hortonworks/SparkTutorial/Main.class
Main.class
replacementValues.properties
shakespeareText.txt
ULAN-Test-IPSummary.csv
ULAN-Test-IPSummary.txt
META-INF/maven/
META-INF/maven/hortonworks/
META-INF/maven/hortonworks/SparkTutorial/
META-INF/maven/hortonworks/SparkTutorial/pom.xml
META-INF/maven/hortonworks/SparkTutorial/pom.properties


Hi @jarededrake , sorry for the delay, I was away for a couple of days.

You should use your thin jar (application only, without the dependencies) from the target directory ("SparkTutorial-1.0-SNAPSHOT.jar"). The NoClassDefFoundError for SparkConf suggests that you used a Java action. It is highly recommended to use a Spark action in the Oozie workflow editor when running a Spark application, so that the environment is set up properly for the application.
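
For reference, a minimal sketch of what the Spark action typically looks like in the generated workflow.xml (the action name, transitions, jar path, and schema version below are only examples and may differ on your cluster):

<action name="spark-action">
    <spark xmlns="uri:oozie:spark-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <master>yarn</master>
        <mode>cluster</mode>
        <name>SparkTutorial</name>
        <!-- fully qualified main class from the thin jar's manifest -->
        <class>Hortonworks.SparkTutorial.Main</class>
        <!-- placeholder HDFS path to the thin jar -->
        <jar>hdfs:///path/to/SparkTutorial-1.0-SNAPSHOT.jar</jar>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>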

Explorer

Hey @mszurap , no problem at all, I totally understand. So I ran my "SparkTutorial-1.0-SNAPSHOT.jar" from Hue > Query > Scheduler > Workflow. I dragged a Spark action down to the workflow and filled out the inputs like this:

Jar/py name: SparkTutorial-1.0-SNAPSHOT.jar

Main class: Main 

Files + : /user/zzmdrakej2/SparkTutorial-1.0-SNAPSHOT.jar

 

I then get this error; it's different, so I guess that's a good sign, right:

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Delegation Token can be issued only with kerberos or web authentication

 

 


Hi @jarededrake , you're on the right track. The issue now seems to be that the cluster has Kerberos enabled, and that needs some extra configuration.

In the workflow editor, in the upper right corner of the Spark action you will find a cogwheel icon for advanced settings. There, on the Credentials tab, enable the "hcat" and "hbase" credentials so that the Spark client can obtain delegation tokens for the Hive (Hive metastore) and HBase services, in case the Spark application wants to use those services (Spark does not know this in advance, so it obtains those DTs). You can also disable this behavior if you are sure that the Spark application will not connect to Hive (using Spark SQL) or HBase; just add the following to the Spark action option list:

--conf spark.security.credentials.hadoopfs.enabled=false --conf spark.security.credentials.hbase.enabled=false --conf spark.security.credentials.hive.enabled=false

but it's easier to just enable these credentials in the settings page.
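
When the credentials are enabled, the generated workflow.xml typically gets a <credentials> section that the action then references; a rough sketch (the credential name, metastore URI, and principal below are only placeholders for your cluster):

<credentials>
    <credential name="hcat-cred" type="hcat">
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://metastore-host:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@EXAMPLE.COM</value>
        </property>
    </credential>
</credentials>
...
<action name="spark-action" cred="hcat-cred">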

For similar Kerberos-related issues in other actions, please see the following guide:

https://gethue.com/hadoop-tutorial-oozie-workflow-credentials-with-a-hive-action-with-kerberos/