
Hive on Spark fails with exception through Oozie

I am trying to run Hive with Spark as the execution engine instead of MR. The operation is fairly simple: creating a table and then inserting into it. When I trigger the script from the Hive shell, the job runs to completion.

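The script itself is roughly the following (a sketch only; the table name and columns here are placeholders, not the real ones):

```sql
-- Switch the execution engine from MR to Spark
set hive.execution.engine=spark;

-- Hypothetical table: the actual script creates a table and inserts into it
CREATE TABLE IF NOT EXISTS demo_tbl (id INT, name STRING);
INSERT INTO TABLE demo_tbl VALUES (1, 'a'), (2, 'b');
```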
When I schedule the same script through Oozie (on the same machine), the job starts but fails almost immediately with the following error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, scala/collection/Iterable

java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:329)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:204)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10310)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:193)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:223)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:558)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1356)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1473)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1275)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:226)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:175)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:389)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:324)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:422)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:438)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:732)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:333)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:310)
...


Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 37 more

 

Adding 'oozie.action.sharelib.for.spark' set to 'hive,spark' in the workflow XML made the above exception go away, but the job then failed with the following error:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
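For reference, the sharelib override was added inside the Hive action's configuration, roughly like this (a minimal sketch; the schema version and script name are placeholders, and the rest of the action definition is elided):

```xml
<hive xmlns="uri:oozie:hive-action:0.5">
    ...
    <configuration>
        <property>
            <name>oozie.action.sharelib.for.spark</name>
            <value>hive,spark</value>
        </property>
    </configuration>
    <script>hive_on_spark.hql</script>
</hive>
```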


I did check that the Oozie shared libraries are available in HDFS, and they are; the available sharelibs are listed below.

[Available ShareLib]
hive
distcp
mapreduce-streaming
spark
oozie
hcatalog
hive2
sqoop
pig
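The sharelib listing above can be reproduced with the Oozie admin CLI (the server URL below is a placeholder for the actual Oozie endpoint):

```shell
# List the sharelibs registered with the Oozie server (URL is a placeholder)
oozie admin -oozie http://oozie-server:11000/oozie -shareliblist

# Drill into the jars bundled in the spark sharelib specifically
oozie admin -oozie http://oozie-server:11000/oozie -shareliblist spark
```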

 

Please suggest if I am missing something here. Let me know if more details are needed.