
Spark cluster managed by Yarn throws java.lang.ClassNotFoundException


I am new to YARN but have some experience with Spark standalone master. I recently installed a YARN + Spark cluster using Ambari.

I have a Spark program compiled to a jar (program.jar) that depends on another jar (infra.jar) to work.

I set the following configurations in Ambari:

spark.executor.extraClassPath=/root/infra.jar
spark.driver.extraClassPath=/root/infra.jar

I verified that the file exists on all nodes and that the configuration was pushed to $SPARK_HOME/conf/spark-defaults.conf on all nodes.

I have also copied it to $SPARK_HOME/jars on all nodes.

I run the job using:

$SPARK_HOME/bin/spark-submit --master yarn --class com.MyMainClass /root/program.jar

I have the following environment variables set:

export HADOOP_CONF_DIR=/usr/hdp/2.6.3.0-235/hadoop/conf
export HADOOP_HOME=/usr/hdp/2.6.3.0-235/hadoop

I am getting the following error:

Caused by: java.lang.ClassNotFoundException: com.MyPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

The MyPartition class is located in infra.jar, but for some reason it is not found.

Judging by the stack trace, the failure happens in the executor; some code does run successfully, namely the driver code.

P.S. I also tried adding the jar manually, either via the --jars flag or via SparkContext.addJar, but it still fails with a ClassNotFoundException (although, for some weird reason, on a different class from infra.jar).
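For reference, the --jars variant I tried looks roughly like this (a sketch using the paths and class name from above; --jars makes YARN ship the jar into every container's working directory, whereas spark.executor.extraClassPath only references a path that must already exist on each executor node):

```shell
# Ship infra.jar with the application so YARN distributes it to every
# executor container and puts it on the executor classpath.
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --jars /root/infra.jar \
  --class com.MyMainClass \
  /root/program.jar
```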
