
NoClassDefFoundError: org/apache/spark/Logging - Using spark-submit with Kafka Spark Streaming program

New Contributor

I built a jar using the code from:

https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/stream...

I'm using the KafkaWordCount class for execution.

My code is:

import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object KafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: KafkaWordCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }

    val Array(zkQuorum, group, topics, numThreads) = args
    val sparkConf = new SparkConf().setAppName("KafkaWordCount")
    // 1-second batches; checkpointing is required because reduceByKeyAndWindow uses an inverse function
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    ssc.checkpoint("checkpoint")

    // Map each topic to the number of consumer threads that should read it
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    // Receiver-based Kafka stream; the message values are the lines to count
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L))
      .reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

I'm using Spark 2.0 and packaging using sbt.

My build.sbt:

libraryDependencies ++= Seq(
        "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.2",
        "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
        "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
)

The error I'm getting when running spark-submit is:

./bin/spark-submit --class KafkaWordCount --master spark://nifi:7077 KafkaWordCount-assembly-0.0.1.jar kafka01:9092 mycons-group kafka_scala_test 1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
       	at java.lang.ClassLoader.defineClass1(Native Method)
       	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
       	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
       	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
       	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
       	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
       	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
       	at java.security.AccessController.doPrivileged(Native Method)
       	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
       	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       	at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91)
       	at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:66)
       	at KafkaWordCount$.main(KafkaWordCount.scala:18)
       	at KafkaWordCount.main(KafkaWordCount.scala)
       	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       	at java.lang.reflect.Method.invoke(Method.java:498)
       	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
       	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
       	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
       	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
       	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
       	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
       	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
       	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
       	... 24 more

I realize that the org.apache.spark.Logging class was removed in Spark 2.0. However, instead of downgrading to Spark 1.6.x, is there a change I can make so this works with Spark 2.0? I've been struggling with this for a while.

3 REPLIES

Master Guru

You need to compile and run your Spark job with an sbt build that points to a single Spark version and a single Scala version. The spark-streaming-kafka_2.10 artifact is built for Scala 2.10 against Spark 1.x, so the 2.0.0 entries in this build.sbt don't make sense next to it (see the sketch after the list below):

  1. "org.apache.spark"%"spark-streaming-kafka_2.10"%"1.6.2",
  2. "org.apache.spark"%%"spark-streaming"%"2.0.0"%"provided",
  3. "org.apache.spark"%%"spark-core"%"2.0.0"%"provided"

The Spark version must also match what is installed on your cluster: HDP 2.5 ships Spark 1.6.2 plus a technology preview of Spark 2.0. To run against the Spark 2.0 preview you have to set the SPARK_MAJOR_VERSION environment variable (see the docs linked below).

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/ch_introdu...

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_command-line-installation/content/install...

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/run-sample-...


https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html

Change your sbt dependencies so that spark-streaming and spark-core both come from the same 1.6.2 release:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.6.2",
  "org.apache.spark" % "spark-core_2.10" % "1.6.2"
)
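
One packaging note, since the jar in the question is an assembly jar: spark-core and spark-streaming can stay "provided" because the cluster supplies them at runtime, but the Kafka connector is not on the executors' classpath, so it has to be bundled into the fat jar. A minimal sbt-assembly setup for that (the plugin version here is an assumption) looks like:

// project/plugins.sbt -- assumption: any recent sbt-assembly release works
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

With that in place, sbt assembly produces the KafkaWordCount-assembly jar that spark-submit is pointed at.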

New Contributor

Thanks for the help @Timothy Spann

New Contributor

Hello @Timothy Spann,

I'm currently trying to use a RabbitMQ Spark Streaming receiver with Spark 2.0.0, but since that receiver needs the Logging class from earlier Spark versions, I'm running into the same problem. Do you know if there is any way to bring that class in so it works with Spark 2.0.0?

Sorry for the off-topic question.