Created 09-25-2016 11:33 PM
I built a jar using the code from:
I'm using the KafkaWordCount class for execution.
My code is:
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object KafkaWordCount {
  def main(args: Array[String]) {
    if (args.length < 4) {
      System.err.println("Usage: KafkaWordCount <zkQuorum> <group> <topics> <numThreads>")
      System.exit(1)
    }

    val Array(zkQuorum, group, topics, numThreads) = args
    val sparkConf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(1))
    ssc.checkpoint("checkpoint")

    // Map each topic to the number of receiver threads
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    val words = lines.flatMap(_.split(" "))
    // Windowed counts: 10-minute window, sliding every 2 seconds
    val wordCounts = words.map(x => (x, 1L))
      .reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
I'm using Spark 2.0 and packaging with sbt.
My build.sbt:
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.2",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
)
The error I'm getting when running spark-submit is:
./bin/spark-submit --class KafkaWordCount --master spark://nifi:7077 KafkaWordCount-assembly-0.0.1.jar kafka01:9092 mycons-group kafka_scala_test 1
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:91)
    at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:66)
    at KafkaWordCount$.main(KafkaWordCount.scala:18)
    at KafkaWordCount.main(KafkaWordCount.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 24 more
I realize that the org.apache.spark.Logging class was removed from the public API in Spark 2.0. However, instead of downgrading everything to Spark 1.6.2, is there a change I can make so this works with Spark 2.0? I've been struggling with this for a while.
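From what I've read, the 2.0 line renames the Kafka connector artifact (and the default builds are against Scala 2.11), so my guess at a consistent build.sbt, assuming the old createStream API still lives in the 0-8 module, would be:

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
)

I haven't been able to verify this end to end yet, which is part of why I'm asking.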
Created 09-26-2016 01:15 PM
You need to compile and run your Spark job with a build.sbt that points at one version of Spark and one version of Scala throughout. HDP's Spark 1.6.2 is built for Scala 2.10, so use Scala 2.10 and write the code accordingly; mixing the 2.0.0 artifacts into that build.sbt doesn't make sense.
It also has to match what's running on your server: HDP 2.5 ships Spark 1.6.2 plus a technical preview of Spark 2.0, and to run against the Spark 2.0 preview you must set an environment variable first.
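If I remember the name correctly, that variable is SPARK_MAJOR_VERSION, set before calling spark-submit:

# HDP's version switch (name recalled from memory) that points spark-submit at the Spark 2.x client
export SPARK_MAJOR_VERSION=2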
https://spark.apache.org/docs/1.6.2/streaming-programming-guide.html
Change your build.sbt to target Spark 1.6.2 for Scala 2.10 across the board:
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.6.2",
  "org.apache.spark" % "spark-core_2.10" % "1.6.2"
)
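The Kafka connector has to line up as well; assuming the same 0.8-style receiver artifact you already use above, that would be:

"org.apache.spark" % "spark-streaming-kafka_2.10" % "1.6.2"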
Created 09-27-2016 04:37 AM
Thanks for the help @Timothy Spann
Created 10-05-2016 06:49 PM
Hello @Timothy Spann,
I'm currently trying to use the RabbitMQ Spark Streaming receiver with Spark 2.0.0, but since this receiver depends on the Logging class from previous versions of Spark, I'm running into the same problem. Do you know of any way to bring that class back so it works with Spark 2.0.0?
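The only workaround I've come across so far (untested on my side, and the body below is just my sketch) is to recreate a minimal stand-in for the removed trait inside my own jar, under the same package name:

package org.apache.spark

import org.slf4j.{Logger, LoggerFactory}

// Hypothetical shim for the Logging trait that Spark 2.0 dropped from its
// public API; provides just enough for a receiver that mixes it in.
trait Logging {
  @transient private var _log: Logger = null

  protected def log: Logger = {
    if (_log == null) _log = LoggerFactory.getLogger(getClass.getName.stripSuffix("$"))
    _log
  }

  protected def logInfo(msg: => String): Unit = if (log.isInfoEnabled) log.info(msg)
  protected def logDebug(msg: => String): Unit = if (log.isDebugEnabled) log.debug(msg)
  protected def logWarning(msg: => String): Unit = if (log.isWarnEnabled) log.warn(msg)
  protected def logError(msg: => String): Unit = if (log.isErrorEnabled) log.error(msg)
  protected def logError(msg: => String, e: Throwable): Unit = if (log.isErrorEnabled) log.error(msg, e)
}

No idea yet whether shading it like this clashes with anything else on the Spark 2.0 classpath, so treat it as a sketch rather than a fix.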
Sorry for the off-topic question.