org.apache.spark.examples.streaming.JavaKafkaWordCount example not working

Explorer

I am running Spark from the CDH5 client packages on Ubuntu 12.04. I was trying to get the JavaKafkaWordCount example in /usr/lib/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar running on YARN, but I got an error that I can't seem to resolve. (As a side note, the SparkPi.scala example works.) I noticed that the spark-streaming-kafka library is not included by default in the CDH5 packages, but it is available here: https://repository.cloudera.com/cloudera/public/org/apache/spark/spark-streaming-kafka_2.10/1.0.0-cd.... So I downloaded the spark-streaming-kafka JAR file and ran the following command:

spark-submit --jars spark-streaming-kafka_2.10-1.0.0-cdh5.1.2.jar --class org.apache.spark.examples.streaming.JavaKafkaWordCount spark-examples_2.10-1.0.0-cdh5.1.2.jar my_kafka_host:my_kafka_port my_consumer_group my_kafka_topic 1

but I got this error:

INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils
        at org.apache.spark.examples.streaming.JavaKafkaWordCount.main(JavaKafkaWordCount.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 8 more
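
The trace shows the exception being thrown in the JVM started by spark-submit (SparkSubmit.launch calling JavaKafkaWordCount.main at line 79), so KafkaUtils was not visible to the class loader that ran the example. One way to check what a given classpath can actually see is a tiny standalone program like the sketch below. The class name ClassPathCheck is hypothetical, and probing with Class.forName is just one way to run this check, not something taken from this thread.

```java
import java.security.CodeSource;

/**
 * Minimal diagnostic sketch (hypothetical helper): reports whether each named
 * class can be loaded from the current classpath and, if so, where it came from.
 */
public class ClassPathCheck {

    /** Returns the jar/directory the class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassPathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no CodeSource
            if (src == null || src.getLocation() == null) {
                return "<bootstrap classpath>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError when one of the class's own dependencies is missing
            return null;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; any class names can be passed instead
        String[] names = args.length > 0
                ? args
                : new String[] {"org.apache.spark.streaming.kafka.KafkaUtils"};
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

For example, `java -cp <classpath-to-test>:. ClassPathCheck org.apache.spark.streaming.kafka.KafkaUtils` prints `NOT FOUND` when no jar on that classpath provides the class.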

I also tried adding the spark-streaming-kafka JAR file to the spark.yarn.dist.files property in my spark-defaults.conf file, but that did not work either. Does anyone know how to resolve this error?

Master Collaborator

Yes, I don't think these examples are part of the runtime platform; you would need to bring the examples with your app. It sounds like the streaming JARs somehow aren't part of your distribution. Is there anything custom about your deployment of CDH? The streaming classes should be found as part of the distribution, and that's what's missing.

Explorer

I found out that some dependencies were not fulfilled by the spark-examples_2.10-1.0.0-cdh5.1.2.jar file. As mentioned, the spark-streaming-kafka libraries were missing, as were the Kafka libraries themselves. I zipped these JARs into a single zip file, passed it with the --archives option, and it is now working. It doesn't seem like it would be much work to include the spark-streaming-kafka and Kafka libraries with the spark-examples JAR, but I have not tried building an 'uber jar' with all the libraries.
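
To pin down exactly which JARs supply the missing classes, one option is to search a directory of candidate JARs for the .class entry that the failing job reports. The sketch below does that with java.util.jar.JarFile. The class name FindClassInJars and whichever directory you point it at are assumptions for illustration, not part of the solution described above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/**
 * Minimal sketch (hypothetical helper): list the .jar files in a directory
 * that contain a given class, e.g. to find which jars provide KafkaUtils.
 */
public class FindClassInJars {

    /** Returns the jars directly under dir that contain the .class entry for className. */
    static List<File> findJarsContaining(File dir, String className) throws IOException {
        // org.apache.spark.streaming.kafka.KafkaUtils -> org/apache/spark/streaming/kafka/KafkaUtils.class
        String entryName = className.replace('.', '/') + ".class";
        List<File> hits = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".jar")) {
                continue;
            }
            // try-with-resources closes each jar after its entry lookup
            try (JarFile jar = new JarFile(f)) {
                if (jar.getJarEntry(entryName) != null) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: FindClassInJars <jar-directory> <fully.qualified.ClassName>");
            System.exit(1);
        }
        for (File jar : findJarsContaining(new File(args[0]), args[1])) {
            System.out.println(jar.getPath());
        }
    }
}
```

Run it once per missing class. Start with org.apache.spark.streaming.kafka.KafkaUtils, then search for any Kafka class that the next error names. This builds up the list of JARs to ship with the application. spark-submit's --jars option accepts such JARs as a comma-separated list.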

Expert Contributor

Can you please post the specific dependencies you added?