Getting an error while submitting a Spark job

Contributor

I am getting the following error while submitting a Spark job from the command line:

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

1. Include the Kafka library and its dependencies with in the spark-submit command as

   $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.5.2 ...

2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.5.2. Then, include the jar in the spark-submit command as

   $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...

The Python code I am running is:

from pyspark.sql import SQLContext
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="Clickstream_kafka")
stream = StreamingContext(sc, 2)  # 2-second batch interval

# Receiver-based stream: ZooKeeper quorum, consumer group, {topic: partitions}
kafka_stream = KafkaUtils.createStream(stream, "172.16.10.13:2181", "raw-event-streaming-consumer", {"event": 1})

# Each record is a (key, value) pair; decode the JSON value
parsed = kafka_stream.map(lambda kv: json.loads(kv[1]))

# A DStream has no collect(); pprint() prints the first elements of each batch
parsed.pprint()

stream.start()
stream.awaitTermination()
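
Separately from the classpath error, the parsing step in the script above can be pulled out into a plain function so it can be checked without a cluster. This is only a minimal sketch, assuming each record from KafkaUtils.createStream arrives as a (key, value) pair whose value is a JSON string. The name parse_event and the choice to return None for malformed messages are illustrative assumptions, not part of the original post.

```python
import json


def parse_event(record):
    """Decode the JSON value of a (key, value) Kafka record.

    Hypothetical helper: returns None for malformed payloads so that a
    single bad message does not fail the whole batch.
    """
    _key, value = record
    try:
        return json.loads(value)
    except (TypeError, ValueError):
        # TypeError: value is None; ValueError: value is not valid JSON
        return None
```

In the streaming job this could be applied as kafka_stream.map(parse_event).filter(lambda e: e is not None), so that only successfully decoded events reach the later stages.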

1 ACCEPTED SOLUTION

Super Guru

With the spark-submit option --jars, are you passing the spark-streaming-kafka-assembly jar along with kafka_2.10-*.jar from the /usr/hdp/2.4.0.0-169/kafka/libs/ location?


3 REPLIES

Contributor

I am only running it like this:

spark-submit <file_name.py>

Contributor

The Spark job runs fine now. I used:

spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar <file.py>
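
Note that --jars takes a single comma-separated list with no spaces between entries. If the job is launched from a script, the command can be assembled along these lines. This is a rough sketch: build_spark_submit_command is a hypothetical helper, and the jar names are simply the ones from the command above.

```python
def build_spark_submit_command(script, jars, spark_submit="spark-submit"):
    """Return the argv list for spark-submit with a comma-separated --jars value."""
    if not jars:
        return [spark_submit, script]
    # --jars expects one argument: jar paths joined by commas, no spaces
    return [spark_submit, "--jars", ",".join(jars), script]


cmd = build_spark_submit_command(
    "file.py",
    [
        "spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar",
        "spark-streaming-kafka-assembly_2.10-1.6.1.jar",
    ],
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.call(cmd) on a machine where spark-submit is on the PATH.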