Created 06-09-2016 11:33 AM
I am getting the following error while submitting a Spark job from the command line:

Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the spark-submit command as
   $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.5.2 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.5.2. Then, include the jar in the spark-submit command as
   $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...
The Python code I am running is:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="Clickstream_kafka")
stream = StreamingContext(sc, 2)
kafka_stream = KafkaUtils.createStream(stream, "172.16.10.13:2181", "raw-event-streaming-consumer", {"event": 1})
parsed = kafka_stream.map(lambda (k, v): json.loads(v))
parsed.pprint()  # DStreams have no collect(); pprint() prints a sample of each batch
stream.start()
stream.awaitTermination()
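As a side note, the per-record transformation (JSON value parsing) can be sanity-checked without a cluster or Kafka at all; a minimal sketch, assuming a hypothetical sample event in the same (key, value) shape that createStream delivers:

```python
import json

# Hypothetical Kafka record: createStream yields (key, value) pairs,
# where the value carries the JSON payload of the clickstream event.
record = ("key1", '{"user": "u1", "page": "/home"}')

k, v = record
parsed = json.loads(v)  # same transformation as the DStream map above
print(parsed["page"])
```

If this works on a sample value, the map itself is fine and any remaining failure is a classpath or deployment issue.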
Created 06-09-2016 11:48 AM
With the spark-submit option --jars, are you passing the spark-streaming-kafka-assembly jar along with kafka_2.10-*.jar from the /usr/hdp/2.4.0.0-169/kafka/libs/ location?
Created 06-09-2016 02:39 PM
I am only running it like this:
spark-submit <file_name.py>
Created 06-09-2016 04:23 PM
The Spark job runs fine now. I used:
spark-submit --jars spark-assembly-1.5.2.2.3.4.7-4-hadoop2.7.1.2.3.4.7-4.jar,spark-streaming-kafka-assembly_2.10-1.6.1.jar <file.py>
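For reference, when the cluster nodes have network access to Maven Central, the error message's first suggestion can also work: let spark-submit resolve the Kafka integration at submit time with --packages instead of shipping the assembly jar by hand. A sketch, assuming the Scala 2.10 artifact matching Spark 1.6.1 (adjust the version to your build):

```shell
# Hypothetical alternative: resolve the Kafka integration from Maven Central at submit time
spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 <file.py>
```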