
spark-streaming-kafka-assembly* jar file location on HDP 2.4


New Contributor

Hi Guys,

Can anyone tell me where the spark-streaming-kafka-assembly* jar files are located on HDP? I am using PySpark to stream data from a Kafka producer with Spark direct streaming, but to submit the Python code to Spark I need to provide the assembly jars.


Re: spark-streaming-kafka-assembly* jar file location on HDP 2.4

Re: spark-streaming-kafka-assembly* jar file location on HDP 2.4

New Contributor

Unfortunately no. That link provides the Maven dependency information for Scala/Java projects. I have to use Python, and to submit Python code to Spark the jar location is needed.

Re: spark-streaming-kafka-assembly* jar file location on HDP 2.4

@Raghvendra Singh

I can use the two jars below to run the Spark Streaming program in Scala; I'm hoping these are the same for PySpark.

/usr/hdp/2.4.0.0-169/spark/lib/spark-examples-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
/usr/hdp/2.4.0.0-169/spark/lib/spark-assembly-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar

What error are you getting while running PySpark?

Re: spark-streaming-kafka-assembly* jar file location on HDP 2.4

New Contributor

With Scala you don't need to provide the Spark Streaming jars separately, but with Python you do.

Here is the error that I am getting:

  Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.6.0 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.6.0.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...

________________________________________________________________________________________________                                              
Traceback (most recent call last):                                                                                                            
  File "/usr/hdp/current/spark-client/examples/src/main/python/streaming/direct_kafka_wordcount.py", line 44, in <module>                     
    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})                                                      
  File "/usr/hdp/2.4.0.0-169/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 152, in createDirectStream                        
py4j.protocol.Py4JJavaError: An error occurred while calling o38.loadClass.                                                                   
: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper                                                   
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)                                                                             
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)                                                                             
        at java.security.AccessController.doPrivileged(Native Method)                                                                         
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)                                                                         
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)                                                                              
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)                                                                              
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                                                                        
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)                                                      
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                                              
        at java.lang.reflect.Method.invoke(Method.java:606)                                                                                   
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)                                                                       
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)                                                                 
        at py4j.Gateway.invoke(Gateway.java:259)                                                                                              
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)                                                               
        at py4j.commands.CallCommand.execute(CallCommand.java:79)                                                                             
        at py4j.GatewayConnection.run(GatewayConnection.java:209)                                                                             
        at java.lang.Thread.run(Thread.java:745)
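
For anyone hitting the same error, the two options printed in the message above can be scripted. The following is only a minimal sketch under some assumptions: the helper names `find_kafka_assembly_jars` and `build_submit_command` are made up for illustration, `/usr/hdp` is used as the search root only because the paths in this thread live there, and nothing here confirms that HDP 2.4 actually ships the assembly jar. If no jar is found locally, the sketch falls back to the `--packages` coordinate exactly as it appears in the error message, and `<brokers>` and `<topic>` are placeholders for your own values.

```python
import fnmatch
import os


def find_kafka_assembly_jars(root):
    """Walk `root` and return sorted paths of spark-streaming-kafka-assembly*.jar files."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, "spark-streaming-kafka-assembly*.jar"):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def build_submit_command(script, jar=None, package=None, script_args=()):
    """Build a spark-submit argument list that uses --jars if a jar is given, else --packages."""
    cmd = ["spark-submit"]
    if jar:
        cmd += ["--jars", jar]
    elif package:
        cmd += ["--packages", package]
    cmd.append(script)
    cmd.extend(script_args)
    return cmd


if __name__ == "__main__":
    # Assumption: search under /usr/hdp because that is where the jars in this thread live.
    jars = find_kafka_assembly_jars("/usr/hdp")
    script = "/usr/hdp/current/spark-client/examples/src/main/python/streaming/direct_kafka_wordcount.py"
    if jars:
        cmd = build_submit_command(script, jar=jars[0], script_args=["<brokers>", "<topic>"])
    else:
        # Coordinate copied verbatim from the error message; adjust it if Maven Central
        # lists the artifact under a different name for your Scala/Spark version.
        cmd = build_submit_command(
            script,
            package="org.apache.spark:spark-streaming-kafka:1.6.0",
            script_args=["<brokers>", "<topic>"],
        )
    print(" ".join(cmd))
```

The script only prints the command, so you can check it before running it. Note that `--packages` needs network access to a Maven repository from the machine that submits the job, whereas `--jars` works with a jar downloaded by hand.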