<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Kafka Spark Streaming Error in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198620#M160668</link>
    <description>&lt;P&gt;Thank you &lt;A rel="user" href="https://community.cloudera.com/users/30206/rmy1712.html" nodeid="30206"&gt;@Ramya Jayathirtha&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I downloaded the JAR file from Maven and ran the job again, and it got past that error.&lt;/P&gt;&lt;P&gt;Now it is blocked by a different error:&lt;/P&gt;&lt;P&gt;File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main&lt;BR /&gt;  process()&lt;BR /&gt;  File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process&lt;BR /&gt;  serializer.dump_stream(func(split_index, iterator), outfile)&lt;BR /&gt;  File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream&lt;BR /&gt;  vs = list(itertools.islice(iterator, batch))&lt;BR /&gt;  File "/sourcefiles/spkf.py", line 9, in &amp;lt;lambda&amp;gt;&lt;BR /&gt;  parsed = kafkaStream.map(lambda v: json.loads(v[1]))&lt;BR /&gt;  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads&lt;BR /&gt;  return _default_decoder.decode(s)&lt;BR /&gt;  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode&lt;BR /&gt;  obj, end = self.raw_decode(s, idx=_w(s, 0).end())&lt;BR /&gt;  File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode&lt;BR /&gt;  raise ValueError("No JSON object could be decoded")&lt;BR /&gt;ValueError: No JSON object could be decoded&lt;/P&gt;</description>
    <pubDate>Wed, 31 Jan 2018 04:32:50 GMT</pubDate>
    <dc:creator>sreeviswa_athic</dc:creator>
    <dc:date>2018-01-31T04:32:50Z</dc:date>
    <item>
      <title>Kafka Spark Streaming Error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198618#M160666</link>
      <description>&lt;P&gt;Hi Team,&lt;BR /&gt;I am trying to copy streaming data from a Kafka topic to an HDFS directory.&lt;BR /&gt;It throws the error 'java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper'.&lt;BR /&gt;&lt;BR /&gt;Any help resolving this is much appreciated.&lt;BR /&gt;&lt;BR /&gt;Here are the steps I followed:&lt;BR /&gt;Step 1 - Created topic -&amp;gt; topicXYZ&lt;BR /&gt;Step 2 - Created a producer and linked it to topicXYZ&lt;BR /&gt;Step 3 - Created a consumer and linked it to topicXYZ&lt;BR /&gt;&lt;BR /&gt;=&amp;gt; PySpark program to stream and copy data to the HDFS directory:&lt;BR /&gt;from pyspark import SparkContext&lt;BR /&gt;from pyspark.streaming import StreamingContext&lt;BR /&gt;from pyspark.streaming.kafka import KafkaUtils&lt;BR /&gt;import json&lt;BR /&gt;sc = SparkContext(appName="kafksteststreaming")&lt;BR /&gt;sc.setLogLevel("WARN")&lt;BR /&gt;ssc = StreamingContext(sc, 60)&lt;BR /&gt;kafkaStream = KafkaUtils.createStream(ssc, 'xxxx:2181', 'raw-event-streaming-consumer', {'topicXYZ':1})&lt;BR /&gt;parsed = kafkaStream.map(lambda (k, v): json.loads(v))&lt;BR /&gt;parsed.saveAsTextFiles('/tmp/folderxyz')&lt;BR /&gt;ssc.start()&lt;BR /&gt;ssc.awaitTermination()&lt;BR /&gt;&lt;BR /&gt;spark-submit --jars /usr/hdp/current/spark-client/lib/spark-assembly-*.jar spkf.py&lt;/P&gt;&lt;P&gt;The above command throws this error:&lt;/P&gt;&lt;P&gt;Spark Streaming's Kafka libraries not found in class path. Try one of the following.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Include the Kafka library and its dependencies with in the spark-submit command&lt;OL&gt;&lt;LI&gt;$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.6&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Download the JAR of the artifact from Maven Central &lt;A href="http://search.maven.org/"&gt;http://search.maven.org/&lt;/A&gt;, Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.4.0. Then, include the jar in the spark-submit command as&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Wed, 31 Jan 2018 03:16:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198618#M160666</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2018-01-31T03:16:53Z</dc:date>
    </item>
    <item>
      <title>Re: Kafka Spark Streaming Error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198619#M160667</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3057/sreeviswaathikala.html" nodeid="3057"&gt;@Viswa&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Add this configuration to your pom.xml under the build tag, rather than passing the JAR to spark-submit.&lt;/P&gt;&lt;PRE&gt;&amp;lt;descriptorRefs&amp;gt;
  &amp;lt;descriptorRef&amp;gt;jar-with-dependencies&amp;lt;/descriptorRef&amp;gt;
&amp;lt;/descriptorRefs&amp;gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 31 Jan 2018 03:47:46 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198619#M160667</guid>
      <dc:creator>rmy1712</dc:creator>
      <dc:date>2018-01-31T03:47:46Z</dc:date>
    </item>
    <item>
      <title>Re: Kafka Spark Streaming Error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198620#M160668</link>
      <description>&lt;P&gt;Thank you &lt;A rel="user" href="https://community.cloudera.com/users/30206/rmy1712.html" nodeid="30206"&gt;@Ramya Jayathirtha&lt;/A&gt; &lt;/P&gt;&lt;P&gt;I downloaded the JAR file from Maven and ran the job again, and it got past that error.&lt;/P&gt;&lt;P&gt;Now it is blocked by a different error:&lt;/P&gt;&lt;P&gt;File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main&lt;BR /&gt;  process()&lt;BR /&gt;  File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process&lt;BR /&gt;  serializer.dump_stream(func(split_index, iterator), outfile)&lt;BR /&gt;  File "/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream&lt;BR /&gt;  vs = list(itertools.islice(iterator, batch))&lt;BR /&gt;  File "/sourcefiles/spkf.py", line 9, in &amp;lt;lambda&amp;gt;&lt;BR /&gt;  parsed = kafkaStream.map(lambda v: json.loads(v[1]))&lt;BR /&gt;  File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads&lt;BR /&gt;  return _default_decoder.decode(s)&lt;BR /&gt;  File "/usr/lib64/python2.7/json/decoder.py", line 366, in decode&lt;BR /&gt;  obj, end = self.raw_decode(s, idx=_w(s, 0).end())&lt;BR /&gt;  File "/usr/lib64/python2.7/json/decoder.py", line 384, in raw_decode&lt;BR /&gt;  raise ValueError("No JSON object could be decoded")&lt;BR /&gt;ValueError: No JSON object could be decoded&lt;/P&gt;</description>
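      <!--
The ValueError in the traceback above is raised by json.loads, which means at
least one message value on the topic was not valid JSON. The thread does not say
how this was eventually resolved, so the following is only a minimal sketch of
one possible approach: decode each record defensively and skip payloads that do
not parse. The helper name parse_record and the sample records are hypothetical
and do not come from the original post.

```python
import json


def parse_record(record):
    """Return the decoded JSON value of a (key, value) record, or None.

    Blank or malformed payloads yield None instead of raising ValueError.
    """
    value = record[1]
    if value is None or not value.strip():
        return None
    try:
        return json.loads(value)
    except ValueError:
        # Python 2.7 raises a plain ValueError here; on Python 3 the
        # json.JSONDecodeError that is raised subclasses ValueError.
        return None


# Plain-Python stand-in for the (key, value) pairs a Kafka stream delivers.
records = [(None, '{"id": 1}'), (None, 'plain text'), (None, '')]
parsed = [p for p in (parse_record(r) for r in records) if p is not None]
print(parsed)
```

In the streaming job itself, the same helper could presumably be applied as
kafkaStream.map(parse_record).filter(lambda p: p is not None). Note that this
filter also drops messages whose payload is the JSON literal null.
      -->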
      <pubDate>Wed, 31 Jan 2018 04:32:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198620#M160668</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2018-01-31T04:32:50Z</dc:date>
    </item>
    <item>
      <title>Re: Kafka Spark Streaming Error</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198621#M160669</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/30206/rmy1712.html" nodeid="30206"&gt;@Ramya Jayathirtha&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It worked.&lt;/P&gt;&lt;P&gt;Thank you for your timely response.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Jan 2018 05:34:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Kakfa-Spark-Streaming-Error/m-p/198621#M160669</guid>
      <dc:creator>sreeviswa_athic</dc:creator>
      <dc:date>2018-01-31T05:34:31Z</dc:date>
    </item>
  </channel>
</rss>

