<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: spark streaming json to hive in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208642#M170599</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/23208/hadoopuserhadoop.html" nodeid="23208"&gt;@Mark&lt;/A&gt; Sure, here is the link to the PySpark network word count example:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py" target="_blank"&gt;https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py&lt;/A&gt;&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;</description>
    <pubDate>Fri, 10 Aug 2018 19:15:49 GMT</pubDate>
    <dc:creator>falbani</dc:creator>
    <dc:date>2018-08-10T19:15:49Z</dc:date>
    <item>
      <title>spark streaming json to hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208638#M170595</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am a beginner with Spark and want to do the following.&lt;/P&gt;&lt;P&gt;A source on port 55500 is sending JSON objects as a stream (ex: {"one":"1","two":"2"}{"three":"3","four":"4"}).&lt;/P&gt;&lt;P&gt;I have an ORC table in Hive with the columns given below:&lt;/P&gt;&lt;P&gt;one, two, three, four, spark_streaming_startingtime, spark_streaming_endingtime, partition_value&lt;/P&gt;&lt;P&gt;I want to load the streamed values into the Hive ORC table.&lt;/P&gt;&lt;P&gt;Can you please guide me on how to achieve this?&lt;/P&gt;&lt;P&gt;Thank you for your support.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jul 2018 20:10:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208638#M170595</guid>
      <dc:creator>mark_hadoop</dc:creator>
      <dc:date>2018-07-31T20:10:22Z</dc:date>
    </item>
    <item>
      <title>Re: spark streaming json to hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208639#M170596</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/23208/hadoopuserhadoop.html" nodeid="23208"&gt;@Mark&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I suggest you take the NetworkWordCount example as a starting point. Then, to transform each stream RDD into a DataFrame, I recommend you look into flatMap, since it lets you map a single-column RDD into multiple columns after parsing the JSON content of each object. Finally, when saving to HDFS, choose a batch size and repartition setting that avoids producing many small files in HDFS.&lt;/P&gt;&lt;P&gt;1. The NetworkWordCount code on GitHub is located here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala" target="_blank"&gt;https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;2. Here is an example of how to parse JSON using map and flatMap:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/holdenk/learning-spark-examples/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJson.scala" target="_blank"&gt;https://github.com/holdenk/learning-spark-examples/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJson.scala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;3. Saving a DataFrame as ORC is very well documented. Just avoid writing small files, as they put pressure on the NameNode and hurt HDFS overall.&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;&lt;P&gt;*** If you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.&lt;/P&gt;</description>
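      <content:encoded><![CDATA[
The JSON parsing step in point 2 could look roughly like the sketch below, written in Python because a PySpark version is what the thread ends up asking for. It assumes each line of text read from the socket holds one or more JSON objects placed back to back with no separator, as in the sample from the question. The helper name split_json_objects and the choice to drop the rest of a malformed line are assumptions made for illustration; they do not come from the linked examples.

```python
import json


def split_json_objects(line):
    """Yield each top-level JSON object found in one line of text.

    Objects may be concatenated with no separator between them,
    e.g. '{"one":"1","two":"2"}{"three":"3","four":"4"}'.
    """
    decoder = json.JSONDecoder()
    pos, end = 0, len(line)
    while pos < end:
        # raw_decode does not skip leading whitespace, so step over it here.
        if line[pos].isspace():
            pos += 1
            continue
        try:
            # raw_decode returns the parsed value and the index where it ended.
            obj, pos = decoder.raw_decode(line, pos)
        except ValueError:
            # Assumption: give up on the rest of a malformed line
            # instead of failing the whole batch.
            return
        # Keep only JSON objects (dicts); ignore bare numbers, strings, etc.
        if isinstance(obj, dict):
            yield obj
```

Because the function returns zero or more records per input line, it fits flatMap rather than map: in a streaming job it would typically be applied as lines.flatMap(split_json_objects) on the DStream returned by socketTextStream, before each batch is converted to a DataFrame and written out.
]]></content:encoded>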
      <pubDate>Tue, 31 Jul 2018 20:44:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208639#M170596</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-07-31T20:44:53Z</dc:date>
    </item>
    <item>
      <title>Re: spark streaming json to hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208640#M170597</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/11048/falbani.html" nodeid="11048"&gt;@Felix Albani&lt;/A&gt;&lt;P&gt;Thank you for the quick response. I will go through the information you provided.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jul 2018 20:51:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208640#M170597</guid>
      <dc:creator>mark_hadoop</dc:creator>
      <dc:date>2018-07-31T20:51:45Z</dc:date>
    </item>
    <item>
      <title>Re: spark streaming json to hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208641#M170598</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/11048/falbani.html" nodeid="11048"&gt;@Felix Albani&lt;/A&gt;&lt;P&gt;Can you help me with the PySpark version of the above, please?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 16:08:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208641#M170598</guid>
      <dc:creator>mark_hadoop</dc:creator>
      <dc:date>2018-08-10T16:08:23Z</dc:date>
    </item>
    <item>
      <title>Re: spark streaming json to hive</title>
      <link>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208642#M170599</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/23208/hadoopuserhadoop.html" nodeid="23208"&gt;@Mark&lt;/A&gt; Sure, here is the link to the PySpark network word count example:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py" target="_blank"&gt;https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py&lt;/A&gt;&lt;/P&gt;&lt;P&gt;HTH&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 19:15:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/spark-streaming-json-to-hive/m-p/208642#M170599</guid>
      <dc:creator>falbani</dc:creator>
      <dc:date>2018-08-10T19:15:49Z</dc:date>
    </item>
  </channel>
</rss>

