<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to insert parquet file to Kafka and pass them to HDFS/Hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-them-to-HDFS/m-p/178341#M61400</link>
    <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/18391/mehdihosseinzadeh86.html" nodeid="18391"&gt;@Mehdi Hosseinzadeh&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;From a requirements perspective, the following is a simple approach in line with the technologies you proposed.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Read the data from HTTP using a Spark Streaming job and write it into &lt;A href="https://docs.cloud.databricks.com/docs/latest/databricks_guide/07%20Spark%20Streaming/09%20Write%20Output%20To%20Kafka.html"&gt;Kafka&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Read &amp;amp; process the data from the Kafka topic as batches/streams and save it into HDFS as Parquet/Avro/ORC, etc.&lt;/LI&gt;&lt;LI&gt;Build external tables in Hive (on top of the data processed in step 2) so that the data is available as soon as it is placed in HDFS&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Accessing data from external tables has been discussed &lt;A href="https://community.hortonworks.com/questions/5833/create-hive-table-to-read-parquet-files-from-parqu.html"&gt;here&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 22 May 2017 10:27:53 GMT</pubDate>
    <dc:creator>bkosaraju</dc:creator>
    <dc:date>2017-05-22T10:27:53Z</dc:date>
    <item>
      <title>How to insert parquet file to Kafka and pass them to HDFS/Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-them-to-HDFS/m-p/178340#M61399</link>
      <description>&lt;P&gt;I'm designing an architecture for our company's data infrastructure. This infrastructure will receive payment and user-behavior events from an HTTP producer (Node.js).&lt;/P&gt;&lt;P&gt;I've planned to use Kafka, Hive, and Spark, using the Avro file format for Kafka and passing the events to HDFS and Hive via Kafka Connect. But I've read that Avro is not the preferred file format for processing with Spark and that I should use Parquet instead.&lt;/P&gt;&lt;P&gt;Now the question is: how can I generate Parquet and pass it to/from Kafka? I'm confused by the many choices here. Any advice/resource is welcome. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 18 May 2017 23:39:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-them-to-HDFS/m-p/178340#M61399</guid>
      <dc:creator>Mehdi_hosseinza</dc:creator>
      <dc:date>2017-05-18T23:39:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to insert parquet file to Kafka and pass them to HDFS/Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-them-to-HDFS/m-p/178341#M61400</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/18391/mehdihosseinzadeh86.html" nodeid="18391"&gt;@Mehdi Hosseinzadeh&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;From a requirements perspective, the following is a simple approach in line with the technologies you proposed.&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Read the data from HTTP using a Spark Streaming job and write it into &lt;A href="https://docs.cloud.databricks.com/docs/latest/databricks_guide/07%20Spark%20Streaming/09%20Write%20Output%20To%20Kafka.html"&gt;Kafka&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Read &amp;amp; process the data from the Kafka topic as batches/streams and save it into HDFS as Parquet/Avro/ORC, etc.&lt;/LI&gt;&lt;LI&gt;Build external tables in Hive (on top of the data processed in step 2) so that the data is available as soon as it is placed in HDFS&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Accessing data from external tables has been discussed &lt;A href="https://community.hortonworks.com/questions/5833/create-hive-table-to-read-parquet-files-from-parqu.html"&gt;here&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 May 2017 10:27:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-insert-parquet-file-to-Kafka-and-pass-them-to-HDFS/m-p/178341#M61400</guid>
      <dc:creator>bkosaraju</dc:creator>
      <dc:date>2017-05-22T10:27:53Z</dc:date>
    </item>
  </channel>
</rss>

