<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive Table formats in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223150#M60634</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15991/mmlr-90.html" nodeid="15991"&gt;@mÁRIO Rodrigues&lt;/A&gt; use &lt;A href="https://github.com/hortonworks/hive-testbench" target="_blank"&gt;https://github.com/hortonworks/hive-testbench&lt;/A&gt;. Default format is ORC.&lt;/P&gt;</description>
    <pubDate>Mon, 08 May 2017 08:36:07 GMT</pubDate>
    <dc:creator>SQLShaw</dc:creator>
    <dc:date>2017-05-08T08:36:07Z</dc:date>
    <item>
      <title>Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223149#M60633</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am using Hive-testbench (&lt;A href="http://blog.moserit.com/benchmarking-hive"&gt;http://blog.moserit.com/benchmarking-hive&lt;/A&gt;) to test some queries. By default, when running ./tpcds-setup.sh 10, what file format will my Hive tables have (in HDFS the files are listed with a .deflate extension)? I think the best file formats for performance are either ORC or Parquet; how can I generate the tables in those formats?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 08 May 2017 07:40:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223149#M60633</guid>
      <dc:creator>mmlr_90</dc:creator>
      <dc:date>2017-05-08T07:40:16Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223150#M60634</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15991/mmlr-90.html" nodeid="15991"&gt;@mÁRIO Rodrigues&lt;/A&gt; use &lt;A href="https://github.com/hortonworks/hive-testbench" target="_blank"&gt;https://github.com/hortonworks/hive-testbench&lt;/A&gt;. Default format is ORC.&lt;/P&gt;</description>
      <pubDate>Mon, 08 May 2017 08:36:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223150#M60634</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2017-05-08T08:36:07Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223151#M60635</link>
      <description>&lt;P&gt;Note: Parquet is supported for LLAP but will not be cached.&lt;/P&gt;</description>
      <pubDate>Mon, 08 May 2017 08:37:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223151#M60635</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2017-05-08T08:37:39Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223152#M60636</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/15991/mmlr-90.html"&gt;@mÁRIO Rodrigues&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Deflate is not a file format. If a file is stored in a compressed state, its extension in HDFS will be shown as .deflate. As you stated, ORC performs better when loading the table. Parquet and Avro also serve their own purposes. When I tested a table with 3 billion records, the formats ranked as follows by the time taken to load a Hive table:&lt;/P&gt;&lt;P&gt;ORC&lt;/P&gt;&lt;P&gt;Avro&lt;/P&gt;&lt;P&gt;Parquet&lt;/P&gt;&lt;P&gt;This is in ascending order of time taken, with ORC taking the least time to load. But if your file format is dynamic, then it's better to go with Parquet/Avro.&lt;/P&gt;</description>
      <pubDate>Mon, 08 May 2017 13:28:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223152#M60636</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2017-05-08T13:28:01Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223153#M60637</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/15991/mmlr-90.html" nodeid="15991"&gt;@mÁRIO Rodrigues&lt;/A&gt;&lt;P&gt;Refer to &lt;A href="https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines"&gt;blog&lt;/A&gt; for details&lt;/P&gt;</description>
      <pubDate>Mon, 08 May 2017 16:30:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223153#M60637</guid>
      <dc:creator>ssubhas</dc:creator>
      <dc:date>2017-05-08T16:30:31Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223154#M60638</link>
      <description>&lt;P&gt;Thank you all for the answers!&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2017 07:34:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223154#M60638</guid>
      <dc:creator>mmlr_90</dc:creator>
      <dc:date>2017-05-09T07:34:25Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223155#M60639</link>
      <description>&lt;P&gt;So although the files appear as .deflate, they are basically ORC? Spark can query .parquet files; will it also be able to query these files in deflate format?&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2017 07:37:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223155#M60639</guid>
      <dc:creator>mmlr_90</dc:creator>
      <dc:date>2017-05-09T07:37:17Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Table formats</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223156#M60640</link>
      <description>&lt;P&gt;&lt;A href="https://community.hortonworks.com/users/15991/mmlr-90.html"&gt;@mÁRIO Rodrigues&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Yes, even though the files are shown as .deflate, they are ORC in a compressed state. I think you will be able to read the data through the Hive tables in Spark SQL, but you can't use the underlying files directly because they are compressed. If you want to read the files directly, load the Hive tables without any compression, and then Spark can make use of the underlying files.&lt;/P&gt;</description>
      <pubDate>Tue, 09 May 2017 22:38:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Table-formats/m-p/223156#M60640</guid>
      <dc:creator>balavignesh_nag</dc:creator>
      <dc:date>2017-05-09T22:38:45Z</dc:date>
    </item>
  </channel>
</rss>

