<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question ORC vs Parquet - When to use one over the other in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95942#M9364</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;While ORC and Parquet are both columnar data stores that are supported in HDP, I was wondering if there was additional guidance on when to use one over the other? Or things to consider before choosing which format to use?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Andrew&lt;/P&gt;</description>
    <pubDate>Sat, 24 Oct 2015 21:06:02 GMT</pubDate>
    <dc:creator>awatson</dc:creator>
    <dc:date>2015-10-24T21:06:02Z</dc:date>
    <item>
      <title>ORC vs Parquet - When to use one over the other</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95942#M9364</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;While ORC and Parquet are both columnar data stores that are supported in HDP, I was wondering if there was additional guidance on when to use one over the other? Or things to consider before choosing which format to use?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Andrew&lt;/P&gt;</description>
      <pubDate>Sat, 24 Oct 2015 21:06:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95942#M9364</guid>
      <dc:creator>awatson</dc:creator>
      <dc:date>2015-10-24T21:06:02Z</dc:date>
    </item>
    <item>
      <title>Re: ORC vs Parquet - When to use one over the other</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95943#M9365</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35978"&gt;@awatson&lt;/a&gt;&lt;/P&gt;&lt;P&gt;This blog is very useful. I share it with customers and prospects: &lt;A target="_blank" href="http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/" rel="nofollow noopener noreferrer"&gt;link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This focus on efficiency leads to some &lt;STRONG&gt;&lt;EM&gt;impressive compression ratios&lt;/EM&gt;&lt;/STRONG&gt;. This picture shows the sizes of the TPC-DS dataset at Scale 500 in various encodings. This dataset contains randomly generated data including strings, floating point and integer data.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="326-orcfile.png" style="width: 1323px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/24031i167E954AC23D8C2E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="326-orcfile.png" alt="326-orcfile.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;This answer is also very well written: &lt;A target="_blank" href="http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy" rel="nofollow noopener noreferrer"&gt;link&lt;/A&gt;&lt;/P&gt;&lt;P&gt;One thing to note: Parquet's default compression is SNAPPY.&lt;/P&gt;&lt;P&gt;This is not an official statement, but based on aggressive testing in one environment, ORC + Zlib performed better than Parquet + Snappy.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 12:56:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95943#M9365</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2019-08-19T12:56:09Z</dc:date>
    </item>
    <item>
      <title>Re: ORC vs Parquet - When to use one over the other</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95944#M9366</link>
      <description>&lt;P&gt;In my mind, the two biggest considerations for choosing ORC over Parquet are:&lt;/P&gt;&lt;P&gt;1. Many of the performance improvements provided by the Stinger initiative depend on features of the ORC format, including a block-level index for each column. This can make I/O more efficient, because Hive can skip reading entire blocks of data when it determines that the predicate values are not present there. The Cost Based Optimizer can also use the column-level metadata present in ORC files to generate the most efficient execution graph.&lt;/P&gt;&lt;P&gt;2. ACID transactions are only possible when using ORC as the file format.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Oct 2015 23:42:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95944#M9366</guid>
      <dc:creator>rtempleton</dc:creator>
      <dc:date>2015-10-26T23:42:54Z</dc:date>
    </item>
    <item>
      <title>Re: ORC vs Parquet - When to use one over the other</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95945#M9367</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/200/awatson.html" nodeid="200"&gt;@Andrew Watson&lt;/A&gt; Has this been resolved? Can you accept the best answer or provide your own solution?&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 00:35:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/ORC-vs-Parquet-When-to-use-one-over-the-other/m-p/95945#M9367</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-03T00:35:03Z</dc:date>
    </item>
  </channel>
</rss>

