<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: kudu is slower than parquet? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/kudu-is-slower-than-parquet/m-p/56504#M14061</link>
    <description>&lt;P&gt;We'd expect Kudu to be slower than Parquet on a pure read benchmark, but not 10x slower. A gap that large may point to a configuration problem. We've published results on the Cloudera blog that show the kind of gap we'd expect: &lt;A href="http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/" target="_blank"&gt;http://blog.cloudera.com/blog/2017/02/performance-comparing-of-different-file-formats-and-storage-engines-in-hadoop-file-system/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Parquet is a read-only storage format, while Kudu supports row-level updates, so they make different trade-offs. I think we have headroom to significantly improve the performance of both table formats in Impala over time.&lt;/P&gt;&lt;P&gt;E.g. in Impala 2.9/CDH5.12, &lt;A href="http://issues.cloudera.org/browse/IMPALA-5347" target="_blank"&gt;IMPALA-5347&lt;/A&gt; and IMPALA-5304 improve pure Parquet scan performance by 50%+ on some workloads, and I think there are probably similar opportunities for Kudu.&lt;/P&gt;</description>
    <pubDate>Mon, 26 Jun 2017 15:41:17 GMT</pubDate>
    <dc:creator>Tim Armstrong</dc:creator>
    <dc:date>2017-06-26T15:41:17Z</dc:date>
  </channel>
</rss>

