<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: kudu is slower than parquet? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/kudu-is-slower-than-parquet/m-p/56614#M14066</link>
    <description>&lt;P&gt;&lt;SPAN&gt;1, Make sure you run COMPUTE STATS: yes, we do this after loading the data.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2,&amp;nbsp;What is the total size of your data set?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The Impala TPC-DS tool creates 9 dim tables and 1 fact table.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The dim tables are small (record counts range from 1k to 4 million+, depending on the data size generated),&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;and the fact table is big.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;Here is the 'data size &amp;lt;--&amp;gt; record count' of the fact table:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;512g&amp;lt;--&amp;gt;4224587147&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;256g&amp;lt;--&amp;gt;2112281549&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;64g&amp;lt;--&amp;gt;528071062&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;3,&amp;nbsp;Can you also share how you partitioned your Kudu table?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For the dim tables, we hash partition each one into 2 partitions by its primary key (the Parquet dim tables are not partitioned).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For the fact table, we range partition it into 60 partitions by its 'data field' (the Parquet fact table is partitioned into 1800+ partitions).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The tables created in Kudu have a replication factor of 3.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 28 Jun 2017 04:05:50 GMT</pubDate>
    <dc:creator>lewiss</dc:creator>
    <dc:date>2017-06-28T04:05:50Z</dc:date>
  </channel>
</rss>

