<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: query on partition question in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55737#M62718</link>
    <description>&lt;P&gt;I was asking in general whether there is any difference, and which one you would recommend.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For me, one query would be an aggregation by year/month/day.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Shannon&lt;/P&gt;</description>
    <pubDate>Mon, 12 Jun 2017 18:58:48 GMT</pubDate>
    <dc:creator>ponypony</dc:creator>
    <dc:date>2017-06-12T18:58:48Z</dc:date>
    <item>
      <title>query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55725#M62716</link>
      <description>&lt;P&gt;I partition by year/month/day. Is there a difference if I query/aggregate using those partition columns or using the timestamp column they were derived from?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Shannon&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 11:44:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55725#M62716</guid>
      <dc:creator>ponypony</dc:creator>
      <dc:date>2022-09-16T11:44:31Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55732#M62717</link>
      <description>&lt;P&gt;I'm inclined to say that yes, there will be a difference. One or two example queries showing the alternatives would help me give you a more accurate answer.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jun 2017 17:49:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55732#M62717</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2017-06-12T17:49:51Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55737#M62718</link>
      <description>&lt;P&gt;I was asking in general whether there is any difference, and which one you would recommend.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For me, one query would be an aggregation by year/month/day.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Shannon&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jun 2017 18:58:48 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55737#M62718</guid>
      <dc:creator>ponypony</dc:creator>
      <dc:date>2017-06-12T18:58:48Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55816#M62719</link>
      <description>TL;DR: The results will be identical if the columns are used in the same manner, but the runtime and resource requirements will differ.&lt;BR /&gt;&lt;BR /&gt;If I understand correctly, you are asking this:&lt;BR /&gt;&lt;BR /&gt;If there is a timestamp column that you use to create the partition columns, is there a difference between querying on one or the other?&lt;BR /&gt;&lt;BR /&gt;This goes back to partition columns being virtual columns. If you derive a partition column from an actual column and just give it a different name, the physical column (timestamp) remains in the data files, and the virtual columns (year/month/day) exist only in the form of the directory structure in HDFS. When you filter on the partition columns, the query performs partition pruning; when you filter on the timestamp column, it does not. For the aggregation itself, though, the results will be the same. The same applies if you partition by subsets of the timestamp, i.e. year/month/day.</description>
      <pubDate>Tue, 13 Jun 2017 20:03:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55816#M62719</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-06-13T20:03:17Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55822#M62720</link>
      <description>&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry, I was not clear when I said "difference". I meant: is there any performance difference?&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jun 2017 21:39:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55822#M62720</guid>
      <dc:creator>ponypony</dc:creator>
      <dc:date>2017-06-13T21:39:34Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55895#M62721</link>
      <description>&lt;P&gt;Yes, very likely there will be a performance difference, but it's hard to say which one will be better without concrete examples.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2017 01:04:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/55895#M62721</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2017-06-15T01:04:28Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56126#M62722</link>
      <description>&lt;P&gt;Thanks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a related question: how do HDFS/Impala know that one of the fields/columns is used as the partition?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Shannon&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jun 2017 14:52:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56126#M62722</guid>
      <dc:creator>ponypony</dc:creator>
      <dc:date>2017-06-19T14:52:27Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56133#M62723</link>
      <description>The table definition declares the virtual (partition) columns, and in HDFS each partition is created as a directory or subdirectory. So the query first checks the table definition, then looks under the table directory for directories matching the partition column name, and prunes by the value.</description>
      <pubDate>Mon, 19 Jun 2017 16:20:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56133#M62723</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-06-19T16:20:54Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56134#M62724</link>
      <description>&lt;P&gt;HDFS does not know about partitions. That information is stored in the Hive Metastore along with the rest of the table metadata.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;A partition of an Impala/Hive table points to a directory in HDFS. The values of partition columns are not stored in data files; they are "stored" in the HDFS directory structure, e.g.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hdfs://warehouse/mytable/year=2017/month=6&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;might be a directory of a partitioned table "mytable" with partition columns year and month.&lt;/P&gt;</description>
      <pubDate>Mon, 19 Jun 2017 16:22:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56134#M62724</guid>
      <dc:creator>alex.behm</dc:creator>
      <dc:date>2017-06-19T16:22:16Z</dc:date>
    </item>
    <item>
      <title>Re: query on partition question</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56166#M62725</link>
      <description>&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jun 2017 14:24:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/query-on-partition-question/m-p/56166#M62725</guid>
      <dc:creator>ponypony</dc:creator>
      <dc:date>2017-06-20T14:24:34Z</dc:date>
    </item>
  </channel>
</rss>

