<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive Clusters/Buckets in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Clusters-Buckets/m-p/148610#M28366</link>
    <description>&lt;P&gt;"My understanding so far is that partitioning a table optimises the performance of queries such that rather than performing the query on the entire table it performs the query only on the partition of interest e.g. find employee details where state = NYC. It will just query the NYC partition and return the employee details, correct? These partitions are stored in separate directories/files in HDFS."&lt;/P&gt;&lt;P&gt;Correct.&lt;/P&gt;&lt;P&gt;"What is a bucket and why would one use them rather than partitions? I take it a bucket and a cluster are the same beast, just that you use CLUSTERED BY to create the buckets?"&lt;/P&gt;&lt;P&gt;You are correct: CLUSTERED BY is what creates the buckets. Buckets do not replace partitions; they are essentially files inside these partition folders (or inside the table folder if the table is not partitioned). Every bucket = one file. You can find the reasoning and the uses for them here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/23103/hive-deciding-the-number-of-buckets.html" target="_blank"&gt;https://community.hortonworks.com/questions/23103/hive-deciding-the-number-of-buckets.html&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 15 May 2016 19:34:05 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-05-15T19:34:05Z</dc:date>
    <item>
      <title>Hive Clusters/Buckets</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Clusters-Buckets/m-p/148609#M28365</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am having difficulty understanding the concept of buckets/clusters in Hive.&lt;/P&gt;&lt;P&gt;My understanding so far is that partitioning a table optimises the performance of queries such that rather than performing the query on the entire table it performs the query only on the partition of interest e.g. find employee details where state = NYC. It will just query the NYC partition and return the employee details, correct? These partitions are stored in separate directories/files in HDFS.&lt;/P&gt;&lt;P&gt;What is a bucket and why would one use them rather than partitions? I take it a bucket and a cluster are the same beast, just that you use CLUSTERED BY to create the buckets?&lt;/P&gt;</description>
      <pubDate>Sun, 15 May 2016 18:25:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Clusters-Buckets/m-p/148609#M28365</guid>
      <dc:creator>jgarrigan</dc:creator>
      <dc:date>2016-05-15T18:25:21Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Clusters/Buckets</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Clusters-Buckets/m-p/148610#M28366</link>
      <description>&lt;P&gt;"My understanding so far is that partitioning a table optimises the performance of queries such that rather than performing the query on the entire table it performs the query only on the partition of interest e.g. find employee details where state = NYC. It will just query the NYC partition and return the employee details, correct? These partitions are stored in separate directories/files in HDFS."&lt;/P&gt;&lt;P&gt;Correct.&lt;/P&gt;&lt;P&gt;"What is a bucket and why would one use them rather than partitions? I take it a bucket and a cluster are the same beast, just that you use CLUSTERED BY to create the buckets?"&lt;/P&gt;&lt;P&gt;You are correct: CLUSTERED BY is what creates the buckets. Buckets do not replace partitions; they are essentially files inside these partition folders (or inside the table folder if the table is not partitioned). Every bucket = one file. You can find the reasoning and the uses for them here:&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/questions/23103/hive-deciding-the-number-of-buckets.html" target="_blank"&gt;https://community.hortonworks.com/questions/23103/hive-deciding-the-number-of-buckets.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 15 May 2016 19:34:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Clusters-Buckets/m-p/148610#M28366</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-05-15T19:34:05Z</dc:date>
    </item>
  </channel>
</rss>

