<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive Bucket clarification in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118934#M26373</link>
    <description>&lt;P&gt;Let us say I have 4 countries and each country has 10 states. In total, 32 buckets will be created, so 32 files will be created on HDFS. But I am confused about how this interacts with partitioning: how many folders will be created, and where will the 32 files be created? Is it within each partition?&lt;/P&gt;</description>
    <pubDate>Wed, 27 Apr 2016 21:11:47 GMT</pubDate>
    <dc:creator>vamsi123</dc:creator>
    <dc:date>2016-04-27T21:11:47Z</dc:date>
    <item>
      <title>Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118932#M26371</link>
      <description>&lt;P&gt;CREATE TABLE bucketed_user(
firstname VARCHAR(64),
lastname VARCHAR(64),
address STRING,
city VARCHAR(64),
state VARCHAR(64),
post STRING,
phone1 VARCHAR(64),
phone2 STRING,
email STRING,
web STRING
)
COMMENT 'A bucketed sorted user table'
PARTITIONED BY (country VARCHAR(64))
&lt;STRONG&gt;CLUSTERED BY (state) SORTED BY (city)&lt;/STRONG&gt; INTO 32 BUCKETS
STORED AS SEQUENCEFILE;&lt;/P&gt;&lt;P&gt;Could anybody tell me what the purpose of &lt;STRONG&gt;CLUSTERED BY (state) SORTED BY (city)&lt;/STRONG&gt; is in bucketed table creation?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 19:01:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118932#M26371</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2016-04-27T19:01:06Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118933#M26372</link>
      <description>&lt;P&gt;The CLUSTERED BY clause is used to divide the table into buckets. Rows with the same value in the bucketing column will always be stored in the same bucket. In this case, even though there are 50 possible states, the rows in this table will be clustered into 32 buckets. The SORTED BY clause keeps the rows in each bucket ordered by one or more columns. This does not enforce global ordering across buckets for the whole table, only local ordering within each bucket, as each bucket is physically a separate file in the table directory.&lt;/P&gt;&lt;P&gt;If the tables being joined are bucketed on the join columns, and the number of buckets in one table is a multiple of the number of buckets in the other table, the buckets can be joined on the map side. Map-side joins on bucketed tables will be faster, as the mapper processing a bucket of the left table knows that the matching rows in the right table will be in the corresponding bucket, so it doesn't need to scan the whole table. Map-side joins on tables with sorted buckets are even more efficient, as the join boils down to just merging the already sorted buckets.&lt;/P&gt;&lt;P&gt;A more detailed explanation, along with MapJoin restrictions, can be found here: &lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 19:58:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118933#M26372</guid>
      <dc:creator>tmccuch</dc:creator>
      <dc:date>2016-04-27T19:58:01Z</dc:date>
    </item>
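As a rough illustration of the reply above, here is a minimal Python sketch of what CLUSTERED BY (state) SORTED BY (city) INTO 32 BUCKETS does to the rows of one partition. It assumes a bucket is chosen as hash(state) mod 32. The CRC32 hash and the helper names bucket_for and cluster_and_sort are stand-ins invented for this example; they are not Hive's own hash function or API.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 32


def bucket_for(value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Return a 0-based bucket index for a clustering-column value.

    CRC32 is a stand-in hash for illustration, NOT Hive's real hash.
    """
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def cluster_and_sort(rows, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by state, then sort each bucket by city."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row["state"], num_buckets)].append(row)
    # Ordering is local to each bucket; nothing is sorted across buckets.
    for bucket_rows in buckets.values():
        bucket_rows.sort(key=lambda r: r["city"])
    return dict(buckets)


rows = [
    {"state": "NY", "city": "Rochester"},
    {"state": "CA", "city": "San Diego"},
    {"state": "NY", "city": "Albany"},
    {"state": "CA", "city": "Fresno"},
]
buckets = cluster_and_sort(rows)
for index in sorted(buckets):
    print(index, [(r["state"], r["city"]) for r in buckets[index]])
```

Every row with the same state ends up in the same bucket, and several states can share one bucket, which is why 50 states fit into 32 buckets.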
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118934#M26373</link>
      <description>&lt;P&gt;Let us say I have 4 countries and each country has 10 states. In total, 32 buckets will be created, so 32 files will be created on HDFS. But I am confused about how this interacts with partitioning: how many folders will be created, and where will the 32 files be created? Is it within each partition?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 21:11:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118934#M26373</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2016-04-27T21:11:47Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118935#M26374</link>
      <description>&lt;P&gt;In Hive, each partition is physically a separate subdirectory under the table directory. Buckets would then be physically represented as separate files within those subdirectories. Using your example above where you have 4 countries and 32 buckets, this would result in 4 subdirectories under the table directory, each containing 32 files.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2016 22:03:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118935#M26374</guid>
      <dc:creator>tmccuch</dc:creator>
      <dc:date>2016-04-27T22:03:23Z</dc:date>
    </item>
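The layout described above can be enumerated with a short Python sketch. The warehouse path is the one that appears in the directory listing elsewhere in this thread. The country codes and the 000000_0 style file names are assumptions for illustration (that naming is typical for Hive output files, but it is not confirmed by this thread), and expected_layout is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Table directory as it appears in the HDFS listing in this thread.
TABLE_DIR = PurePosixPath("/user/hive/warehouse/bucketed_user")


def expected_layout(countries, num_buckets):
    """Map each partition directory to the bucket files expected inside it."""
    layout = {}
    for country in countries:
        partition_dir = TABLE_DIR / f"country={country}"
        # Assumed file naming: one zero-padded file per bucket.
        layout[str(partition_dir)] = [f"{i:06d}_0" for i in range(num_buckets)]
    return layout


layout = expected_layout(["AU", "CA", "UK", "US"], 32)
print(len(layout), "partition directories")
print(sum(len(files) for files in layout.values()), "bucket files in total")
```

With 4 partitions and 32 buckets this gives 4 directories holding 32 files each, so 128 files overall rather than 32.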
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118936#M26375</link>
      <description>&lt;P&gt;I tried the example at http://hadooptutorial.info/bucketing-in-hive/&lt;/P&gt;&lt;P&gt;The following folders were created:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;&lt;A href="http://localhost:50075/browseDirectory.jsp?dir=%2Fuser%2Fhive%2Fwarehouse%2Fbucketed_user%2Fcountry%3DAU&amp;amp;namenodeInfoPort=50070"&gt;country=AU&lt;/A&gt;&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;dir&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;2016-04-28 00:03&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;rwxr-xr-x&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;naresh&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;supergroup&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;&lt;A href="http://localhost:50075/browseDirectory.jsp?dir=%2Fuser%2Fhive%2Fwarehouse%2Fbucketed_user%2Fcountry%3DCA&amp;amp;namenodeInfoPort=50070"&gt;country=CA&lt;/A&gt;&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;dir&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;2016-04-28 00:03&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;rwxr-xr-x&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;naresh&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;supergroup&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;&lt;A href="http://localhost:50075/browseDirectory.jsp?dir=%2Fuser%2Fhive%2Fwarehouse%2Fbucketed_user%2Fcountry%3DUK&amp;amp;namenodeInfoPort=50070"&gt;country=UK&lt;/A&gt;&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;dir&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;2016-04-28 00:03&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;rwxr-xr-x&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;naresh&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;supergroup&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;&lt;A href="http://localhost:50075/browseDirectory.jsp?dir=%2Fuser%2Fhive%2Fwarehouse%2Fbucketed_user%2Fcountry%3DUS&amp;amp;namenodeInfoPort=50070"&gt;country=US&lt;/A&gt;&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;dir&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;2016-04-28 00:03&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;rwxr-xr-x&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;naresh&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;supergroup&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;&lt;A href="http://localhost:50075/browseDirectory.jsp?dir=%2Fuser%2Fhive%2Fwarehouse%2Fbucketed_user%2Fcountry%3Dcountry&amp;amp;namenodeInfoPort=50070"&gt;country=country&lt;/A&gt;&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;dir&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;2016-04-28 00:03&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;rwxr-xr-x&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;naresh&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;supergroup&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;Each folder contains 32 files.&lt;/P&gt;&lt;P&gt;Clarifications:&lt;/P&gt;&lt;P&gt;1) How do I select the Bucket1 files in folder country=AU?&lt;/P&gt;&lt;P&gt;2) How do I select the Bucket1 files in folder &lt;STRONG&gt;&lt;A href="http://localhost:50075/browseDirectory.jsp?dir=%2Fuser%2Fhive%2Fwarehouse%2Fbucketed_user%2Fcountry%3Dcountry&amp;amp;namenodeInfoPort=50070"&gt;country=country&lt;/A&gt;? Also, why was this folder created? The table is partitioned by country, so four folders should be created. What is this fifth folder?&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 11:21:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118936#M26375</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2016-04-28T11:21:54Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118937#M26376</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/454/tmccuch.html" nodeid="454"&gt;@Tom McCuch&lt;/A&gt; and &lt;A rel="user" href="https://community.cloudera.com/users/9789/vamsivalivetiedu.html" nodeid="9789"&gt;@vamsi valiveti&lt;/A&gt;. Just wanted to clarify: it is legal to have two bucketed tables where the number of buckets in one table is a multiple of the number of buckets in the other table, but for pragmatic performance reasons it is best to have the number of buckets be the same. IMHO, if you are going to bucket your data, you are doing it because you need a more efficient join, and having a non-matching number of buckets removes the ability to do a sort-merge bucket join.&lt;/P&gt;&lt;P&gt;See this post on bucket join versus sort-merge bucket join. It's very good: &lt;A href="http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables"&gt;http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2016 20:53:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118937#M26376</guid>
      <dc:creator>bpreachuk</dc:creator>
      <dc:date>2016-04-28T20:53:11Z</dc:date>
    </item>
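To see why the bucket counts need to be multiples of each other, here is a small Python sketch. It assumes both tables place a row in bucket hash mod bucket count (0-based here) using the same hash on the join column. The function names are hypothetical and the integers stand in for hash values.

```python
def corresponding_bucket(big_bucket: int, small_bucket_count: int) -> int:
    """Bucket of the smaller table that pairs with a bucket of the larger one."""
    return big_bucket % small_bucket_count


def buckets_line_up(big_bucket_count: int, small_bucket_count: int, hashes) -> bool:
    """Check that a row's bucket in the small table can always be derived
    from its bucket in the big table, so a mapper knows where to look."""
    return all(
        corresponding_bucket(h % big_bucket_count, small_bucket_count)
        == h % small_bucket_count
        for h in hashes
    )


sample_hashes = range(10_000)  # stand-in hash values
print(buckets_line_up(32, 8, sample_hashes))   # True: 8 divides 32
print(buckets_line_up(32, 12, sample_hashes))  # False: 12 does not divide 32
print(buckets_line_up(32, 32, sample_hashes))  # True: equal counts pair one to one
```

With 32 and 8 buckets, four buckets of the larger table map to each bucket of the smaller one. With equal counts the pairing is one to one, which is the case recommended above.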
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118938#M26377</link>
      <description>&lt;P&gt;1) How do I select the Bucket1 files in folder country=AU? Could anybody provide a SQL query for that?&lt;/P&gt;</description>
      <pubDate>Tue, 10 May 2016 16:19:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118938#M26377</guid>
      <dc:creator>vamsi123</dc:creator>
      <dc:date>2016-05-10T16:19:30Z</dc:date>
    </item>
    <item>
      <title>Re: Hive Bucket clarification</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118939#M26378</link>
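      <description>&lt;P&gt;You can do a table sample. Because the table has 32 buckets, sampling bucket 1 out of 32 on the clustering column reads only the first bucket:&lt;/P&gt;&lt;P&gt;SELECT * FROM bucketed_user TABLESAMPLE(BUCKET 1 OUT OF 32 ON state) WHERE country = 'AU';&lt;/P&gt;</description>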
      <pubDate>Wed, 06 Jul 2016 05:14:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-Bucket-clarification/m-p/118939#M26378</guid>
      <dc:creator>efusaro</dc:creator>
      <dc:date>2016-07-06T05:14:27Z</dc:date>
    </item>
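Which bucket files a TABLESAMPLE(BUCKET x OUT OF y ON state) clause reads depends on how y relates to the table's bucket count. The Python sketch below follows the example in the Hive sampling manual, where a 32-bucket table sampled with BUCKET 3 OUT OF 16 reads the 3rd and 19th buckets. It only covers the case where y divides the bucket count and the sample is on the clustering column; sampled_buckets is a hypothetical helper, not a Hive function.

```python
def sampled_buckets(x: int, y: int, num_buckets: int) -> list:
    """1-based bucket numbers read by BUCKET x OUT OF y on a bucketed table.

    Only models the case where y evenly divides num_buckets.
    """
    if num_buckets % y != 0:
        raise ValueError("this sketch only covers y dividing num_buckets")
    if x not in range(1, y + 1):
        raise ValueError("x must be between 1 and y")
    return list(range(x, num_buckets + 1, y))


print(sampled_buckets(3, 16, 32))      # [3, 19]
print(len(sampled_buckets(1, 2, 32)))  # 16
print(sampled_buckets(1, 32, 32))      # [1]
```

So on this 32-bucket table, BUCKET 1 OUT OF 2 reads half of the buckets in the partition, while BUCKET 1 OUT OF 32 reads only the first one.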
  </channel>
</rss>

