<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Hive partitions based on date from timestamp in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/179583#M70551</link>
    <description />
    <pubDate>Wed, 01 Nov 2017 15:51:24 GMT</pubDate>
    <dc:creator>Gayathridevi</dc:creator>
    <dc:date>2017-11-01T15:51:24Z</dc:date>
    <item>
      <title>Hive partitions based on date from timestamp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/179583#M70551</link>
      <description />
      <pubDate>Wed, 01 Nov 2017 15:51:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/179583#M70551</guid>
      <dc:creator>Gayathridevi</dc:creator>
      <dc:date>2017-11-01T15:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hive partitions based on date from timestamp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/179584#M70552</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/39249/gayathrimtechcse.html" nodeid="39249"&gt;@Gayathri Devi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;First you need to create a hive non partition table on raw data. &lt;/P&gt;&lt;P&gt;Then you need to create partition table in hive  then insert from non partition table to partition table.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;For testing i have tried an example as below:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Right now my hive normal table(i.e not partition table) having these list of records.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Normal table(without partition column):-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;hive# create table text_table(id int, dt string,name string) stored as textfile location '/user/yashu/text_table';&lt;/PRE&gt;
&lt;PRE&gt;hive# select * from text_table;
+----------------+----------------------+------------------+--+
| text_table.id  |    text_table.dt     | text_table.name  |
+----------------+----------------------+------------------+--+
| 1              | 2017-10-31 10:12:09  | foo              |
| 1              | 2017-10-31 12:12:09  | bar              |
| 1              | 2017-10-30 12:12:09  | foobar           |
| 1              | 2017-10-30 10:12:09  | barbar           |
+----------------+----------------------+------------------+--+&lt;/PRE&gt;&lt;P&gt;Then i want to do daily partition table for this case i need to create a new table having dt as partition column in it&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Partition table:-&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;There are 2 kinds of partitions in hive&lt;/P&gt;&lt;PRE&gt;1.Static partitions //adding partition statically and loading data into it,takes less time than dynamic partitions as it won't need to look into data while creating partitions.
2.Dynamic partitions //creating partitions dynamically based on the column value, take more time than static partitions if data is huge because it needs to look into data while creating partitions.&lt;/PRE&gt;
&lt;PRE&gt;hive# create table partition_table(
id int,
name string)
partitioned by (dt string);&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;1.Dynamic Partition:-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;once you create partition table then select from non partition table, &lt;/P&gt;&lt;PRE&gt;hive# insert into partition_table partition(dt) select id,name, substring(dt,0,10) from text_table; //we need to have daily partition so i'm doing sub string from 0-10 i.e 2017-10-31 so this will create date partitions&lt;/PRE&gt;
&lt;PRE&gt;INFO  :          Time taken to load dynamic partitions: 0.066 seconds
INFO  :         Loading partition {dt=2017-10-30} //creating 2017-10-30 partition
INFO  :         Loading partition {dt=2017-10-31} //creating 2017-10-30 partition
INFO  :          Time taken for adding to write entity : 0
INFO  : Partition default.partition_table{dt=2017-10-30} stats: [numFiles=1, numRows=2, totalSize=18, rawDataSize=16]
INFO  : Partition default.partition_table{dt=2017-10-31} stats: [numFiles=1, numRows=2, totalSize=12, rawDataSize=10]
No rows affected (10.055 seconds)&lt;/PRE&gt;&lt;P&gt;We are doing dynamic partitions in our above statement i.e we are creating partition based on our data.&lt;/P&gt;&lt;P&gt;if you want to view the partitions then give&lt;/P&gt;&lt;PRE&gt;hive# show partitions partition_table; //we can view all partitions that has create in the table.
+----------------+--+
|   partition    |
+----------------+--+
| dt=2017-10-30  |
| dt=2017-10-31  |
+----------------+--+
2 rows selected (0.064 seconds)&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;Drop partitions:-&lt;/STRONG&gt;&lt;/P&gt;&lt;PRE&gt;hive# alter table partition_table drop partition(dt&amp;gt;'0') purge; //it will drop all the partitions (or) you can drop specific partition by mentioning as dt='2017-10-30'(it will drop only 2017-10-30 partition)
INFO  : Dropped the partition dt=2017-10-30
INFO  : Dropped the partition dt=2017-10-31
No rows affected (0.132 seconds)&lt;/PRE&gt;&lt;P&gt;To &lt;STRONG&gt;view all partition directories&lt;/STRONG&gt; information&lt;/P&gt;&lt;PRE&gt; hadoop fs -ls -R /apps/hive/warehouse/partition_table/
drwxrwxrwx   - hdfs          0 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-30 //partition directory
-rwxrwxrwx   3 hdfs         18 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-30/000000_0 //file in the partition
drwxrwxrwx   - hdfs          0 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-31
-rwxrwxrwx   3 hdfs         12 2017-11-01 21:45 /apps/hive/warehouse/partition_table/dt=2017-10-31/000000_0&lt;/PRE&gt;&lt;P&gt;To view data from one partition&lt;/P&gt;&lt;PRE&gt;select * from partition_table where dt='2017-10-30';
+---------------------+-----------------------+---------------------+--+
| partition_table.id  | partition_table.name  | partition_table.dt  |
+---------------------+-----------------------+---------------------+--+
| 1                   | foobar                | 2017-10-30          |
| 1                   | barbar                | 2017-10-30          |
+---------------------+-----------------------+---------------------+--+&lt;/PRE&gt;&lt;P&gt;As you can see out&lt;STRONG&gt; dt &lt;/STRONG&gt;column in &lt;STRONG&gt;non partitioned&lt;/STRONG&gt; table having &lt;STRONG&gt;2017-10-30 12:12:09&lt;/STRONG&gt; but in &lt;STRONG&gt;partition table&lt;/STRONG&gt; having &lt;STRONG&gt;2017-10-30 &lt;/STRONG&gt;because as we are loading the data to &lt;STRONG&gt;partition table we did sub string&lt;/STRONG&gt; on dt column.&lt;/P&gt;&lt;P&gt;--&amp;gt; if you &lt;STRONG&gt;don't want&lt;/STRONG&gt; to change the&lt;STRONG&gt; source data&lt;/STRONG&gt; i.e &lt;STRONG&gt;dt column from non partition table&lt;/STRONG&gt; to partition table then create partition table with   &lt;/P&gt;&lt;PRE&gt;hive# create table partition_table(
id int,
name string,
dt string)
partitioned by (daily string); //new partition column&lt;/PRE&gt;
&lt;PRE&gt;hive# insert into partition_table partition(daily) select id,name,dt, substring(dt,0,10) from text_table; //we are having daily as partition column and in select statement we have used dt column twice one is to load actual dt column data and second one is to create partition column.&lt;/PRE&gt;
&lt;PRE&gt;show partitions partition_table;
+-------------------+--+
|     partition     |
+-------------------+--+
| daily=2017-10-30  |
| daily=2017-10-31  |
+-------------------+--+
2 rows selected (0.066 seconds)
0: jdbc:hive2://usor7dhc01w01.use.ucdp.net:21&amp;gt; select * from partition_table; //as you can see we haven't changed dt column data as we have new daily column as partition column
+---------------------+-----------------------+----------------------+------------------------+--+
| partition_table.id  | partition_table.name  |  partition_table.dt  | partition_table.daily  |
+---------------------+-----------------------+----------------------+------------------------+--+
| 1                   | foobar                | 2017-10-30 12:12:09  | 2017-10-30             |
| 1                   | barbar                | 2017-10-30 10:12:09  | 2017-10-30             |
| 1                   | foo                   | 2017-10-31 10:12:09  | 2017-10-31             |
| 1                   | bar                   | 2017-10-31 12:12:09  | 2017-10-31             |
+---------------------+-----------------------+----------------------+------------------------+--+&lt;/PRE&gt;&lt;P&gt;**keep in mind &lt;STRONG&gt;partition column&lt;/STRONG&gt; needs to be&lt;STRONG&gt; last column&lt;/STRONG&gt; in your &lt;STRONG&gt;select statement, if not &lt;/STRONG&gt;hive creates partitions based on what ever the&lt;STRONG&gt; last column is in your select statement&lt;/STRONG&gt;.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2.Static partition:-&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;We are statically creating partition and loading all the data into that partition,&lt;/P&gt;&lt;PRE&gt;hive# insert into partition_table partition(dt='2017-10-30') select id,name from text_table; //we are mentioned partition name here as dt='2017-10-30' so all data will be loaded into 2017-10-30 partition&lt;/PRE&gt;&lt;P&gt;if you are doing static partition that means all the dt data should be 2017-10-30 and you can view we haven't mentioned dt column in select statement.&lt;/P&gt;&lt;PRE&gt;hive# show partitions partition_table;
+----------------+--+
|   partition    |
+----------------+--+
| dt=2017-10-30  |
+----------------+--+

hive# select * from partition_table; //all dt will be 2017-10-30 because we are doing static partition column
+---------------------+-----------------------+---------------------+--+
| partition_table.id  | partition_table.name  | partition_table.dt  |
+---------------------+-----------------------+---------------------+--+
| 1                   | foo                   | 2017-10-30          |
| 1                   | bar                   | 2017-10-30          |
| 1                   | foobar                | 2017-10-30          |
| 1                   | barbar                | 2017-10-30          |
+---------------------+-----------------------+---------------------+--+
&lt;/PRE&gt;&lt;P&gt;As you need to decide which kind of partitions are best fit for your case.&lt;/P&gt;&lt;P&gt;Hope this will help you to understand about partitions..!!&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2017 09:41:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/179584#M70552</guid>
      <dc:creator>Shu_ashu</dc:creator>
      <dc:date>2017-11-02T09:41:56Z</dc:date>
    </item>
  </channel>
</rss>

