<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Hive partitions based on date from timestamp in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61423#M70531</link>
    <description>&lt;P&gt;I have a raw data where i have a column for timestamp.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now i want to move the file to hive and create partions based on date. Daily partitions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From timestamp data how we will create date partions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 12:28:20 GMT</pubDate>
    <dc:creator>Former Member</dc:creator>
    <dc:date>2022-09-16T12:28:20Z</dc:date>
    <item>
      <title>Hive partitions based on date from timestamp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61423#M70531</link>
      <description>&lt;P&gt;I have a raw data where i have a column for timestamp.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now i want to move the file to hive and create partions based on date. Daily partitions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From timestamp data how we will create date partions?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:28:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61423#M70531</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2022-09-16T12:28:20Z</dc:date>
    </item>
    <item>
      <title>Re: Hive partitions based on date from timestamp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61438#M70532</link>
      <description>You can create hive external table to link to the data in HDFS, and then write data into another table which will be partitioned by date.&lt;BR /&gt;&lt;BR /&gt;Use date functions in Hive to convert timestamp to the value you want:&lt;BR /&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;INSERT OVERWRITE TABLE partitioned_table PARTITION (date_column) SELECT ...., to_date(timestamp_column) as date_column FROM source_table;&lt;BR /&gt;&lt;BR /&gt;Something like this.</description>
      <pubDate>Thu, 02 Nov 2017 03:09:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61438#M70532</guid>
      <dc:creator>EricL</dc:creator>
      <dc:date>2017-11-02T03:09:07Z</dc:date>
    </item>
    <item>
      <title>Re: Hive partitions based on date from timestamp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61479#M70533</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have a timestamp in a table and we want o take date out of it then we can write a select statement to_date(timestamp column) from table name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The question is while creating partition i dont have seperate date column in the table i have only timestamp column.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For static Partitions:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;create external table hju(starttime string,h string,m string,mv double,country string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE location '/srf/ji/cana/'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;create table hjuk(h string,m string,mv double,country string)partitioned by (starttime string) location '/hji/lo/p/'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INSERT into table tblename PARTITION(starttime='2017-08-09') SELECT h,m,mv,country from tblname where to_date(starttime)='2017-08-09'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For dynamic partitions:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SET hive.exec.dynamic.partition=true; SET hive.exec.dynamic.partition.mode=nonstrict;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;create table tblename parg(h string,m string,mv double,country string)partitioned by (starttime string) location '/hiloi/kil'&lt;/P&gt;&lt;P&gt;INSERT overwrite table tblename PARTITION(starttime) SELECT h,m,mv,country ,starttime from tblename&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INFO : Starting task [Stage-2:STATS] in serial mode&lt;BR /&gt;INFO : Partition naparg1{starttime=2017-08-01&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=1, totalSize=26, rawDataSize=25]&lt;BR /&gt;INFO : Partition arg1{starttime=2017-08-02&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]&lt;BR /&gt;INFO : Partition aparg1{starttime=2017-08-03&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=1, totalSize=24, rawDataSize=23]&lt;BR /&gt;INFO : Partition arg1{starttime=2017-08-04&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=1, totalSize=24, rawDataSize=23]&lt;BR /&gt;INFO : Partition arg1{starttime=2017-08-05&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=1, totalSize=24, rawDataSize=23]&lt;BR /&gt;INFO : Partition arg1{starttime=2017-08-06&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=1, totalSize=25, rawDataSize=24]&lt;BR /&gt;INFO : Partition arg1{starttime=2017-08-09&amp;nbsp;18:40:03.0} sta ts: [numFiles=1, numRows=7, totalSize=172, rawDataSize=165]&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 03:36:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61479#M70533</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2017-11-03T03:36:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hive partitions based on date from timestamp</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61499#M70534</link>
      <description>&lt;P&gt;The timestamp column is not "suitable" for a partition (unless you want thousands and thousand of partitions).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What is suitable :&lt;/P&gt;&lt;P&gt;- is to create an Hive table on top of the current not partitionned data,&lt;/P&gt;&lt;P&gt;- create a second Hive table for hosting the partitionned data (the same columns + the partition column),&lt;/P&gt;&lt;P&gt;- eventualy load the data from the first table to the second one using a query that will "&lt;U&gt;&lt;STRONG&gt;parse&lt;/STRONG&gt;&lt;/U&gt;" the timestamp column and extract what should be a suitable value for the partition column (for example the year or the year-and-the-month, ...).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example :&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;INSERT INTO TABLE my_partitioned_table PARTITION (part_col_name) SELECT *, &lt;U&gt;&lt;STRONG&gt;year(to_date(my_timestamp_column))&lt;/STRONG&gt;&lt;/U&gt; FROM my_not_partitioned_table;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You don't have to put the partition value in the insert statement if you enable dynamic partition in Hive.&lt;/P&gt;&lt;PRE&gt;set hive.exec.dynamic.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict;  &lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And on your sample it's not working properly because you didn't parse the timestamp column, you use it as is.&lt;/P&gt;&lt;P&gt;Each unique value will create a partition. For a timestamps, it's almost each value that is unique.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 14:02:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-partitions-based-on-date-from-timestamp/m-p/61499#M70534</guid>
      <dc:creator>mathieu.d</dc:creator>
      <dc:date>2017-11-03T14:02:28Z</dc:date>
    </item>
  </channel>
</rss>

