Support Questions

Find answers, ask questions, and share your expertise

partitioned by question in hive

Master Collaborator

i have a table with date timestamps , can I partition my table by either day or month ? if yes how will I do it ..

create table ( ...) PARTITIONED BY (INSERT_TIME .... something here? )

SQL> select insert_time from test where rownum < 10;

INSERT_TIME
---------------------------------------------------------------------------
07-MAR-18 07.01.02.000000 PM
07-MAR-18 07.02.02.000000 PM
07-MAR-18 07.02.02.000000 PM
07-MAR-18 07.04.03.000000 PM
07-MAR-18 07.04.03.000000 PM
07-MAR-18 07.04.03.000000 PM
07-MAR-18 07.04.05.000000 PM
07-MAR-18 07.04.07.000000 PM
07-MAR-18 07.04.07.000000 PM

3 REPLIES 3

Super Guru
@Sami Ahmad

To convert the insert_time into hive timestamp format use the below statement

hive> select from_unixtime(unix_timestamp('07-MAR-18 07.01.02.000000 PM' ,'dd-MMM-yy HH.mm.ss.SSSSSS a'), 'yyyy-MM-dd HH:mm:ss.SSSSSSS');
+------------------------------+--+
|             _c0              |
+------------------------------+--+
| 2018-03-07 07:01:02.0000000  |
+------------------------------+--+

Depends on your use case/number of records in each partition you can decide on which basis you need to do partitioning(either by month (or) day), Create partition filed with string type.

hive> create table <db_name>.<tab_name> (<col_name> <data_type>) PARTITIONED BY (<month(or)day> string) stored as orc;

Refer to this link to get more details regards to creating hive partitioned tables.

While inserting data from non partitioned table to partition table hive uses last field as the partition field from your select statement so use the partition column as the last column in your select statement.

hive> insert into <db>.<partitioned_table> partition(<partition_field_name>)
select
col1,
col2,
...,
<partition_column> as <partition_field_name>
from
<db>.<non_partitioned_table>

Refer to this link for more details regards how to insert data into partitioned table.

Master Collaborator

if I partition it by month it will automatically put the timestamps belonging to a particular month in one partition ? even though my data not just dd-mon-yyyy but dd-mon-yyy hh24:mi:ss

Super Guru
@Sami Ahmad

Yes, as we are extracting partition column values based on `dd-mon-yyy hh24:mi:ss` column so we are going to have all timestamps will be in that partition it self.

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.