Contributor
Posts: 25
Registered: ‎02-11-2019
Accepted Solution

Best Data Type for Hive Date Partition

We need to partition our Hive table by date (year/month/day).

 

Is it better to use INT or STRING for the partition column types?

For example:

CREATE EXTERNAL TABLE partition (id STRING, event TIMESTAMP and so on)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET

vs

 

CREATE EXTERNAL TABLE partition (id STRING, event TIMESTAMP and so on)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS PARQUET

 

We noticed that, with the STRING option, we couldn't run queries like:

... WHERE day > 10

Cloudera Employee
Posts: 761
Registered: ‎03-23-2015

Re: Best Data Type for Hive Date Partition

Hi,

I would suggest using INT rather than STRING.

Firstly, searching on an INT column is faster. Secondly, as you said, INT lets you do numeric comparisons, which behave differently from STRING comparisons: strings are compared character by character, not by numeric value.
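
To make the difference between numeric and string comparison concrete, here is a small sketch in plain Python. It is not Hive, and it only illustrates general comparison semantics; how a particular engine handles a STRING column compared against a number may differ. The sample day values are made up for illustration.

```python
# Hypothetical day-of-month values, as they might appear in partition keys.
days_as_int = [2, 9, 10, 25]
days_as_str = [str(d) for d in days_as_int]

# Numeric comparison: only 25 is greater than 10.
int_matches = [d for d in days_as_int if d > 10]

# String comparison is character by character, so "2" and "9"
# also compare as greater than "10".
str_matches = [d for d in days_as_str if d > "10"]

# Zero-padded strings ("02", "09") order the same way as the numbers.
padded = [f"{d:02d}" for d in days_as_int]
padded_matches = [d for d in padded if d > "10"]

print(int_matches)
print(str_matches)
print(padded_matches)
```

If string partition keys have to be kept for some reason, zero-padding the values is one way to make string ordering agree with numeric ordering, but INT keys avoid the problem altogether.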