Best Data Type for Hive Date Partition

Contributor

We need to partition our Hive table by date (day/month/year).

Is it better to use INT or STRING for the partition column types?

For example:

CREATE EXTERNAL TABLE `partition` (id STRING, event TIMESTAMP, ...)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET;

vs

CREATE EXTERNAL TABLE `partition` (id STRING, event TIMESTAMP, ...)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS PARQUET;

We noticed that with the STRING option we couldn't run queries like:

... WHERE day > 10

1 ACCEPTED SOLUTION

Super Guru
Hi,

I would suggest using INT rather than STRING.

Firstly, searching on an INT type is faster. Secondly, as you said, INT lets you do numeric comparisons, which behave differently from STRING comparisons: strings are compared character by character, so '9' sorts after '10'.
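
To make the difference concrete, here is a minimal sketch of the two comparison behaviours, followed by a range filter on INT partition columns. The table and column names follow the CREATE TABLE statement in the question; the literal values (2019, 3, 10) are made up for illustration and are not from the original post.

```sql
-- Illustrative only: the literal values are assumptions for this example.

-- Two strings are compared lexicographically: '9' is compared
-- with '1' first, so this evaluates to true.
SELECT '9' > '10';

-- Two integers are compared numerically, so this evaluates to false.
SELECT 9 > 10;

-- With INT partition columns, a range filter on the day reads naturally.
SELECT id, event
FROM `partition`
WHERE year = 2019 AND month = 3 AND day > 10;
```

If you do end up with STRING partitions, zero-padding the values ('09' rather than '9') keeps the lexicographic order in line with the numeric order.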
