Best Data Type for Hive Date Partition


Explorer

We need to partition our Hive table by date: day, month, and year.

Is it better to use INT or STRING for the partition column types?

Example:

CREATE EXTERNAL TABLE partition (id STRING, event TIMESTAMP, and so on)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET

vs.

CREATE EXTERNAL TABLE partition (id STRING, event TIMESTAMP, and so on)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS PARQUET

We noticed that with the STRING option we couldn't run queries like:

... WHERE day > 10

1 ACCEPTED SOLUTION


Re: Best Data Type for Hive Date Partition

Guru
Hi,

I would suggest using INT rather than STRING.

First, searching on an INT type is faster. Second, as you said, INT lets you do numeric comparisons, which give different results from STRING comparisons.
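
To make the comparison point concrete, here is a minimal HiveQL sketch. The table name `events`, its columns, and the filter values are hypothetical and chosen only for illustration. The first statement compares literals only, so it shows how string-to-string comparison orders digits lexicographically; it does not cover any implicit type conversion Hive may apply when a STRING column is compared with a numeric literal.

```sql
-- String-to-string comparison is lexicographic; numeric comparison is by value.
SELECT '9' > '10'  AS string_compare,   -- true:  '9' sorts after '1'
       9 > 10      AS int_compare,      -- false: numeric order
       '09' > '10' AS padded_compare;   -- false: zero-padding restores the expected order

-- Hypothetical table with INT partition columns (names are assumptions).
CREATE EXTERNAL TABLE events (id STRING, event TIMESTAMP)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET;

-- Range filter on the INT partition columns.
SELECT id, event
FROM events
WHERE year = 2019 AND month = 6 AND day > 10;
```

If STRING partitions are kept instead, zero-padded values such as '09' keep lexicographic and numeric order aligned for values of the same width.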