Best Data Type for Hive Date Partition
Labels: Apache Hive
Created on 04-07-2019 04:41 AM - edited 09-16-2022 07:17 AM
We need to partition our Hive table by date (year, month, day).
Is it better to use INT or STRING for the partition column types?
For example:
CREATE EXTERNAL TABLE partition (id string, event timestamp and so on)
PARTITIONED BY (year INT, month INT, day INT)
Stored as Parquet
vs
CREATE EXTERNAL TABLE partition (id string, event timestamp and so on)
PARTITIONED BY (year string, month string, day string)
Stored as Parquet
We noticed that with the STRING option we couldn't run queries like:
... WHERE day > 10
Created 04-12-2019 07:32 PM
I would suggest using INT rather than STRING.
First, comparisons on INT values are typically faster. Second, as you said, INT lets you do numeric comparisons, which behave differently from STRING comparisons: compared as text, '9' sorts after '10'.
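To make that concrete, here is a minimal HiveQL sketch of the INT-partitioned layout with a range filter on the day. The table name `events` is hypothetical (chosen only for illustration), and the column list is limited to the two columns named in the question; the rest of the schema is left out.

```sql
-- Hypothetical table name; columns follow the question's sketch.
CREATE EXTERNAL TABLE events (
  id    STRING,
  event TIMESTAMP
)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET;

-- Numeric range filter on the INT partition columns.
SELECT id, event
FROM events
WHERE year = 2019
  AND month = 4
  AND day > 10;

-- If the partition columns were declared STRING instead, an explicit cast
-- makes the comparison numeric rather than text-based:
--   WHERE CAST(day AS INT) > 10
```

With STRING partitions, the values would also need consistent zero-padding (for example '04' rather than '4') if they are ever compared or sorted as text.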
