Trying to store timestamps like **2020-01-03T02:46:21.148+02:00** in an **ORC** hive table.
Storing the timestamps using the timestamp datatype gives NULL on querying which is expected as this is not the format hive expects it's timestamps to be.
But as per the documentation if we set the appropriate timestamp format serde property then hive should be able to read the timezones.
*"On the table level, alternative timestamp formats can be supported by providing the format to the SerDe property "timestamp.formats" (as of release 1.2.0 with HIVE-9298). For example, yyyy-MM-dd'T'HH:mm:ss.SSS,yyyy-MM-dd'T'HH:mm:ss."*
However despite **ALTER TABLE <tablename> SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX");** the property seems to have no effect and hive is still not able to read it. I tried multiple variations of the format but none worked.
Creating the table as textfile/CSV doesn't help either
To replicate -
create table default.test_tz (tt timestamp) stored as orc;
insert into default.test_tz values ("2020-01-03T02:46:21.148+02:00"), ("2017-02-16T11:24:29.000Z"), ("2017-02-16 11:24:29"), ("2019-06-15T15:43:19");
ALTER TABLE default.test_tz SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss.SSSXXXXX");
**Should I just store it as string? What is the best practice in such cases.**
According to this previous accepted solution:
ALTER TABLE table SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss"); Works only in case of Textformat,CSV format tables. If you are having other format table like orc..etc then set serde properties are not got to be working.
I am going to test it here myself shortly. I have a use case where we are inserting ISO-8601 into Elastic where data ultimately also goes to exernal csv table (wi thout the timestamp). I should be able to quickly validate what is required for a working example. I suspect you are having issues with the after ss parts matching the serde syntax.