11-09-2017 03:39 AM - last edited on 11-09-2017 06:18 AM by cjervis
Came across a strange issue where in when converting a timestamp from PST to GMT, to_utc_timestamp function is returning blank. I could see that this happening for an hour range only on the day daylight savings were switched off in US.
Query: select to_utc_timestamp(cast('2017-11-05 01:00:00' as timestamp),'PST') Query: select to_utc_timestamp(cast('2017-11-05 01:59:59' as timestamp),'PST')
Above queries return blank output.
However the same things works fine in hive:
Query: select to_utc_timestamp(cast('2017-11-05 01:00:00' as timestamp),'PST') OK 2017-11-05 09:00:00 Query: select to_utc_timestamp(cast('2017-11-05 01:59:59' as timestamp),'PST') OK 2017-11-05 09:59:59
Need help understanding reason for the same and also how to work around this issue using Impala query itself.
Impala Version - v2.7.0 on CDH 5.10
Hive version - 1.1.0 on CDH 5.10
11-17-2017 01:04 PM
My understanding is that this is a tricky case that different systems handle differently because it's ambiguous - in PST there are actually two points in time when it was 2017-11-05 1:00:00, so either the system needs to arbitrarily decide which one it was, possibly returning an incorrect result, or it needs to return NULL or some error.
Impala made the second decision but it seems like the Java libraries used by Hive made the first decision. I'm checking with a colleague to see if we have it documented anywhere.
11-17-2017 02:50 PM
In Impala we try to have the following policy when converting timestamps to different timezones: In case of ambiguity, return NULL. Ambiguity means that we could return more than one answer. When daylight savings time switch happens in the fall, the same hour happens twice (between 1-2 am), so if we see "1:30 am", we can't tell if that point in time is before the swtich, or after. Interestingly, when the daylight savings switch happens in the spring, the hour between 2am and 3am does not exist at all.
11-22-2017 10:12 AM
I don't think there's a plan to make a spot-fix for this issue. We're looking at a general revamp of our timezone conversion to be faster and more robust - see https://issues.apache.org/jira/browse/IMPALA-3307 . I think as part of that switch we'll get more flexibility to handle edge cases like this so we could re-evaluate the behaviour then.