Created on 08-07-2013 04:16 PM - edited 09-16-2022 01:46 AM
Hey,
I am running into an issue where the same query is giving me different results when ran on hive vs. impala. I can't figure out what the the problem could be that results in the different results.
Here's the query:
SELECT
TO_DATE(dtFirstActive) as valid_date,
vchpromocode as publisher,
SUM(CASE WHEN TO_DATE(dtbillingdate) > TO_DATE(dtfirstactive) AND TO_DATE(dtcancel) <> TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as trials,
SUM(CASE WHEN TO_DATE(dtbillingdate) <= TO_DATE(dtfirstactive) AND TO_DATE(dtcancel) <> TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as prepaids,
SUM(CASE WHEN TO_DATE(dtcancel) = TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as same_day_cancels,
FROM sql.Account
WHERE vchPromocode = 'GFAPPIOS01' or vchPromocode = 'GFAPPIOS02' or vchPromocode = 'GFAPPAND01' or vchPromocode = 'GFAPPAND02'
GROUP BY TO_DATE(dtFirstActive), vchpromocode
Any help be would be great!
Sam
Created 12-08-2014 01:39 PM
I solved the issue with from_utc_timestamp(Create_Time, 'CEST').
Impala assumes the the timestamp value is stored in UCT.
So converting to central european time with summery daylight saving will produce the correct result.
As far as I know there is no way to tell Impala that the current timezone is CEST, so in every query this conversion should be made.
Created 08-08-2013 11:05 AM
Any chance you could include the DDL and some sample data to allow us to reproduce it?
What versions of Impala & Hive are you running?
Created 08-15-2013 10:12 AM
Thanks for the reply. I was able to figure out the issue.
Thank you though!
In spirit of http://xkcd.com/979/ please do post what your specific solution was, so others having the same issue and stumbling here would find it useful!
P.s. You can mark the topic as solved as well.
Created 11-26-2014 02:17 PM
I have a same issue, the same query returns different dates. In impala the date is one hour less than in Hive.
Table was created in hive, loaded with data via insert overwrite table in hive (table is partitioned).
And for example the timestamp 2014-11-18 00:30:00 - 18th of november was correctly written to partition 20141118.
But when I fetch the table in impala, whith condition day_id (partition column) = 20141118 I see a value 2014-11-17 23:30:00
So the difference is one hour.
If I query the minimum and maximum start_time from the table in one partition in the Imapal (partition day_id = 2014118) I get this wrong result:
min( start_time ) = 2014-11-17 23:00
max( start_time ) = 2014-11-18 22:59
when I run the same query in Hive the result is ok:
min( start_time ) = 2014-11-18 00:00
max( start_time ) = 2014-11-18 23:59
Any help?
Created 12-08-2014 01:39 PM
I solved the issue with from_utc_timestamp(Create_Time, 'CEST').
Impala assumes the the timestamp value is stored in UCT.
So converting to central european time with summery daylight saving will produce the correct result.
As far as I know there is no way to tell Impala that the current timezone is CEST, so in every query this conversion should be made.
Created 01-07-2015 01:12 AM
More interestingly this differencce dissappeared after upgrading to CDH 5.3.1.
T.