Support Questions

Find answers, ask questions, and share your expertise

Hive Vs. Impala Queries

avatar
New Contributor

Hey,

 

I am running into an issue where the same query is giving me different results when ran on hive vs. impala. I can't figure out what the the problem could be that results in the different results.

 

Here's the query:

 

SELECT
TO_DATE(dtFirstActive) as valid_date,
vchpromocode as publisher,
SUM(CASE WHEN TO_DATE(dtbillingdate) > TO_DATE(dtfirstactive) AND TO_DATE(dtcancel) <> TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as trials,
SUM(CASE WHEN TO_DATE(dtbillingdate) <= TO_DATE(dtfirstactive) AND TO_DATE(dtcancel) <> TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as prepaids,
SUM(CASE WHEN TO_DATE(dtcancel) = TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as same_day_cancels,
FROM sql.Account
WHERE vchPromocode = 'GFAPPIOS01' or vchPromocode = 'GFAPPIOS02' or vchPromocode = 'GFAPPAND01' or vchPromocode = 'GFAPPAND02'
GROUP BY TO_DATE(dtFirstActive), vchpromocode

 

Any help be would be great!

 

Sam

1 ACCEPTED SOLUTION

avatar
Expert Contributor

I solved the issue with from_utc_timestamp(Create_Time, 'CEST').

Impala assumes the the timestamp value is stored in UCT. 

So converting to central european time with summery daylight saving will produce the correct result.

As far as I know there is no way to tell Impala that the current timezone is CEST, so in every query this conversion should be made.

 

 

 

View solution in original post

6 REPLIES 6

avatar
Contributor

Any chance you could include the DDL and some sample data to allow us to reproduce it?

What versions of Impala & Hive are you running?

avatar
New Contributor

Thanks for the reply. I was able to figure out the issue.

 

Thank you though!

avatar
Mentor

In spirit of http://xkcd.com/979/ please do post what your specific solution was, so others having the same issue and stumbling here would find it useful!

P.s. You can mark the topic as solved as well.

avatar
Expert Contributor

I have a same issue, the same query returns different dates. In impala the date is one hour less than in Hive.

Table was created in hive, loaded with data via insert overwrite table in hive (table is partitioned).

 

And for example the timestamp 2014-11-18 00:30:00 - 18th of november was correctly written to partition 20141118.

But when I fetch the table in impala, whith condition day_id (partition column) = 20141118 I see a value 2014-11-17 23:30:00 

 

So the difference is one hour.

 

If I query the minimum and maximum start_time from the table in one partition in the Imapal (partition day_id = 2014118) I get this wrong result:

min( start_time ) = 2014-11-17 23:00 

max( start_time ) = 2014-11-18 22:59 

 

when I run the same query in Hive the result is ok:

min( start_time ) = 2014-11-18 00:00

max( start_time ) = 2014-11-18 23:59

 

Any help?

 

avatar
Expert Contributor

I solved the issue with from_utc_timestamp(Create_Time, 'CEST').

Impala assumes the the timestamp value is stored in UCT. 

So converting to central european time with summery daylight saving will produce the correct result.

As far as I know there is no way to tell Impala that the current timezone is CEST, so in every query this conversion should be made.

 

 

 

avatar
Expert Contributor

More interestingly this differencce dissappeared after upgrading to CDH 5.3.1.

T.