Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive Vs. Impala Queries

Solved Go to solution

Hive Vs. Impala Queries

New Contributor

Hey,

 

I am running into an issue where the same query is giving me different results when ran on hive vs. impala. I can't figure out what the the problem could be that results in the different results.

 

Here's the query:

 

SELECT
TO_DATE(dtFirstActive) as valid_date,
vchpromocode as publisher,
SUM(CASE WHEN TO_DATE(dtbillingdate) > TO_DATE(dtfirstactive) AND TO_DATE(dtcancel) <> TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as trials,
SUM(CASE WHEN TO_DATE(dtbillingdate) <= TO_DATE(dtfirstactive) AND TO_DATE(dtcancel) <> TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as prepaids,
SUM(CASE WHEN TO_DATE(dtcancel) = TO_DATE(dtfirstactive) THEN 1 ELSE 0 END) as same_day_cancels,
FROM sql.Account
WHERE vchPromocode = 'GFAPPIOS01' or vchPromocode = 'GFAPPIOS02' or vchPromocode = 'GFAPPAND01' or vchPromocode = 'GFAPPAND02'
GROUP BY TO_DATE(dtFirstActive), vchpromocode

 

Any help be would be great!

 

Sam

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Hive Vs. Impala Queries

Rising Star

I solved the issue with from_utc_timestamp(Create_Time, 'CEST').

Impala assumes the the timestamp value is stored in UCT. 

So converting to central european time with summery daylight saving will produce the correct result.

As far as I know there is no way to tell Impala that the current timezone is CEST, so in every query this conversion should be made.

 

 

 

6 REPLIES 6
Highlighted

Re: Hive Vs. Impala Queries

Cloudera Employee

Any chance you could include the DDL and some sample data to allow us to reproduce it?

What versions of Impala & Hive are you running?

Re: Hive Vs. Impala Queries

New Contributor

Thanks for the reply. I was able to figure out the issue.

 

Thank you though!

Re: Hive Vs. Impala Queries

Master Guru

In spirit of http://xkcd.com/979/ please do post what your specific solution was, so others having the same issue and stumbling here would find it useful!

P.s. You can mark the topic as solved as well.

Re: Hive Vs. Impala Queries

Rising Star

I have a same issue, the same query returns different dates. In impala the date is one hour less than in Hive.

Table was created in hive, loaded with data via insert overwrite table in hive (table is partitioned).

 

And for example the timestamp 2014-11-18 00:30:00 - 18th of november was correctly written to partition 20141118.

But when I fetch the table in impala, whith condition day_id (partition column) = 20141118 I see a value 2014-11-17 23:30:00 

 

So the difference is one hour.

 

If I query the minimum and maximum start_time from the table in one partition in the Imapal (partition day_id = 2014118) I get this wrong result:

min( start_time ) = 2014-11-17 23:00 

max( start_time ) = 2014-11-18 22:59 

 

when I run the same query in Hive the result is ok:

min( start_time ) = 2014-11-18 00:00

max( start_time ) = 2014-11-18 23:59

 

Any help?

 

Re: Hive Vs. Impala Queries

Rising Star

I solved the issue with from_utc_timestamp(Create_Time, 'CEST').

Impala assumes the the timestamp value is stored in UCT. 

So converting to central european time with summery daylight saving will produce the correct result.

As far as I know there is no way to tell Impala that the current timezone is CEST, so in every query this conversion should be made.

 

 

 

Re: Hive Vs. Impala Queries

Rising Star

More interestingly this differencce dissappeared after upgrading to CDH 5.3.1.

T.