Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Irregularities in Select query

Solved Go to solution

Irregularities in Select query

I am working with the HDP2.3 Rev6 VM for a self paced course.

I am getting the below errors for the same query, when using aliases.

16299-hive-select.jpg

select sum(ordertotal), year(order_date) from orders group by year(order_date)

this query works fine.

But if I use aliases, it fails.

Am I missing something?

Regards,

Anirban.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Irregularities in Select query

While some databases like MySql and Teradata allow using column alias in GROUP BY clause, Hive does not allow this to the best of my knowledge.

Conceptually, GROUP BY processing happens before SELECT list computation, so it would be circular to allow such references.

To understand this better, use the EXPLAIN feature in Hive for your query. That will give you a logical breakdown of your query and will reveal how it is being processed & in what order.

Here is the link from official wiki - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

The above link demonstrates group by example and you will see GROUP BY processing happens before SELECT, thereby leading to the error you are facing.

Workaround

If you still want to write queries where you want to group by derived column, you can do it using this nifty technique.

Set the following property:

SET hive.groupby.orderby.position.alias=true;

This will allow you to group by columns based on the position in the select clause.

So instead of writing your query like this:

select sum(ordertotal), year(order_date) from orders group by year(order_date)

You can write the group by clause using position as shown below:

select sum(ordertotal), year(order_date) from orders group by 2

This will get you the desired result without having to repeat "year(order_date)" in group by clause.

As always, if this answers helps you, please consider accepting it.

3 REPLIES 3

Re: Irregularities in Select query

While some databases like MySql and Teradata allow using column alias in GROUP BY clause, Hive does not allow this to the best of my knowledge.

Conceptually, GROUP BY processing happens before SELECT list computation, so it would be circular to allow such references.

To understand this better, use the EXPLAIN feature in Hive for your query. That will give you a logical breakdown of your query and will reveal how it is being processed & in what order.

Here is the link from official wiki - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

The above link demonstrates group by example and you will see GROUP BY processing happens before SELECT, thereby leading to the error you are facing.

Workaround

If you still want to write queries where you want to group by derived column, you can do it using this nifty technique.

Set the following property:

SET hive.groupby.orderby.position.alias=true;

This will allow you to group by columns based on the position in the select clause.

So instead of writing your query like this:

select sum(ordertotal), year(order_date) from orders group by year(order_date)

You can write the group by clause using position as shown below:

select sum(ordertotal), year(order_date) from orders group by 2

This will get you the desired result without having to repeat "year(order_date)" in group by clause.

As always, if this answers helps you, please consider accepting it.

Re: Irregularities in Select query

thank you @Dinesh Chitlangia

that explains it very well.

Highlighted

Re: Irregularities in Select query

Expert Contributor

Couldnt Agree More with @Dinesh Chitlangia

Same is the case with WHERE clause.

If you have alias in WHERE clause we will facing similarerror.

Reason being here clause is evaluated before the select clause,

select col1 as c1, col2 as c2 from test_table where c1 = "ABC";

we make it work by re tweaking above query as

select * from ( select col1 as c1, col2 as c2 from test_table ) t1 where c1 = "ABC";

Don't have an account?
Coming from Hortonworks? Activate your account here