Support Questions

nrbndsdb0509 · ‎06-14-2017

I am working with the HDP2.3 Rev6 VM for a self paced course.

I am getting the below errors for the same query, when using aliases.

select sum(ordertotal), year(order_date) from orders group by year(order_date)

this query works fine.

But if I use aliases, it fails.

Am I missing something?

Regards,

Anirban.

dineshc · ‎06-14-2017

While some databases like MySql and Teradata allow using column alias in GROUP BY clause, Hive does not allow this to the best of my knowledge.

Conceptually, GROUP BY processing happens before SELECT list computation, so it would be circular to allow such references.

To understand this better, use the EXPLAIN feature in Hive for your query. That will give you a logical breakdown of your query and will reveal how it is being processed & in what order.

Here is the link from official wiki - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

The above link demonstrates group by example and you will see GROUP BY processing happens before SELECT, thereby leading to the error you are facing.

Workaround

If you still want to write queries where you want to group by derived column, you can do it using this nifty technique.

Set the following property:

SET hive.groupby.orderby.position.alias=true;

This will allow you to group by columns based on the position in the select clause.

So instead of writing your query like this:

select sum(ordertotal), year(order_date) from orders group by year(order_date)

You can write the group by clause using position as shown below:

select sum(ordertotal), year(order_date) from orders group by 2

This will get you the desired result without having to repeat "year(order_date)" in group by clause.

As always, if this answers helps you, please consider accepting it.

View solution in original post

dineshc · ‎06-14-2017

While some databases like MySql and Teradata allow using column alias in GROUP BY clause, Hive does not allow this to the best of my knowledge.

Conceptually, GROUP BY processing happens before SELECT list computation, so it would be circular to allow such references.

To understand this better, use the EXPLAIN feature in Hive for your query. That will give you a logical breakdown of your query and will reveal how it is being processed & in what order.

Here is the link from official wiki - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

The above link demonstrates group by example and you will see GROUP BY processing happens before SELECT, thereby leading to the error you are facing.

Workaround

If you still want to write queries where you want to group by derived column, you can do it using this nifty technique.

Set the following property:

SET hive.groupby.orderby.position.alias=true;

This will allow you to group by columns based on the position in the select clause.

So instead of writing your query like this:

select sum(ordertotal), year(order_date) from orders group by year(order_date)

You can write the group by clause using position as shown below:

select sum(ordertotal), year(order_date) from orders group by 2

This will get you the desired result without having to repeat "year(order_date)" in group by clause.

As always, if this answers helps you, please consider accepting it.

nrbndsdb0509 · ‎06-15-2017

thank you @Dinesh Chitlangia

that explains it very well.

sreeviswa_athic · ‎06-14-2017

Couldnt Agree More with @Dinesh Chitlangia

Same is the case with WHERE clause.

If you have alias in WHERE clause we will facing similarerror.

Reason being here clause is evaluated before the select clause,

select col1 as c1, col2 as c2 from test_table where c1 = "ABC";

we make it work by re tweaking above query as

select * from ( select col1 as c1, col2 as c2 from test_table ) t1 where c1 = "ABC";

Cloudera Community

Support Questions

Irregularities in Select query

Beeline select query

SELECT queries fail on Hive table view

select query inside hive not working

SQL SELECT into Variable

Simple Select slow query in Hive

Query Hive Using Python

straight SELECT and SELECT via CTE produce differe...

Presto select query fails on Hive ACID Table

Unable to run query select * from <table_name>

Simple 'select' query on ORC table runs forever on...