- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Irregularities in Select query
- Labels:
-
Apache Hive
Created on ‎06-14-2017 02:49 AM - edited ‎08-17-2019 09:21 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am working with the HDP2.3 Rev6 VM for a self paced course.
I am getting the below errors for the same query, when using aliases.
select sum(ordertotal), year(order_date) from orders group by year(order_date)
this query works fine.
But if I use aliases, it fails.
Am I missing something?
Regards,
Anirban.
Created ‎06-14-2017 07:33 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
While some databases like MySql and Teradata allow using column alias in GROUP BY clause, Hive does not allow this to the best of my knowledge.
Conceptually, GROUP BY processing happens before SELECT list computation, so it would be circular to allow such references.
To understand this better, use the EXPLAIN feature in Hive for your query. That will give you a logical breakdown of your query and will reveal how it is being processed & in what order.
Here is the link from official wiki - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
The above link demonstrates group by example and you will see GROUP BY processing happens before SELECT, thereby leading to the error you are facing.
Workaround
If you still want to write queries where you want to group by derived column, you can do it using this nifty technique.
Set the following property:
SET hive.groupby.orderby.position.alias=true;
This will allow you to group by columns based on the position in the select clause.
So instead of writing your query like this:
select sum(ordertotal), year(order_date) from orders group by year(order_date)
You can write the group by clause using position as shown below:
select sum(ordertotal), year(order_date) from orders group by 2
This will get you the desired result without having to repeat "year(order_date)" in group by clause.
As always, if this answers helps you, please consider accepting it.
Created ‎06-14-2017 07:33 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
While some databases like MySql and Teradata allow using column alias in GROUP BY clause, Hive does not allow this to the best of my knowledge.
Conceptually, GROUP BY processing happens before SELECT list computation, so it would be circular to allow such references.
To understand this better, use the EXPLAIN feature in Hive for your query. That will give you a logical breakdown of your query and will reveal how it is being processed & in what order.
Here is the link from official wiki - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
The above link demonstrates group by example and you will see GROUP BY processing happens before SELECT, thereby leading to the error you are facing.
Workaround
If you still want to write queries where you want to group by derived column, you can do it using this nifty technique.
Set the following property:
SET hive.groupby.orderby.position.alias=true;
This will allow you to group by columns based on the position in the select clause.
So instead of writing your query like this:
select sum(ordertotal), year(order_date) from orders group by year(order_date)
You can write the group by clause using position as shown below:
select sum(ordertotal), year(order_date) from orders group by 2
This will get you the desired result without having to repeat "year(order_date)" in group by clause.
As always, if this answers helps you, please consider accepting it.
Created ‎06-15-2017 07:04 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thank you @Dinesh Chitlangia
that explains it very well.
Created ‎06-14-2017 08:22 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Couldnt Agree More with @Dinesh Chitlangia
Same is the case with WHERE clause.
If you have alias in WHERE clause we will facing similarerror.
Reason being here clause is evaluated before the select clause,
select col1 as c1, col2 as c2 from test_table where c1 = "ABC";
we make it work by re tweaking above query as
select * from ( select col1 as c1, col2 as c2 from test_table ) t1 where c1 = "ABC";
