Created 12-17-2015 12:08 PM
Hi All,
I have a scenario where I have to do sum(column x) for month range and sum (column x) for a range of year and insert both the records into the next table, listing with example
example: sale_id,saleValue,date
1,1000,2015/12/14
2,2000,2015/11/01
3,3000,2015/12/01
4,4000,2015/01/01
Here when we try for this month the output is sum(salevalue) for id's 1 and 3, and for last year it's sum(salevalue) for id's 1,2,3 and 4
And I have to insert both the values in to next table
Appreciate your help
Created 12-22-2015 09:44 PM
You should look to GROUP BY WITH CUBE and ROLLUP to get multiple levels of aggregation in a single pass through the data. Take a look at https://cwiki.apache.org/confluence/display/Hive/E... for the documentation.
Here's a query that will get you what you want. You can filter the results from there.
SELECT SUM(saleValue), yr, yrmo FROM ( SELECT sale_id, saleValue, SUBSTR(`date`,1,4) AS yr, SUBSTR(`date`,1,7) AS yrmo FROM detaildata ) a GROUP BY yr, yrmo WITH ROLLUP
The result you get looks like this:
totalSaleValue yr yrmo 10000 10000 2015 4000 2015 2015/01 2000 2015 2015/11 4000 2015 2015/12
You can get at the yearly totals by filtering "WHERE yrmo IS NULL" or the December total with "WHERE yrmo = '2015/12'"
Created 12-17-2015 01:35 PM
Try two options below. It will read your source table just once and will create 2 lines:
create external table test_groupby (sale_id string, salevalue double, datex string) row format delimited fields terminated by ',' stored as textfile location '/tmp/groupby' ; select * from test_groupby; select inline( array( named_struct('type', 'month', 'value', sum(case when substr(datex, 1,7) = '2015/12' then salevalue end)), named_struct('type', 'year', 'value', sum(case when substr(datex, 1,4) = '2015' then salevalue end)) )) from test_groupby ;
create external table test_groupby (sale_id string, salevalue double, datex string) row format delimited fields terminated by ',' stored as textfile location '/tmp/groupby' ; select * from test_groupby; drop table test_groupby_result; create table test_groupby_result (value double) partitioned by (type string) ; from test_groupby insert into table test_groupby_result partition (type = 'month') select sum(salevalue) where substr(datex, 1,7) = '2015/12' insert into table test_groupby_result partition (type = 'year') select sum(salevalue) where substr(datex, 1,4) = '2015' ; select * from test_groupby_result;
Created 12-18-2015 11:46 AM
Thank you very much @Guilherme Braccialli I will try to implement this logic
Will let you know updates
Created 12-22-2015 09:44 PM
You should look to GROUP BY WITH CUBE and ROLLUP to get multiple levels of aggregation in a single pass through the data. Take a look at https://cwiki.apache.org/confluence/display/Hive/E... for the documentation.
Here's a query that will get you what you want. You can filter the results from there.
SELECT SUM(saleValue), yr, yrmo FROM ( SELECT sale_id, saleValue, SUBSTR(`date`,1,4) AS yr, SUBSTR(`date`,1,7) AS yrmo FROM detaildata ) a GROUP BY yr, yrmo WITH ROLLUP
The result you get looks like this:
totalSaleValue yr yrmo 10000 10000 2015 4000 2015 2015/01 2000 2015 2015/11 4000 2015 2015/12
You can get at the yearly totals by filtering "WHERE yrmo IS NULL" or the December total with "WHERE yrmo = '2015/12'"
Created 12-22-2015 09:47 PM
Awesome!