Support Questions

Find answers, ask questions, and share your expertise

Handling multiple records in hive

avatar
Explorer

Hi All,

I have a scenario where I have to do sum(column x) for month range and sum (column x) for a range of year and insert both the records into the next table, listing with example

example: sale_id,saleValue,date

1,1000,2015/12/14

2,2000,2015/11/01

3,3000,2015/12/01

4,4000,2015/01/01

Here when we try for this month the output is sum(salevalue) for id's 1 and 3, and for last year it's sum(salevalue) for id's 1,2,3 and 4

And I have to insert both the values in to next table

Appreciate your help

1 ACCEPTED SOLUTION

avatar
Contributor

You should look to GROUP BY WITH CUBE and ROLLUP to get multiple levels of aggregation in a single pass through the data. Take a look at https://cwiki.apache.org/confluence/display/Hive/E... for the documentation.

Here's a query that will get you what you want. You can filter the results from there.

SELECT
  SUM(saleValue), yr, yrmo
FROM (
  SELECT sale_id, saleValue, SUBSTR(`date`,1,4) AS yr, SUBSTR(`date`,1,7) AS yrmo
  FROM detaildata
) a 
GROUP BY yr, yrmo WITH ROLLUP

The result you get looks like this:

totalSaleValue  yr    yrmo
10000
10000           2015
4000            2015  2015/01
2000            2015  2015/11
4000            2015  2015/12

You can get at the yearly totals by filtering "WHERE yrmo IS NULL" or the December total with "WHERE yrmo = '2015/12'"

View solution in original post

4 REPLIES 4

avatar

@Ranjith M

Try two options below. It will read your source table just once and will create 2 lines:

create external table test_groupby 
(sale_id string, salevalue double, datex string) 
row format delimited fields terminated by ',' 
stored as textfile 
location '/tmp/groupby'
;

select * from test_groupby;

select inline(
  array(
    named_struct('type', 'month', 'value', sum(case when substr(datex, 1,7) = '2015/12' then salevalue end)),
    named_struct('type', 'year', 'value', sum(case when substr(datex, 1,4) = '2015' then salevalue end))
  ))
from test_groupby
;
create external table test_groupby 
(sale_id string, salevalue double, datex string) 
row format delimited fields terminated by ',' 
stored as textfile 
location '/tmp/groupby'
;

select * from test_groupby;

drop table test_groupby_result;
create table test_groupby_result
(value double)
partitioned by (type string)
;

from test_groupby
insert into table test_groupby_result partition (type = 'month')
select sum(salevalue)
where substr(datex, 1,7) = '2015/12'

insert into table test_groupby_result partition (type = 'year')
select sum(salevalue)
where substr(datex, 1,4) = '2015'
;

select * from test_groupby_result;

avatar
Explorer

Thank you very much @Guilherme Braccialli I will try to implement this logic

Will let you know updates

avatar
Contributor

You should look to GROUP BY WITH CUBE and ROLLUP to get multiple levels of aggregation in a single pass through the data. Take a look at https://cwiki.apache.org/confluence/display/Hive/E... for the documentation.

Here's a query that will get you what you want. You can filter the results from there.

SELECT
  SUM(saleValue), yr, yrmo
FROM (
  SELECT sale_id, saleValue, SUBSTR(`date`,1,4) AS yr, SUBSTR(`date`,1,7) AS yrmo
  FROM detaildata
) a 
GROUP BY yr, yrmo WITH ROLLUP

The result you get looks like this:

totalSaleValue  yr    yrmo
10000
10000           2015
4000            2015  2015/01
2000            2015  2015/11
4000            2015  2015/12

You can get at the yearly totals by filtering "WHERE yrmo IS NULL" or the December total with "WHERE yrmo = '2015/12'"

avatar

Awesome!