<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Handling multiple records in hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99867#M12959</link>
    <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have a scenario where I have to do sum(column x) for month range and sum (column x) for a range of year  and insert both the records into the next table, listing with example&lt;/P&gt;&lt;P&gt;example: sale_id,saleValue,date&lt;/P&gt;&lt;P&gt;1,1000,2015/12/14&lt;/P&gt;&lt;P&gt;2,2000,2015/11/01&lt;/P&gt;&lt;P&gt;3,3000,2015/12/01&lt;/P&gt;&lt;P&gt;4,4000,2015/01/01&lt;/P&gt;&lt;P&gt;Here when we try for this month the output is sum(salevalue) for id's 1 and 3, and for last year it's sum(salevalue) for id's 1,2,3 and 4&lt;/P&gt;&lt;P&gt;And I have to insert both the values in to next table &lt;/P&gt;&lt;P&gt;Appreciate your help&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 17 Dec 2015 20:08:45 GMT</pubDate>
    <dc:creator>ranjith1989</dc:creator>
    <dc:date>2015-12-17T20:08:45Z</dc:date>
    <item>
      <title>Handling multiple records in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99867#M12959</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I have a scenario where I have to do sum(column x) for month range and sum (column x) for a range of year  and insert both the records into the next table, listing with example&lt;/P&gt;&lt;P&gt;example: sale_id,saleValue,date&lt;/P&gt;&lt;P&gt;1,1000,2015/12/14&lt;/P&gt;&lt;P&gt;2,2000,2015/11/01&lt;/P&gt;&lt;P&gt;3,3000,2015/12/01&lt;/P&gt;&lt;P&gt;4,4000,2015/01/01&lt;/P&gt;&lt;P&gt;Here when we try for this month the output is sum(salevalue) for id's 1 and 3, and for last year it's sum(salevalue) for id's 1,2,3 and 4&lt;/P&gt;&lt;P&gt;And I have to insert both the values in to next table &lt;/P&gt;&lt;P&gt;Appreciate your help&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2015 20:08:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99867#M12959</guid>
      <dc:creator>ranjith1989</dc:creator>
      <dc:date>2015-12-17T20:08:45Z</dc:date>
    </item>
    <item>
      <title>Re: Handling multiple records in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99868#M12960</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1297/ranjithmorampudi.html" nodeid="1297"&gt;@Ranjith M&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Try two options below. It will read your source table just once and will create 2 lines:&lt;/P&gt;&lt;PRE&gt;create external table test_groupby 
(sale_id string, salevalue double, datex string) 
row format delimited fields terminated by ',' 
stored as textfile 
location '/tmp/groupby'
;

select * from test_groupby;

select inline(
  array(
    named_struct('type', 'month', 'value', sum(case when substr(datex, 1,7) = '2015/12' then salevalue end)),
    named_struct('type', 'year', 'value', sum(case when substr(datex, 1,4) = '2015' then salevalue end))
  ))
from test_groupby
;
&lt;/PRE&gt;&lt;PRE&gt;create external table test_groupby 
(sale_id string, salevalue double, datex string) 
row format delimited fields terminated by ',' 
stored as textfile 
location '/tmp/groupby'
;

select * from test_groupby;

drop table test_groupby_result;
create table test_groupby_result
(value double)
partitioned by (type string)
;

from test_groupby
insert into table test_groupby_result partition (type = 'month')
select sum(salevalue)
where substr(datex, 1,7) = '2015/12'

insert into table test_groupby_result partition (type = 'year')
select sum(salevalue)
where substr(datex, 1,4) = '2015'
;

select * from test_groupby_result;
&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 Dec 2015 21:35:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99868#M12960</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-17T21:35:52Z</dc:date>
    </item>
    <item>
      <title>Re: Handling multiple records in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99869#M12961</link>
      <description>&lt;P&gt;Thank you very much  @&lt;A href="https://community.hortonworks.com/users/238/gbraccialli.html"&gt;Guilherme Braccialli&lt;/A&gt; I will try to implement this logic&lt;/P&gt;&lt;P&gt;Will let you know updates&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2015 19:46:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99869#M12961</guid>
      <dc:creator>ranjith1989</dc:creator>
      <dc:date>2015-12-18T19:46:49Z</dc:date>
    </item>
    <item>
      <title>Re: Handling multiple records in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99870#M12962</link>
      <description>&lt;P&gt;You should look to GROUP BY WITH CUBE and ROLLUP to get multiple levels of aggregation in a single pass through the data.  Take a look at &lt;A href="https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup#EnhancedAggregation,Cube,GroupingandRollup-CubesandRollups"&gt;https://cwiki.apache.org/confluence/display/Hive/E...&lt;/A&gt; for the documentation.&lt;/P&gt;&lt;P&gt;Here's a query that will get you what you want.  You can filter the results from there.&lt;/P&gt;&lt;PRE&gt;SELECT
  SUM(saleValue), yr, yrmo
FROM (
  SELECT sale_id, saleValue, SUBSTR(`date`,1,4) AS yr, SUBSTR(`date`,1,7) AS yrmo
  FROM detaildata
) a 
GROUP BY yr, yrmo WITH ROLLUP&lt;/PRE&gt;&lt;P&gt;The result you get looks like this:&lt;/P&gt;&lt;PRE&gt;totalSaleValue  yr    yrmo
10000
10000           2015
4000            2015  2015/01
2000            2015  2015/11
4000            2015  2015/12
&lt;/PRE&gt;&lt;P&gt;You can get at the yearly totals by filtering "WHERE yrmo IS NULL" or the December total with "WHERE yrmo = '2015/12'"&lt;/P&gt;</description>
      <pubDate>Wed, 23 Dec 2015 05:44:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99870#M12962</guid>
      <dc:creator>paul_boal</dc:creator>
      <dc:date>2015-12-23T05:44:57Z</dc:date>
    </item>
    <item>
      <title>Re: Handling multiple records in hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99871#M12963</link>
      <description>&lt;P&gt;Awesome!  &lt;/P&gt;</description>
      <pubDate>Wed, 23 Dec 2015 05:47:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Handling-multiple-records-in-hive/m-p/99871#M12963</guid>
      <dc:creator>gbraccialli3</dc:creator>
      <dc:date>2015-12-23T05:47:15Z</dc:date>
    </item>
  </channel>
</rss>

