<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hive QL - Aggregating within a group in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142986#M23676</link>
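    <description>&lt;P&gt;You're combining a window function (SUM ... OVER (PARTITION BY ...)) with a GROUP BY on the same column, and that seems to be what triggers the error. Once the rows are grouped by owneruserid, a plain SUM already returns one total per user, so the OVER clause isn't needed.&lt;/P&gt;&lt;P&gt;Try this:&lt;/P&gt;&lt;PRE&gt;SELECT stackdata_clean.owneruserid,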
SUM(stackdata_clean.score) as sumscore
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sumscore DESC LIMIT 10;&lt;/PRE&gt;</description>
    <pubDate>Fri, 25 Mar 2016 16:15:41 GMT</pubDate>
    <dc:creator>sluangsay</dc:creator>
    <dc:date>2016-03-25T16:15:41Z</dc:date>
    <item>
      <title>Hive QL - Aggregating within a group</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142985#M23675</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;I am new to Hive QL and am trying to solve the following problem.&lt;/P&gt;&lt;P&gt;I have a set of data with user IDs, each with a corresponding score. An example of the kind of data I have is below:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;stackdata_clean.owneruserid&lt;/TD&gt;&lt;TD&gt;stackdata_clean.score&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;I want to find the top 10 users by total score. In other words, I want a query that sums the scores per user, producing a table like the one below, and then picks the 10 users with the highest aggregate score:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;P&gt;stackdata_clean.owneruserid&lt;/P&gt;&lt;/TD&gt;&lt;TD&gt;&lt;P&gt;stackdata_clean.score&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;My table name is stackdata_clean and the code I am trying to use is:&lt;/P&gt;&lt;PRE&gt;SELECT stackdata_clean.owneruserid,
SUM(stackdata_clean.score) over(PARTITION BY stackdata_clean.owneruserid)
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sum(stackdata_clean.score)DESC LIMIT 10;&lt;/PRE&gt;&lt;P&gt;I am being returned the following error:&lt;/P&gt;&lt;PRE&gt;Error while compiling statement: FAILED: 
SemanticException Failed to breakup Windowing invocations into Groups. 
At least 1 group must only depend on input columns. Also check for 
circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: 
Line 2:20 Invalid column reference 'score' [ERROR_STATUS]&lt;/PRE&gt;&lt;P&gt;Can anyone help solve this problem?&lt;/P&gt;&lt;P&gt;Any help is greatly appreciated!&lt;/P&gt;&lt;P&gt;Thanks in advance &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 25 Mar 2016 03:33:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142985#M23675</guid>
      <dc:creator>maeve_ryan226</dc:creator>
      <dc:date>2016-03-25T03:33:59Z</dc:date>
    </item>
    <item>
      <title>Re: Hive QL - Aggregating within a group</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142986#M23676</link>
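      <description>&lt;P&gt;You're combining a window function (SUM ... OVER (PARTITION BY ...)) with a GROUP BY on the same column, and that seems to be what triggers the error. Once the rows are grouped by owneruserid, a plain SUM already returns one total per user, so the OVER clause isn't needed.&lt;/P&gt;&lt;P&gt;Try this:&lt;/P&gt;&lt;PRE&gt;SELECT stackdata_clean.owneruserid,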
SUM(stackdata_clean.score) as sumscore
FROM stackdata_clean
GROUP BY stackdata_clean.owneruserid
ORDER BY sumscore DESC LIMIT 10;&lt;/PRE&gt;</description>
      <pubDate>Fri, 25 Mar 2016 16:15:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142986#M23676</guid>
      <dc:creator>sluangsay</dc:creator>
      <dc:date>2016-03-25T16:15:41Z</dc:date>
    </item>
    <item>
      <title>Re: Hive QL - Aggregating within a group</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142987#M23677</link>
      <description>&lt;P&gt;That worked - thanks a lot!&lt;/P&gt;</description>
      <pubDate>Sat, 26 Mar 2016 16:34:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hive-QL-Aggregating-within-a-group/m-p/142987#M23677</guid>
      <dc:creator>maeve_ryan226</dc:creator>
      <dc:date>2016-03-26T16:34:18Z</dc:date>
    </item>
  </channel>
</rss>

