Created 10-04-2016 09:13 AM
Hello I've got following code
create table p_n.analysis_file_01a as select code, prov_id, tran_dt, charge, sum(charge_fixed) as total_charge from p_n.analysis_file_00 where service_dt >= '20130728' and service_dt <= '20160728' group by code
I'm getting following error
Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 10:0 Expression not in GROUP BY key 'prov_id_new'
Only way to run a code is to add to Group by all variable (code, prov_id_new, tran_dt, service_dt, allowance_fixed, charge_fixed, claim_number, bill_detl_lines)
But I would like to have result something like that:
code | prov_id | tran_dt | charge | total_charge |
001 | 123 | 13/05/2015 | 100 | 150 |
001 | 145 | 14/05/2015 | 50 | 150 |
002 | 135 | 22/06/2015 | 35 | 50 |
002 | 139 | 22/07/2015 | 15 | 50 |
Hi I've got a code like this:
create table p_n.analysis_file_01a as select code, prov_id_new, tran_dt, service_dt, charge_fixed, sum(charge_fixed) as total_charge from p_n.analysis_file_00 where service_dt >= '20130728' and service_dt <= '20160728' group by code
So I would like to aggregate a total charge on a level of code.
When I used this code it give me error:
Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 10:0 Expression not in GROUP BY key 'prov_id_new'
I know to run the code it needs to add all fields so in this case (code, prov_id_new, tran_dt, service_dt, charge_fixed).
But this doesn't give me what I want (so total Charge per code).
So I would like to obtain this:
code | prov id | trand_dt | Service_dt | Charge_fixed | total_charge |
001 | 1235 | 12/03/2015 | 12/03/2015 | 50 | 150 |
001 | 999 | 12/04/2015 | 12/05/2015 | 100 | 150 |
112 | 789 | 12/09/2015 | 12/10/2015 | 25 | 60 |
112 | 123 | 01/01/2016 | 01/02/2016 | 35 | 60 |
Thank you for any help.
Created 10-05-2016 08:17 PM
You can't get the "total_charge" per code because your select statement includes multiple columns that have different values for the the same code. To get the output you are looking for you would typically do:
create table p_n.analysis_file_01a as select code, sum(charge_fixed) as total_charge from p_n.analysis_file_00 where service_dt >= '20130728' and service_dt <= '20160728' group by code
An alternative approach would to use Pig or Spark against the Hive table to do what you are looking for.
Created 10-07-2016 03:28 AM
Hi,
You can use the windows functions in Hive using OVER (PARTITION BY code ORDER BY code DESC)
for your reference and I am not sure the performance but you can tweak that also.