Support Questions

Find answers, ask questions, and share your expertise

Group by one variable with selecting more than one variable

avatar
New Contributor

Hello I've got following code

create table p_n.analysis_file_01a as
select code,
prov_id,
tran_dt,
charge,
sum(charge_fixed) as total_charge
from p_n.analysis_file_00
where  service_dt >= '20130728' and service_dt <= '20160728'
group by code

I'm getting following error

Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 10:0 Expression not in GROUP BY key 'prov_id_new'

Only way to run a code is to add to Group by all variable (code, prov_id_new, tran_dt, service_dt, allowance_fixed, charge_fixed, claim_number, bill_detl_lines)

But I would like to have result something like that:

code

prov_id

tran_dtchargetotal_charge
00112313/05/2015100150
00114514/05/201550150
00213522/06/20153550
00213922/07/20151550
,

Hi I've got a code like this:

create table p_n.analysis_file_01a as
select code,
prov_id_new,
tran_dt,
service_dt,
charge_fixed,
sum(charge_fixed) as total_charge
from p_n.analysis_file_00
where  service_dt >= '20130728' and service_dt <= '20160728'
group by code

So I would like to aggregate a total charge on a level of code.

When I used this code it give me error:

Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 10:0 Expression not in GROUP BY key 'prov_id_new'

I know to run the code it needs to add all fields so in this case (code, prov_id_new, tran_dt, service_dt, charge_fixed).

But this doesn't give me what I want (so total Charge per code).

So I would like to obtain this:

codeprov idtrand_dtService_dtCharge_fixedtotal_charge
001123512/03/201512/03/201550150
00199912/04/201512/05/2015100150
11278912/09/201512/10/20152560
11212301/01/201601/02/20163560

Thank you for any help.

2 REPLIES 2

avatar
Super Guru

@Pavol Namer

You can't get the "total_charge" per code because your select statement includes multiple columns that have different values for the the same code. To get the output you are looking for you would typically do:

create table p_n.analysis_file_01a as
select code,
sum(charge_fixed) as total_charge
from p_n.analysis_file_00
where  service_dt >= '20130728' and service_dt <= '20160728'
group by code

An alternative approach would to use Pig or Spark against the Hive table to do what you are looking for.

avatar
Contributor

Hi,

You can use the windows functions in Hive using OVER (PARTITION BY code ORDER BY code DESC)

for your reference and I am not sure the performance but you can tweak that also.