Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Group by one variable with selecting more than one variable

Group by one variable with selecting more than one variable

New Contributor

Hello I've got following code

create table p_n.analysis_file_01a as
select code,
prov_id,
tran_dt,
charge,
sum(charge_fixed) as total_charge
from p_n.analysis_file_00
where  service_dt >= '20130728' and service_dt <= '20160728'
group by code

I'm getting following error

Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 10:0 Expression not in GROUP BY key 'prov_id_new'

Only way to run a code is to add to Group by all variable (code, prov_id_new, tran_dt, service_dt, allowance_fixed, charge_fixed, claim_number, bill_detl_lines)

But I would like to have result something like that:

code

prov_id

tran_dtchargetotal_charge
00112313/05/2015100150
00114514/05/201550150
00213522/06/20153550
00213922/07/20151550
,

Hi I've got a code like this:

create table p_n.analysis_file_01a as
select code,
prov_id_new,
tran_dt,
service_dt,
charge_fixed,
sum(charge_fixed) as total_charge
from p_n.analysis_file_00
where  service_dt >= '20130728' and service_dt <= '20160728'
group by code

So I would like to aggregate a total charge on a level of code.

When I used this code it give me error:

Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 10:0 Expression not in GROUP BY key 'prov_id_new'

I know to run the code it needs to add all fields so in this case (code, prov_id_new, tran_dt, service_dt, charge_fixed).

But this doesn't give me what I want (so total Charge per code).

So I would like to obtain this:

codeprov idtrand_dtService_dtCharge_fixedtotal_charge
001123512/03/201512/03/201550150
00199912/04/201512/05/2015100150
11278912/09/201512/10/20152560
11212301/01/201601/02/20163560

Thank you for any help.

2 REPLIES 2
Highlighted

Re: Group by one variable with selecting more than one variable

@Pavol Namer

You can't get the "total_charge" per code because your select statement includes multiple columns that have different values for the the same code. To get the output you are looking for you would typically do:

create table p_n.analysis_file_01a as
select code,
sum(charge_fixed) as total_charge
from p_n.analysis_file_00
where  service_dt >= '20130728' and service_dt <= '20160728'
group by code

An alternative approach would to use Pig or Spark against the Hive table to do what you are looking for.

Highlighted

Re: Group by one variable with selecting more than one variable

Hi,

You can use the windows functions in Hive using OVER (PARTITION BY code ORDER BY code DESC)

for your reference and I am not sure the performance but you can tweak that also.

Don't have an account?
Coming from Hortonworks? Activate your account here