Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

SPARK code for sql case statement and row_number equivalent

SPARK code for sql case statement and row_number equivalent

New Contributor

I have a data set like below

hduser@ubuntu:~$ hadoop fs -cat /user/hduser/test_sample/sample1.txt
Eid1,EName1,EDept1,100
Eid2,EName2,EDept1,102
Eid3,EName3,EDept1,101
Eid4,EName4,EDept2,110
Eid5,EName5,EDept2,121
Eid6,EName6,EDept3,99

I want to generate the output as below using spark code

Eid1,EName1,IT,102,1
Eid2,EName2,IT,101,2
Eid3,EName3,IT,100,3
Eid4,EName4,ComSc,121,1
Eid5,EName5,ComSc,110,2
Eid6,EName6,Mech,99,1

which is equivalent of the below SQL

Select emp_id, emp_name, case when emp_dept='EDept1' then 'IT' when emp_dept='EDept2' then 'ComSc' when emp_dept='EDept3' then 'Mech' end dept_name, emp_sal, row_number() over (partition by emp_dept order by emp_sal desc) as rn from emp

Can someone please suggest how should I get that in spark.

Don't have an account?
Coming from Hortonworks? Activate your account here