Get column values as comma-separated values

Explorer

I have a source table like this:

| ID | USER  | DEPT     |
|----|-------|----------|
| 1  | User1 | Admin    |
| 2  | User1 | Accounts |
| 3  | User2 | Finance  |
| 4  | User3 | Sales    |
| 5  | User3 | Finance  |

 

I want to generate a DataFrame like this:

| ID | USER  | DEPARTMENT     |
|----|-------|----------------|
| 1  | User1 | Admin,Accounts |
| 2  | User2 | Finance        |
| 3  | User3 | Sales,Finance  |
1 ACCEPTED SOLUTION

Super Guru

Hi @ChineduLB,

You can use `.groupBy` together with `concat_ws(",", collect_list(...))` to build the comma-separated column, and generate the `ID` with the `row_number` window function.

 

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws, row_number}

val df = Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")

// window used only to assign a sequential ID to the grouped result
val w = Window.orderBy("USER")

df.groupBy("USER").agg(concat_ws(",", collect_list("DEPT")).alias("DEPARTMENT")).withColumn("ID", row_number().over(w)).select("ID", "USER", "DEPARTMENT").show()
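
With this sample data the output should look roughly like the following (note that `collect_list` does not guarantee the order in which departments are collected, so the order inside `DEPARTMENT` may vary, and Spark will warn that the window has no partition defined):

+---+-----+--------------+
| ID| USER|    DEPARTMENT|
+---+-----+--------------+
|  1|User1|Admin,Accounts|
|  2|User2|       Finance|
|  3|User3| Sales,Finance|
+---+-----+--------------+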


3 REPLIES

Cloudera Employee

Explorer

Thanks @pauldefusco.

I would like to do it in Spark with Scala.
