Support Questions


Get column values as a comma-separated list

Rising Star

I have a source table like this:

| ID | USER  | DEPT     |
|----|-------|----------|
| 1  | User1 | Admin    |
| 2  | User1 | Accounts |
| 3  | User2 | Finance  |
| 4  | User3 | Sales    |
| 5  | User3 | Finance  |

 

I want to generate a DataFrame like this:

| ID | USER  | DEPARTMENT     |
|----|-------|----------------|
| 1  | User1 | Admin,Accounts |
| 2  | User2 | Finance        |
| 3  | User3 | Sales,Finance  |
1 ACCEPTED SOLUTION

Master Guru

Hi @ChineduLB ,

You can use `groupBy` with `concat_ws(",", collect_list(...))` to build the comma-separated department list, and the `row_number` window function to regenerate the `ID` column.

 

```scala
// These imports are auto-loaded in spark-shell; in an application,
// also import spark.implicits._ for toDF.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),
             ("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")

// Window used by row_number; ordering by USER regenerates a sequential ID
val w = Window.orderBy("USER")

df.groupBy("USER")
  .agg(concat_ws(",", collect_list("DEPT")).alias("DEPARTMENT"))
  .withColumn("ID", row_number().over(w))
  .select("ID","USER","DEPARTMENT")
  .show()
```
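For intuition, the same group-and-join logic can be sketched with plain Scala collections, no Spark session needed. This is only an illustration of the aggregation shape; the tuple layout mirrors the (ID, USER, DEPT) columns above.

```scala
// Plain-Scala sketch: group rows by user, join each user's
// departments with commas, then number the groups sequentially.
val rows = Seq((1, "User1", "Admin"), (2, "User1", "Accounts"),
               (3, "User2", "Finance"), (4, "User3", "Sales"), (5, "User3", "Finance"))

val grouped = rows
  .groupBy(_._2)                                   // group by USER
  .map { case (user, rs) => (user, rs.map(_._3).mkString(",")) }
  .toSeq
  .sortBy(_._1)                                    // order by USER, like the window
  .zipWithIndex
  .map { case ((user, depts), i) => (i + 1, user, depts) }  // regenerate ID
```

Note that `Seq#groupBy` keeps each group's elements in encounter order, so the departments stay in source order just as `collect_list` typically does within a partition.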


3 REPLIES


Rising Star

Thanks @pauldefusco

I would like to do it in Spark with Scala.
