Support Questions

Find answers, ask questions, and share your expertise
Announcements
Welcome to the upgraded Community! Read this blog to see What’s New!

Get column values in comma separated value

avatar
Explorer

I have a source table Like

IDUSERDEPT
1User1Admin
2User1Accounts
3User2Finance
4User3Sales
5User3Finance

 

I want to generate a DataFrame like this

IDUSERDEPARTMENT
1User1Admin,Accounts
2User2Finance
3User3Sales,Finance
1 ACCEPTED SOLUTION

avatar
Super Guru

Hi @ChineduLB ,

You can use `.groupBy` and `concat_ws(",",collect_list)` functions and to generate `ID` use `row_number` window function.

 

val df=Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")

import org.apache.spark.sql.expressions.Window

df.groupBy("USER"). agg(concat_ws(",",collect_list("DEPT")).alias("DEPARTMENT")). withColumn("ID",row_number().over(w)). select("ID","USER","DEPARTMENT").show()

View solution in original post

3 REPLIES 3

avatar
Cloudera Employee

avatar
Explorer

Thanks @pauldefusco

I would like to do it in spark - scala

avatar
Super Guru

Hi @ChineduLB ,

You can use `.groupBy` and `concat_ws(",",collect_list)` functions and to generate `ID` use `row_number` window function.

 

val df=Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")

import org.apache.spark.sql.expressions.Window

df.groupBy("USER"). agg(concat_ws(",",collect_list("DEPT")).alias("DEPARTMENT")). withColumn("ID",row_number().over(w)). select("ID","USER","DEPARTMENT").show()

Labels