Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

DataFrame groupBy and concat non-empty strings

avatar
Contributor

I want to concatenate non-empty values in a column after grouping by some key.

Eg:

Supposing I have a dataframe:

df.show()

+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+

I want to groupBy "id" and concatenate "num" together. Right now, I have this:

df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

Which concatenates by key but doesn't exclude empty strings. Is there a way I can specify in the Column argument of concat_ws() or collect_list() to exclude some kind of string?

Thank you!

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

View solution in original post

1 REPLY 1

avatar
Master Collaborator

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))