Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

DataFrame groupBy and concat non-empty strings

avatar
New Member

I want to concatenate non-empty values in a column after grouping by some key.

Eg:

Supposing I have a dataframe:

df.show()

+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+

I want to groupBy "id" and concatenate "num" together. Right now, I have this:

df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

Which concatenates by key but doesn't exclude empty strings. Is there a way I can specify in the Column argument of concat_ws() or collect_list() to exclude some kind of string?

Thank you!

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

View solution in original post

1 REPLY 1

avatar
Master Collaborator

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))