Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

DataFrame groupBy and concat non-empty strings

Solved Go to solution
Highlighted

DataFrame groupBy and concat non-empty strings

New Contributor

I want to concatenate non-empty values in a column after grouping by some key.

Eg:

Supposing I have a dataframe:

df.show()

+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+

I want to groupBy "id" and concatenate "num" together. Right now, I have this:

df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

Which concatenates by key but doesn't exclude empty strings. Is there a way I can specify in the Column argument of concat_ws() or collect_list() to exclude some kind of string?

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions

Re: DataFrame groupBy and concat non-empty strings

Expert Contributor

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))
1 REPLY 1

Re: DataFrame groupBy and concat non-empty strings

Expert Contributor

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))