Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

DataFrame groupBy and concat non-empty strings

Explorer

I want to concatenate non-empty values in a column after grouping by some key.

Eg:

Supposing I have a dataframe:

df.show()

+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+

I want to groupBy "id" and concatenate "num" together. Right now, I have this:

df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

Which concatenates by key but doesn't exclude empty strings. Is there a way I can specify in the Column argument of concat_ws() or collect_list() to exclude some kind of string?

Thank you!

1 ACCEPTED SOLUTION

Expert Contributor

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))

View solution in original post

1 REPLY 1

Expert Contributor

Could you filter the empty string before the group?

df.where(df("number") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.