DataFrame groupBy and concat non-empty strings
- Labels: Apache Falcon, Apache Spark
Created ‎07-13-2016 04:57 AM
I want to concatenate non-empty values in a column after grouping by some key.
For example, suppose I have this DataFrame:
df.show()
+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+
I want to groupBy "id" and concatenate "num" together. Right now, I have this:
df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))
This concatenates by key but doesn't exclude empty strings. Is there a way, through the Column argument of concat_ws() or collect_list(), to exclude certain strings (here, the empty ones)?
Thank you!
Created ‎07-22-2016 09:11 PM
Could you filter the empty string before the group?
df.where(df("num") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))
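For reference, here is a minimal sketch of two ways to do this, assuming Spark 2.x or later (where `SparkSession` exists and `=!=` is the non-deprecated spelling of `!==`). The object name `ConcatNonEmpty`, the helper names `filterThenGroup` and `conditionalCollect`, the output column name `nums`, and the `","` delimiter are illustrative assumptions, not something from the original question. The second helper relies on `when` without `otherwise` yielding null and on `collect_list` skipping nulls, so the filtering happens inside the Column argument as the question asks.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws, when}

object ConcatNonEmpty {

  // Approach from the answer: drop rows with an empty "num" before grouping.
  // Ids whose "num" values are all empty disappear from the result.
  def filterThenGroup(df: DataFrame, delim: String): DataFrame =
    df.where(col("num") =!= "")
      .groupBy(col("id"))
      .agg(concat_ws(delim, collect_list(col("num"))).as("nums"))

  // Alternative: keep every row, but turn empty strings into null inside the
  // aggregate. when(...) without otherwise(...) yields null for non-matching
  // rows, and collect_list skips nulls. Ids with no non-empty "num" are kept
  // and get an empty string.
  def conditionalCollect(df: DataFrame, delim: String): DataFrame =
    df.groupBy(col("id"))
      .agg(concat_ws(delim, collect_list(when(col("num") =!= "", col("num")))).as("nums"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-non-empty")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the question; "num" is assumed to be a string column.
    val df = Seq(
      (1, "3", 5),
      (2, "3", 4),
      (1, "", 2),
      (1, "10", 0)
    ).toDF("id", "num", "num2")

    val delim = "," // assumed delimiter, standing in for DELIM

    filterThenGroup(df, delim).show()
    conditionalCollect(df, delim).show()

    spark.stop()
  }
}
```

Note that `collect_list` does not guarantee the order of the collected values within a group, so the concatenated string may come out in a different order between runs.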
