<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question DataFrame groupBy and concat non-empty strings in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/DataFrame-groupBy-and-concat-non-empty-strings/m-p/126092#M34567</link>
    <description>&lt;P&gt;I want to concatenate the non-empty values in a column after grouping by some key.&lt;/P&gt;&lt;P&gt;For example, suppose I have a DataFrame:&lt;/P&gt;&lt;PRE&gt;df.show()

+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+
&lt;/PRE&gt;&lt;P&gt;I want to groupBy "id" and concatenate "num" together. Right now, I have this:&lt;/P&gt;&lt;PRE&gt;df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))
&lt;/PRE&gt;&lt;P&gt;This concatenates by key but doesn't exclude empty strings. Is there a way to tell concat_ws() or collect_list(), through the Column argument, to exclude certain strings?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
    <pubDate>Wed, 13 Jul 2016 11:57:31 GMT</pubDate>
    <dc:creator>jestinm</dc:creator>
    <dc:date>2016-07-13T11:57:31Z</dc:date>
    <item>
      <title>DataFrame groupBy and concat non-empty strings</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/DataFrame-groupBy-and-concat-non-empty-strings/m-p/126092#M34567</link>
      <description>&lt;P&gt;I want to concatenate the non-empty values in a column after grouping by some key.&lt;/P&gt;&lt;P&gt;For example, suppose I have a DataFrame:&lt;/P&gt;&lt;PRE&gt;df.show()

+---+---+----+
| id|num|num2|
+---+---+----+
|  1|  3|   5|
|  2|  3|   4|
|  1|   |   2|
|  1| 10|   0|
+---+---+----+
&lt;/PRE&gt;&lt;P&gt;I want to groupBy "id" and concatenate "num" together. Right now, I have this:&lt;/P&gt;&lt;PRE&gt;df.groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))
&lt;/PRE&gt;&lt;P&gt;This concatenates by key but doesn't exclude empty strings. Is there a way to tell concat_ws() or collect_list(), through the Column argument, to exclude certain strings?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2016 11:57:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/DataFrame-groupBy-and-concat-non-empty-strings/m-p/126092#M34567</guid>
      <dc:creator>jestinm</dc:creator>
      <dc:date>2016-07-13T11:57:31Z</dc:date>
    </item>
    <item>
      <title>Re: DataFrame groupBy and concat non-empty strings</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/DataFrame-groupBy-and-concat-non-empty-strings/m-p/126093#M34568</link>
      <description>&lt;P&gt;Could you filter out the empty strings before grouping? (On Spark 2.0 and later, use =!= in place of the deprecated !== operator.)&lt;/P&gt;&lt;PRE&gt;df.where(df("num") !== "").groupBy($"id").agg(concat_ws(DELIM, collect_list($"num")))&lt;/PRE&gt;</description>
      <pubDate>Sat, 23 Jul 2016 04:11:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/DataFrame-groupBy-and-concat-non-empty-strings/m-p/126093#M34568</guid>
      <dc:creator>qiwang</dc:creator>
      <dc:date>2016-07-23T04:11:07Z</dc:date>
    </item>
  </channel>
</rss>

