<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Get UNIQUE values of each array in column in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Get-UNIQUE-values-of-each-array-in-column/m-p/109009#M25749</link>
    <description>&lt;P&gt;That was interesting I learned a bit about Hive today. Yes its possible:&lt;/P&gt;&lt;PRE&gt;0: jdbc:hive2://sandbox:10000/default&amp;gt; select * from test9;
+-----------+--------------------------+--+
| test9.id  |        test9.name        |
+-----------+--------------------------+--+
| 1         | ["ben","klaus","klaus"]  |
| 2         | ["ben","klaus","klaus"]  |
+-----------+--------------------------+--+
&lt;/PRE&gt;&lt;P&gt;You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:&lt;/P&gt;&lt;PRE&gt;

0: jdbc:hive2://sandbox:10000/default&amp;gt; select id, collect_list(distinct(flatname)) 
from (select id,flatname  from test9 lateral view explode(name) mytable as flatname) g group by id ;
+-----+------------------+--+
| id  |       _c1        |
+-----+------------------+--+
| 1   | ["ben","klaus"]  |
| 2   | ["ben","klaus"]  |
+-----+------------------+--+
&lt;/PRE&gt;&lt;P&gt;You can also use a UDF might be faster.&lt;/P&gt;&lt;P&gt;Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want. &lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CombineUniqueUDAF.java" target="_blank"&gt;https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CombineUniqueUDAF.java&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 21 Apr 2016 16:56:25 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-04-21T16:56:25Z</dc:date>
    <item>
      <title>Get UNIQUE values of each array in column</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Get-UNIQUE-values-of-each-array-in-column/m-p/109008#M25748</link>
      <description>&lt;P&gt;Hey guys,&lt;/P&gt;&lt;P&gt;I have a column in my hive db that contains arrays of strings, let's say column1.
I now want to create a new column based on this one, but now with arrays that have only the unique strings of the arrays in column1. &lt;/P&gt;&lt;P&gt;How do I best go about this in HiveQL? I've looked around but haven't found a solution I can implement.&lt;/P&gt;&lt;P&gt;Tx!!&lt;/P&gt;&lt;P&gt;Willem&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 15:37:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Get-UNIQUE-values-of-each-array-in-column/m-p/109008#M25748</guid>
      <dc:creator>willem_lenaerts</dc:creator>
      <dc:date>2016-04-21T15:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: Get UNIQUE values of each array in column</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Get-UNIQUE-values-of-each-array-in-column/m-p/109009#M25749</link>
      <description>&lt;P&gt;That was interesting I learned a bit about Hive today. Yes its possible:&lt;/P&gt;&lt;PRE&gt;0: jdbc:hive2://sandbox:10000/default&amp;gt; select * from test9;
+-----------+--------------------------+--+
| test9.id  |        test9.name        |
+-----------+--------------------------+--+
| 1         | ["ben","klaus","klaus"]  |
| 2         | ["ben","klaus","klaus"]  |
+-----------+--------------------------+--+
&lt;/PRE&gt;&lt;P&gt;You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:&lt;/P&gt;&lt;PRE&gt;

0: jdbc:hive2://sandbox:10000/default&amp;gt; select id, collect_list(distinct(flatname)) 
from (select id,flatname  from test9 lateral view explode(name) mytable as flatname) g group by id ;
+-----+------------------+--+
| id  |       _c1        |
+-----+------------------+--+
| 1   | ["ben","klaus"]  |
| 2   | ["ben","klaus"]  |
+-----+------------------+--+
&lt;/PRE&gt;&lt;P&gt;You can also use a UDF might be faster.&lt;/P&gt;&lt;P&gt;Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want. &lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CombineUniqueUDAF.java" target="_blank"&gt;https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CombineUniqueUDAF.java&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Apr 2016 16:56:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Get-UNIQUE-values-of-each-array-in-column/m-p/109009#M25749</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-04-21T16:56:25Z</dc:date>
    </item>
  </channel>
</rss>

