Member since
04-21-2016
1
Post
0
Kudos Received
0
Solutions
04-21-2016
09:56 AM
2 Kudos
That was interesting I learned a bit about Hive today. Yes its possible: 0: jdbc:hive2://sandbox:10000/default> select * from test9;
+-----------+--------------------------+--+
| test9.id | test9.name |
+-----------+--------------------------+--+
| 1 | ["ben","klaus","klaus"] |
| 2 | ["ben","klaus","klaus"] |
+-----------+--------------------------+--+
You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:
0: jdbc:hive2://sandbox:10000/default> select id, collect_list(distinct(flatname))
from (select id,flatname from test9 lateral view explode(name) mytable as flatname) g group by id ;
+-----+------------------+--+
| id | _c1 |
+-----+------------------+--+
| 1 | ["ben","klaus"] |
| 2 | ["ben","klaus"] |
+-----+------------------+--+
You can also use a UDF might be faster. Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want. https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CombineUniqueUDAF.java
... View more