Created 04-21-2016 08:37 AM
Hey guys,
I have a column in my hive db that contains arrays of strings, let's say column1. I now want to create a new column based on this one, but now with arrays that have only the unique strings of the arrays in column1.
How do I best go about this in HiveQL? I've looked around but haven't found a solution I can implement.
Tx!!
Willem
Created 04-21-2016 09:56 AM
That was interesting I learned a bit about Hive today. Yes its possible:
0: jdbc:hive2://sandbox:10000/default> select * from test9; +-----------+--------------------------+--+ | test9.id | test9.name | +-----------+--------------------------+--+ | 1 | ["ben","klaus","klaus"] | | 2 | ["ben","klaus","klaus"] | +-----------+--------------------------+--+
You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:
0: jdbc:hive2://sandbox:10000/default> select id, collect_list(distinct(flatname)) from (select id,flatname from test9 lateral view explode(name) mytable as flatname) g group by id ; +-----+------------------+--+ | id | _c1 | +-----+------------------+--+ | 1 | ["ben","klaus"] | | 2 | ["ben","klaus"] | +-----+------------------+--+
You can also use a UDF might be faster.
Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want.
Created 04-21-2016 09:56 AM
That was interesting I learned a bit about Hive today. Yes its possible:
0: jdbc:hive2://sandbox:10000/default> select * from test9; +-----------+--------------------------+--+ | test9.id | test9.name | +-----------+--------------------------+--+ | 1 | ["ben","klaus","klaus"] | | 2 | ["ben","klaus","klaus"] | +-----------+--------------------------+--+
You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:
0: jdbc:hive2://sandbox:10000/default> select id, collect_list(distinct(flatname)) from (select id,flatname from test9 lateral view explode(name) mytable as flatname) g group by id ; +-----+------------------+--+ | id | _c1 | +-----+------------------+--+ | 1 | ["ben","klaus"] | | 2 | ["ben","klaus"] | +-----+------------------+--+
You can also use a UDF might be faster.
Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want.