- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Get UNIQUE values of each array in column
- Labels:
-
Apache Hive
Created ‎04-21-2016 08:37 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hey guys,
I have a column in my hive db that contains arrays of strings, let's say column1. I now want to create a new column based on this one, but now with arrays that have only the unique strings of the arrays in column1.
How do I best go about this in HiveQL? I've looked around but haven't found a solution I can implement.
Tx!!
Willem
Created ‎04-21-2016 09:56 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That was interesting I learned a bit about Hive today. Yes its possible:
0: jdbc:hive2://sandbox:10000/default> select * from test9; +-----------+--------------------------+--+ | test9.id | test9.name | +-----------+--------------------------+--+ | 1 | ["ben","klaus","klaus"] | | 2 | ["ben","klaus","klaus"] | +-----------+--------------------------+--+
You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:
0: jdbc:hive2://sandbox:10000/default> select id, collect_list(distinct(flatname)) from (select id,flatname from test9 lateral view explode(name) mytable as flatname) g group by id ; +-----+------------------+--+ | id | _c1 | +-----+------------------+--+ | 1 | ["ben","klaus"] | | 2 | ["ben","klaus"] | +-----+------------------+--+
You can also use a UDF might be faster.
Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want.
Created ‎04-21-2016 09:56 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That was interesting I learned a bit about Hive today. Yes its possible:
0: jdbc:hive2://sandbox:10000/default> select * from test9; +-----------+--------------------------+--+ | test9.id | test9.name | +-----------+--------------------------+--+ | 1 | ["ben","klaus","klaus"] | | 2 | ["ben","klaus","klaus"] | +-----------+--------------------------+--+
You can explode the arrays in a lateral view, and then using collect_list to merge them again using a distinct:
0: jdbc:hive2://sandbox:10000/default> select id, collect_list(distinct(flatname)) from (select id,flatname from test9 lateral view explode(name) mytable as flatname) g group by id ; +-----+------------------+--+ | id | _c1 | +-----+------------------+--+ | 1 | ["ben","klaus"] | | 2 | ["ben","klaus"] | +-----+------------------+--+
You can also use a UDF might be faster.
Brickhouse provides a set of really cool UDFs which I have used before. CombineUnique sounds like what you want.
