- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Collapse rows into a map in Hive
- Labels:
-
Apache Hadoop
-
Apache Hive
Created ‎01-31-2017 07:57 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a situation where I want to build a map by grouping rows resulting from a sequence of "UNION ALL"s. What Hive syntax (without using custom UDFs) will let me achieve something like this? Desired result:
id1 | id2 | result |
100 | 200 | {k1: v1, k2: v2...} |
101 | 201 | {k1: v1, k2: v2...} |
select id1, id2, collect_map(col1_key, col2_val) as result from ( select id1, id2, col1_key, col2_val from table1 union all select id1, id2, col1_key, col2_val from table2 )
Created ‎02-07-2017 10:16 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ed R, There is no such Hive syntax without a UDF, but there are some such UDFs available, and you can try to use them as is, or modify them if needed, like for example "to_map" described here. This is also assuming that your query is actually "select id1, id2, to_map(col1_key, col2_val) as result from <table> order by id1, id2" where <table> can either be a single table, or a union as you mentioned, and to_map is actaully a UDAF or User defined aggregate function. In this case you may want to decide how to handle cases where the same key points to different values. And finally, there is nothing misterious with Hive UDFs or UDAFs, you just implement some interfaces or overwrite some functions, and you have these examples to speed you up. As you go deeper into Hive, you will realize sooner or later that UDFs are the must.
Created ‎07-12-2019 01:34 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here's a hack I come up with for whoever might need it:
SELECT -- Presto version: -- map_agg(attr_id, attr_val) AS attrs str_to_map( -- Split the string into map by \x1 and \x2 concat_ws( -- join the list with \x2 string(unhex(2)), -- Create a list of "attr_id \x1 val" strings collect_list(concat_ws(string(unhex(1)), string(attr_id), attr_val)) ), string(unhex(2)), string(unhex(1)) ) AS attrs FROM attrs_tall
