Created 01-31-2017 07:57 PM
I have a situation where I want to build a map by grouping rows resulting from a sequence of "UNION ALL"s. What Hive syntax (without using custom UDFs) will let me achieve something like this? Desired result:
id1 | id2 | result |
100 | 200 | {k1: v1, k2: v2...} |
101 | 201 | {k1: v1, k2: v2...} |
select id1, id2, collect_map(col1_key, col2_val) as result from ( select id1, id2, col1_key, col2_val from table1 union all select id1, id2, col1_key, col2_val from table2 )
Created 02-07-2017 10:16 PM
Hi @Ed R, There is no such Hive syntax without a UDF, but there are some such UDFs available, and you can try to use them as is, or modify them if needed, like for example "to_map" described here. This is also assuming that your query is actually "select id1, id2, to_map(col1_key, col2_val) as result from <table> order by id1, id2" where <table> can either be a single table, or a union as you mentioned, and to_map is actaully a UDAF or User defined aggregate function. In this case you may want to decide how to handle cases where the same key points to different values. And finally, there is nothing misterious with Hive UDFs or UDAFs, you just implement some interfaces or overwrite some functions, and you have these examples to speed you up. As you go deeper into Hive, you will realize sooner or later that UDFs are the must.
Created 07-12-2019 01:34 AM
Here's a hack I come up with for whoever might need it:
SELECT -- Presto version: -- map_agg(attr_id, attr_val) AS attrs str_to_map( -- Split the string into map by \x1 and \x2 concat_ws( -- join the list with \x2 string(unhex(2)), -- Create a list of "attr_id \x1 val" strings collect_list(concat_ws(string(unhex(1)), string(attr_id), attr_val)) ), string(unhex(2)), string(unhex(1)) ) AS attrs FROM attrs_tall