Support Questions

Find answers, ask questions, and share your expertise

Collapse rows into a map in Hive

avatar
New Contributor

I have a situation where I want to build a map by grouping rows resulting from a sequence of "UNION ALL"s. What Hive syntax (without using custom UDFs) will let me achieve something like this? Desired result:

id1id2result
100200{k1: v1, k2: v2...}
101201{k1: v1, k2: v2...}
select id1, id2, collect_map(col1_key, col2_val) as result from
(
select id1, id2, col1_key, col2_val from table1 

union all

select id1, id2, col1_key, col2_val from table2
)
2 REPLIES 2

avatar
Master Guru

Hi @Ed R, There is no such Hive syntax without a UDF, but there are some such UDFs available, and you can try to use them as is, or modify them if needed, like for example "to_map" described here. This is also assuming that your query is actually "select id1, id2, to_map(col1_key, col2_val) as result from <table> order by id1, id2" where <table> can either be a single table, or a union as you mentioned, and to_map is actaully a UDAF or User defined aggregate function. In this case you may want to decide how to handle cases where the same key points to different values. And finally, there is nothing misterious with Hive UDFs or UDAFs, you just implement some interfaces or overwrite some functions, and you have these examples to speed you up. As you go deeper into Hive, you will realize sooner or later that UDFs are the must.

avatar
New Contributor

Here's a hack I come up with for whoever might need it:


SELECT
  -- Presto version:
  -- map_agg(attr_id, attr_val) AS attrs
  str_to_map(
    -- Split the string into map by \x1 and \x2
      concat_ws(
        -- join the list with \x2
          string(unhex(2)),
        -- Create a list of "attr_id \x1 val" strings
          collect_list(concat_ws(string(unhex(1)), string(attr_id), attr_val))
        ),
      string(unhex(2)),
      string(unhex(1))
    ) AS attrs
FROM attrs_tall