Support Questions

Find answers, ask questions, and share your expertise

Trying to use Hive EXPLODE function to "unfold" an array within a field into multiple rows

avatar
New Contributor

Hi,

I am new to Hive. We are using NIFI to bring data in from JSON messages into Hortonworks HDFS.

Our JSON messages are nested many levels deep. Using get_json_object, I am pulling back data from a table that holds coverage data for an insurance policy.

Here is my query:

---------------start query--------------------

with QuoteCreated as

( Select quote_id,

get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type

from dcbi_dev

where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8' )

select quote_id, coverage_type

-- this doesn't work --> explode(coverage_type) from QuoteCreated;

---------------end query--------------------

When I run this query in Hive (from Ambari), it returns single row with these values:

quote_id = 57226f1e01a9c82283d02ff8

coverage_type = ["BodilyInjury","PropertyDamage","RentalReimbursement"]

I want to have, instead, 3 rows with the same quote_id, and each row will have one of the 3 values for the coverage_type

I've been attempting to use the Explode function, but when I do, it gives me this error:

"Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions [ERROR_STATUS]"

Research led me to believe that my "array" is, in fact, being stored as a string. I've been unable to convert it to an array or figure out how to make this work.

Can anyone help me figure out how to do this? This would be awesome if we could use explode to unfold arrays into multiple rows for queries and reporting. Thanks very much for any help.

1 ACCEPTED SOLUTION

avatar
Contributor

You should change your query to this:

with q as

(

select quote_id,

get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type

from dcbi_dev

where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8'

)

select quote_id, b from q lateral view explode(split(substr(q.coverage_type,2,length(q.coverage_type) - 2),',')) exploded as b;

View solution in original post

5 REPLIES 5

avatar
Contributor

@stevekaufman, you need to use lateral view along with explode to generate 3 rows with same quote_id. Also, explode() only takes array() or map() as input. So, you should convert coverage_type to one of these formats.

Here's similar example:

hive> select get_json_object(json_table.json,'$') from json_table;

OK

{"id":"1","colorsArray":["red","green","blue","cyan"]}

Time taken: 0.199 seconds, Fetched: 1 row(s)

hive> with q as

> (

> select get_json_object(json_table.json, '$.id') as id,

> get_json_object(json_table.json, '$.colorsArray') as colorsArray from json_table

> )

> select id, b from q lateral view explode(split(substr(q.colorsArray,2,length(q.colorsArray) - 2),',')) exploded as b;

Query ID = hrt_qa_20160527223241_c69a0710-3d8d-4cb0-b63c-3d930f94b751

Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1464124515927_0014)

--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 6.75 s

--------------------------------------------------------------------------------

OK

1 "red"

1 "green"

1 "blue"

1 "cyan"

Time taken: 7.631 seconds, Fetched: 4 row(s)) from json_table;

avatar
New Contributor
Thanks a lot for helping me!  I'm still trying to figure this out.  So this query: 

Select quote_id,
get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type
from dcbi_dev
where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8' 
;

Returns this: (see table)

How would I modify my code to return 3 lines with the same quote_id and "Deductible" for the coverage_type for row 1, "PerDay" for row 2, and "PerOccurence" for row 3? 


Thanks again for the help





quote_id coverage_type
57226f1e01a9c82283d02ff8 ["Deductible","PerDay","PerOccurrence"]

avatar
Contributor

You should change your query to this:

with q as

(

select quote_id,

get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type

from dcbi_dev

where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8'

)

select quote_id, b from q lateral view explode(split(substr(q.coverage_type,2,length(q.coverage_type) - 2),',')) exploded as b;

avatar
New Contributor

I cannot even say how much this helps me. Thanks so much

avatar
New Contributor

you can try
https://my.oschina.net/leejun2005/blog/120463

For example,

select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appenvets, '$.eventname') from log a;
should be changed to

select a.timestamp, b.*
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;
UDTF(User-Defined Table-Generating Functions) 用来解决 输入一行输出多行(On-to-many maping) 的需求。