Created 05-26-2016 04:49 PM
Hi,
I am new to Hive. We are using NIFI to bring data in from JSON messages into Hortonworks HDFS.
Our JSON messages are nested many levels deep. Using get_json_object, I am pulling back data from a table that holds coverage data for an insurance policy.
Here is my query:
---------------start query--------------------
with QuoteCreated as
( Select quote_id,
get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type
from dcbi_dev
where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8' )
select quote_id, coverage_type
-- this doesn't work --> explode(coverage_type) from QuoteCreated;
---------------end query--------------------
When I run this query in Hive (from Ambari), it returns single row with these values:
quote_id = 57226f1e01a9c82283d02ff8
coverage_type = ["BodilyInjury","PropertyDamage","RentalReimbursement"]
I want to have, instead, 3 rows with the same quote_id, and each row will have one of the 3 values for the coverage_type
I've been attempting to use the Explode function, but when I do, it gives me this error:
"Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions [ERROR_STATUS]"
Research led me to believe that my "array" is, in fact, being stored as a string. I've been unable to convert it to an array or figure out how to make this work.
Can anyone help me figure out how to do this? This would be awesome if we could use explode to unfold arrays into multiple rows for queries and reporting. Thanks very much for any help.
Created 05-31-2016 06:21 PM
You should change your query to this:
with q as
(
select quote_id,
get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type
from dcbi_dev
where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8'
)
select quote_id, b from q lateral view explode(split(substr(q.coverage_type,2,length(q.coverage_type) - 2),',')) exploded as b;
Created 05-27-2016 10:54 PM
@stevekaufman, you need to use lateral view along with explode to generate 3 rows with same quote_id. Also, explode() only takes array() or map() as input. So, you should convert coverage_type to one of these formats.
Here's similar example:
hive> select get_json_object(json_table.json,'$') from json_table;
OK
{"id":"1","colorsArray":["red","green","blue","cyan"]}
Time taken: 0.199 seconds, Fetched: 1 row(s)
hive> with q as
> (
> select get_json_object(json_table.json, '$.id') as id,
> get_json_object(json_table.json, '$.colorsArray') as colorsArray from json_table
> )
> select id, b from q lateral view explode(split(substr(q.colorsArray,2,length(q.colorsArray) - 2),',')) exploded as b;
Query ID = hrt_qa_20160527223241_c69a0710-3d8d-4cb0-b63c-3d930f94b751
Total jobs = 1 Launching Job 1 out of 1 Status: Running (Executing on YARN cluster with App id application_1464124515927_0014)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 6.75 s
--------------------------------------------------------------------------------
OK
1 "red"
1 "green"
1 "blue"
1 "cyan"
Time taken: 7.631 seconds, Fetched: 4 row(s)) from json_table;
Created 05-31-2016 12:26 AM
Thanks a lot for helping me! I'm still trying to figure this out. So this query: Select quote_id, get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type from dcbi_dev where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8' ; Returns this: (see table) How would I modify my code to return 3 lines with the same quote_id and "Deductible" for the coverage_type for row 1, "PerDay" for row 2, and "PerOccurence" for row 3? Thanks again for the help
quote_id | coverage_type |
57226f1e01a9c82283d02ff8 | ["Deductible","PerDay","PerOccurrence"] |
Created 05-31-2016 06:21 PM
You should change your query to this:
with q as
(
select quote_id,
get_json_object(message_full,'$.event.quote.vehicles.coverages.limits.type') as coverage_type
from dcbi_dev
where event_class = 'events.quote.QuoteCreated' and quote_id = '57226f1e01a9c82283d02ff8'
)
select quote_id, b from q lateral view explode(split(substr(q.coverage_type,2,length(q.coverage_type) - 2),',')) exploded as b;
Created 06-02-2016 06:04 PM
I cannot even say how much this helps me. Thanks so much
Created 04-13-2020 04:56 AM
you can try
https://my.oschina.net/leejun2005/blog/120463
For example,
select a.timestamp, get_json_object(a.appevents, '$.eventid'), get_json_object(a.appenvets, '$.eventname') from log a;
should be changed to
select a.timestamp, b.*
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;
UDTF(User-Defined Table-Generating Functions) 用来解决 输入一行输出多行(On-to-many maping) 的需求。