Created on 10-11-2018 08:55 AM - edited 09-16-2022 06:48 AM
Hello
I have hdfs json files, specifically the files are tweets messages in json format, i'm trying to read that files with hive into a table.
i'm using the function get_json_object to extract each element of the json an insert in a table field.
Te problem is that i don't know how to manage struct fields, for example i have defined in th testination table a struct field with several fields for example, the field retweeted_status is a srutct field with two fields :
first i create a table to insert the tweets like a string:
CREATE EXTERNAL TABLE `tweets_raw`(
`json_response` string)
then i create a second table where i want to split each json field in a table field. i created the next table:
CREATE TABLE tweets
(
in_reply_to_status_id_str string,
in_reply_to_status_id string,
created_at STRING,
in_reply_to_user_id_str STRING,
source string ,
retweeted_status STRUCT<
in_reply_to_status_id_str:STRING,
in_reply_to_status_id:STRING
>,
retweet_count STRING,
....
then, using the function get_json_object , i'm trying to insert data with a sentence like this:
INSERT overwrite table tweets
SELECT
cast(get_json_object(json_response, '$.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.in_reply_to_status_id') as STRING),
cast(get_json_object(json_response, '$.created_at') as STRING),
cast(get_json_object(json_response, '$.in_reply_to_user_id_str') as STRING),
cast(get_json_object(json_response, '$.source') as STRING),
struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)
) as retweeted_status ,
cast(get_json_object(json_response, '$.retweet_count') as STRING),
...
the simple fileds are ok, bur i don't know how insert the struct field retweeted_status , i try several things but none work, using
struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)
)
hive give me the next error:
Created on 04-24-2019 11:45 PM - edited 04-24-2019 11:50 PM
i'm also facing same issue:
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'extract': Cannot convert column 9 from struct<col1:bigint> to struct<n:bigint>.
@dcond, Did you find any solution?
Created 08-16-2022 01:26 AM
Did you find any solution? Can you share that?😁
Created 08-22-2022 02:40 PM
Can you provide an example with a single field and a reproducible error case?