Reply
Explorer
Posts: 6
Registered: ‎04-27-2018

hive insert into struct fields get_json_object

[ Edited ]

Hello

I have hdfs json files, specifically the files are tweets messages in json format, i'm trying to read that files with hive into a table.

i'm using the function get_json_object to extract each element of the json an insert in a table field.

Te problem is that i don't know how to manage struct fields, for example i have defined in th testination table a struct field with several fields for example, the field retweeted_status  is a srutct field with two fields :

 

first i create a table to insert the tweets like a string:

CREATE EXTERNAL TABLE `tweets_raw`(
`json_response` string)

 

then i create a second table where i want to split each json field in a table field. i created the next table:

 

CREATE TABLE tweets

  in_reply_to_status_id_str string,

  in_reply_to_status_id string,

  created_at STRING,

  in_reply_to_user_id_str STRING,

  source string ,

  retweeted_status STRUCT<

    in_reply_to_status_id_str:STRING,

    in_reply_to_status_id:STRING

 

    >,

  retweet_count STRING,

....

 

then, using the function get_json_object , i'm trying to insert data with a sentence like this:

INSERT overwrite table tweets
SELECT
cast(get_json_object(json_response, '$.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.in_reply_to_status_id') as STRING),
cast(get_json_object(json_response, '$.created_at') as STRING),
cast(get_json_object(json_response, '$.in_reply_to_user_id_str') as STRING),
cast(get_json_object(json_response, '$.source') as STRING),


struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)


) as retweeted_status , 
cast(get_json_object(json_response, '$.retweet_count') as STRING),

...

the simple fileds are ok, bur i don't know how insert the struct field retweeted_status , i try several things but none work, using

struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING),
          cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)

         )

 

hive give me the next error:

  • Error while compiling statement: FAILED: SemanticException [Error 10044]: line 1:23 Cannot insert into target table because column number/types are different 'tweets': Cannot convert column 5 from struct<col1:string,col2:string> to struct<in_reply_to_status_id_str:string,in_reply_to_status_id:string>.
 
anyone know how to insert values in a struct field?
Thanks inadvance for your help
best regards
diego

 

Announcements