Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

hive insert into struct fields get_json_object

avatar
Explorer

Hello

I have hdfs json files, specifically the files are tweets messages in json format, i'm trying to read that files with hive into a table.

i'm using the function get_json_object to extract each element of the json an insert in a table field.

Te problem is that i don't know how to manage struct fields, for example i have defined in th testination table a struct field with several fields for example, the field retweeted_status  is a srutct field with two fields :

 

first i create a table to insert the tweets like a string:

CREATE EXTERNAL TABLE `tweets_raw`(
`json_response` string)

 

then i create a second table where i want to split each json field in a table field. i created the next table:

 

CREATE TABLE tweets

  in_reply_to_status_id_str string,

  in_reply_to_status_id string,

  created_at STRING,

  in_reply_to_user_id_str STRING,

  source string ,

  retweeted_status STRUCT<

    in_reply_to_status_id_str:STRING,

    in_reply_to_status_id:STRING

 

    >,

  retweet_count STRING,

....

 

then, using the function get_json_object , i'm trying to insert data with a sentence like this:

INSERT overwrite table tweets
SELECT
cast(get_json_object(json_response, '$.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.in_reply_to_status_id') as STRING),
cast(get_json_object(json_response, '$.created_at') as STRING),
cast(get_json_object(json_response, '$.in_reply_to_user_id_str') as STRING),
cast(get_json_object(json_response, '$.source') as STRING),


struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING),
cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)


) as retweeted_status , 
cast(get_json_object(json_response, '$.retweet_count') as STRING),

...

the simple fileds are ok, bur i don't know how insert the struct field retweeted_status , i try several things but none work, using

struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING),
          cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)

         )

 

hive give me the next error:

  • Error while compiling statement: FAILED: SemanticException [Error 10044]: line 1:23 Cannot insert into target table because column number/types are different 'tweets': Cannot convert column 5 from struct<col1:string,col2:string> to struct<in_reply_to_status_id_str:string,in_reply_to_status_id:string>.
 
anyone know how to insert values in a struct field?
Thanks inadvance for your help
best regards
diego

 

3 REPLIES 3

avatar
New Contributor

i'm also facing same issue:

FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'extract': Cannot convert column 9 from struct<col1:bigint> to struct<n:bigint>.

 

@dcond, Did you find any solution?

avatar
New Contributor

Did you find any solution? Can you share that?😁

avatar
Cloudera Employee

Can you provide an example with a single field and a reproducible error case?