Member since
04-27-2018
6
Posts
1
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5626 | 07-15-2018 07:27 AM |
10-11-2018
08:55 AM
1 Kudo
Hello
I have hdfs json files, specifically the files are tweets messages in json format, i'm trying to read that files with hive into a table.
i'm using the function get_json_object to extract each element of the json an insert in a table field.
Te problem is that i don't know how to manage struct fields, for example i have defined in th testination table a struct field with several fields for example, the field retweeted_status is a srutct field with two fields :
first i create a table to insert the tweets like a string:
CREATE EXTERNAL TABLE `tweets_raw`( `json_response` string)
then i create a second table where i want to split each json field in a table field. i created the next table:
CREATE TABLE tweets
(
in_reply_to_status_id_str string,
in_reply_to_status_id string,
created_at STRING,
in_reply_to_user_id_str STRING,
source string ,
retweeted_status STRUCT<
in_reply_to_status_id_str:STRING,
in_reply_to_status_id:STRING
>,
retweet_count STRING,
....
then, using the function get_json_object , i'm trying to insert data with a sentence like this:
INSERT overwrite table tweets SELECT cast(get_json_object(json_response, '$.in_reply_to_status_id_str') as STRING), cast(get_json_object(json_response, '$.in_reply_to_status_id') as STRING), cast(get_json_object(json_response, '$.created_at') as STRING), cast(get_json_object(json_response, '$.in_reply_to_user_id_str') as STRING), cast(get_json_object(json_response, '$.source') as STRING),
struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING), cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING) ) as retweeted_status , cast(get_json_object(json_response, '$.retweet_count') as STRING),
...
the simple fileds are ok, bur i don't know how insert the struct field retweeted_status , i try several things but none work, using
struct( cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id_str') as STRING), cast(get_json_object(json_response, '$.retweeted_status.in_reply_to_status_id') as STRING)
)
hive give me the next error:
Error while compiling statement: FAILED: SemanticException [Error 10044]: line 1:23 Cannot insert into target table because column number/types are different 'tweets': Cannot convert column 5 from struct<col1:string,col2:string> to struct<in_reply_to_status_id_str:string,in_reply_to_status_id:string>.
anyone know how to insert values in a struct field?
Thanks inadvance for your help
best regards
diego
... View more
Labels:
- Labels:
-
Apache Hive
-
HDFS
07-15-2018
07:27 AM
Hello Looking for hive i find the funcion conv('04191D0A035580', 16,10) that convert the string to string that can be casted to bigint, this function also work in impala, i will use it. thanks regards dcon.
... View more
07-15-2018
07:17 AM
Hello I have a table with a field of type string that is a hexadecimal number, like 04191D0A035580 04421352E14B80 040F644A385780 is there a way to convert this string values to bigint values like: 04191D0A035580 --> 1153512419579264 04421352E14B80 --> 1198550669151104 040F644A385780 --> 1142823323195264 thanks in advance for your help best regards dcon
... View more
Labels:
- Labels:
-
Apache Impala