Created 02-02-2017 10:47 AM
I have a Hive table, tweets, stored as text, and I am trying to copy its contents into another table, tweetsORC, which is stored as ORC. Both have the same structure:
col_name    data_type    comment
racist    boolean    from deserializer
contributors    string    from deserializer
coordinates    string    from deserializer
created_at    string    from deserializer
entities    struct<hashtags:array<string>,symbols:array<string>,urls:array<struct<display_url:string,expanded_url:string,indices:array<tinyint>,url:string>>,user_mentions:array<string>>    from deserializer
favorite_count    tinyint    from deserializer
favorited    boolean    from deserializer
filter_level    string    from deserializer
geo    string    from deserializer
id    bigint    from deserializer
id_str    string    from deserializer
in_reply_to_screen_name    string    from deserializer
in_reply_to_status_id    string    from deserializer
in_reply_to_status_id_str    string    from deserializer
in_reply_to_user_id    string    from deserializer
in_reply_to_user_id_str    string    from deserializer
is_quote_status    boolean    from deserializer
lang    string    from deserializer
place    string    from deserializer
possibly_sensitive    boolean    from deserializer
retweet_count    tinyint    from deserializer
retweeted    boolean    from deserializer
source    string    from deserializer
text    string    from deserializer
timestamp_ms    string    from deserializer
truncated    boolean    from deserializer
user    struct<contributors_enabled:boolean,created_at:string,default_profile:boolean,default_profile_image:boolean,description:string,favourites_count:tinyint,follow_request_sent:string,followers_count:tinyint,following:string,friends_count:tinyint,geo_enabled:boolean,id:bigint,id_str:string,is_translator:boolean,lang:string,listed_count:tinyint,location:string,name:string,notifications:string,profile_background_color:string,profile_background_image_url:string,profile_background_image_url_https:string,profile_background_tile:boolean,profile_image_url:string,profile_image_url_https:string,profile_link_color:string,profile_sidebar_border_color:string,profile_sidebar_fill_color:string,profile_text_color:string,profile_use_background_image:boolean,protected:boolean,screen_name:string,statuses_count:smallint,time_zone:string,url:string,utc_offset:string,verified:boolean>    from deserializer
When I try to insert from tweets to tweetsORC I get:
INSERT OVERWRITE TABLE tweetsORC SELECT * FROM tweets;
FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDFToString with (struct<hashtags:array<string>,symbols:array<string>,urls:array<struct<display_url:string,expanded_url:string,indices:array<tinyint>,url:string>>,user_mentions:array<string>>). Possible choices:
_FUNC_(bigint)
_FUNC_(binary)
_FUNC_(boolean)
_FUNC_(date)
_FUNC_(decimal(38,18))
_FUNC_(double)
_FUNC_(float)
_FUNC_(int)
_FUNC_(smallint)
_FUNC_(string)
_FUNC_(timestamp)
_FUNC_(tinyint)
_FUNC_(void)
The only help I have found on this kind of problem says to make a UDF use primitive types, but I am not using a UDF! Any help is much appreciated!
FYI: Hive version:
Hive 1.2.1000.2.4.2.0-258
Subversion git://u12-slave-5708dfcd-10/grid/0/jenkins/workspace/HDP-build-ubuntu12/bigtop/output/hive/hive-1.2.1000.2.4.2.0 -r 240760457150036e13035cbb82bcda0c65362f3a
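A note for readers who hit the same NoMatchingMethodException: UDFToString is the built-in function Hive uses when it casts a value to string, so the message suggests Hive is trying to convert the entities struct into a string column of the target table. INSERT ... SELECT matches columns by position, not by name, so one thing worth checking is whether the two tables really list the same types in the same order. The sketch below is only a diagnostic under that assumption, not a confirmed fix for this thread; the table name tweets_orc_ctas is hypothetical and not from the original post.

```sql
-- Compare the two schemas position by position; a struct column in tweets
-- that lines up with a string column in tweetsORC would explain the error.
DESCRIBE tweets;
DESCRIBE tweetsORC;

-- Alternative sketch: let Hive derive the ORC schema from the source table
-- with CREATE TABLE AS SELECT, so the column types cannot drift apart.
-- tweets_orc_ctas is a hypothetical name chosen for illustration.
CREATE TABLE tweets_orc_ctas
STORED AS ORC
AS SELECT * FROM tweets;
```

If the two DESCRIBE outputs differ at the position of entities, recreating tweetsORC with matching column types (or using the table created by the CTAS statement) may be enough to let the insert run.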
Created 02-08-2017 02:49 PM
If I remove the * and instead do the following I get a different error:
INSERT OVERWRITE TABLE tweetsORC SELECT racist, contributors, coordinates, created_at, entities, favorite_count, favorited, filter_level, geo, id, id_str, in_reply_to_screen_name, in_reply_to_status_id, in_reply_to_status_id_str, in_reply_to_user_id, in_reply_to_user_id_str, is_quote_status, lang, place, possibly_sensitive, retweet_count, retweeted, source, text, timestamp_ms, truncated, user FROM tweets;
Created 02-09-2017 06:46 PM
How is this an answer to your question?