Created 01-12-2018 04:12 PM
Hello friends,
I'm working with a Hive table that is populated with Twitter data through Flume / Oozie.
The problem is that Hive is truncating the tweet text field...
Can anybody please help me solve this issue?
Here's the table:
CREATE EXTERNAL TABLE tweets (
  id bigint,
  created_at string,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang string,
  retweet_count int,
  text string,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
PARTITIONED BY (datehour int)
LOCATION 'hdfs://192.168.1.11:8020/user/flume/tweets'
Created 01-12-2018 04:30 PM
Hi, maybe you could try `text_general` instead of `string` for tweets. I did it with Solr and it's wor
Created 01-12-2018 05:40 PM
Many thanks for your quick answer.
Unfortunately that's not a datatype that HiveQL recognizes... 😞
Created 01-22-2018 10:18 AM
It was actually a problem in the Twitter JSON.
When we get a tweet which is actually a retweet, Flume truncates it.
Problem solved 🙂
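
For anyone who hits the same thing, here is a minimal sketch of one way to read the full text from the table above. It assumes the truncation is the usual retweet behaviour, where the top-level `text` of a retweet is shortened and the original tweet's text is kept in `retweeted_status.text`. The `full_text` alias and the `datehour` value are only placeholders for illustration.

```sql
-- Prefer the original tweet's text when the row is a retweet.
-- retweeted_status is NULL for ordinary tweets, so COALESCE
-- falls back to the top-level text column in that case.
SELECT
  id,
  COALESCE(retweeted_status.text, text) AS full_text
FROM tweets
WHERE datehour = 2018012210;  -- hypothetical partition value
```

If the retweet prefix ("RT @user:") matters for your analysis, keep the top-level `text` column in the select as well, since `retweeted_status.text` does not include it.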