Created 01-12-2018 04:12 PM
Hello friends,
I'm working with a Hive table that receives Twitter data through Flume / Oozie.
The problem is that Hive is truncating the tweet text field...
Can anybody please help me solve this issue?
Here's the table:
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang STRING,
  retweet_count INT,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
PARTITIONED BY (datehour INT)
LOCATION 'hdfs://192.168.1.11:8020/user/flume/tweets';
Created 01-22-2018 10:18 AM
It was actually a problem in the Twitter JSON.
When we get a tweet which is actually a retweet, Flume truncates it.
Problem solved 🙂
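For anyone landing here with the same symptom, a possible workaround is sketched below. It assumes (this was not confirmed in the thread) that for retweets the untruncated text is available in retweeted_status.text, and that retweeted_status is NULL for original tweets. The alias full_text is just an illustrative name.

```sql
-- Sketch only: prefer the nested retweet text when it exists,
-- otherwise fall back to the top-level text column.
-- Assumption: retweeted_status is NULL for tweets that are not retweets.
SELECT
  id,
  COALESCE(retweeted_status.text, text) AS full_text
FROM tweets
LIMIT 10;
```

If the nested field turns out to be truncated as well, the cut is happening upstream of Hive and this query will not help.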
Created 01-12-2018 04:30 PM
Hi, maybe you could try `text_general` instead of `string` for tweets. I did it with Solr and it's wor
Created 01-12-2018 05:40 PM
Many thanks for your quick answer.
Unfortunately that's not a datatype that HiveQL recognizes... 😞