Created 01-12-2018 04:12 PM
Hello friends,
I'm working with a Hive table that reads Twitter data collected by Flume / Oozie.
The problem is that Hive is truncating the tweet text field...
Can anybody please help me solve this issue?
Here's the table:
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang STRING,
  retweet_count INT,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
PARTITIONED BY (datehour INT)
LOCATION
  'hdfs://192.168.1.11:8020/user/flume/tweets'
Created 01-22-2018 10:18 AM
It was actually a problem in the Twitter JSON.
When we get a tweet which is actually a retweet, Flume truncates it.
Problem solved 🙂
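For anyone landing here with the same symptom, here is a rough sketch of one way to read the text on the Hive side. It assumes the untruncated text of the original tweet is available in `retweeted_status.text` (a field the table above already declares) and that `retweeted_status` is NULL for tweets that are not retweets. I have not verified either assumption against this particular feed, and the `datehour` value is a made-up example.

```sql
-- Sketch only: prefer the original tweet's text when the row is a retweet.
-- Assumption: retweeted_status is NULL for ordinary tweets, so
-- retweeted_status.text is NULL and COALESCE falls back to text.
SELECT
  id,
  COALESCE(retweeted_status.text, text) AS full_text
FROM tweets
WHERE datehour = 2018011216  -- hypothetical partition value
LIMIT 10;
```

If `full_text` still comes back cut off for retweets, the truncation is most likely already present in the JSON that lands in HDFS, so it would have to be fixed upstream rather than in the query.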
Created 01-12-2018 04:30 PM
Hi, maybe you could try `text_general` instead of `string` for tweets. I did it with Solr and it's wor
Created 01-12-2018 05:40 PM
Many thanks for your quick answer.
Unfortunately that's not a datatype that HiveQL recognizes... 😞