Hive is truncating strings :(

Rising Star

Hello friends,
I'm working with a Hive table that receives Twitter data from Flume / Oozie.
The problem is that Hive is truncating the tweet text field...
Can anybody please help me solve this issue?

Here's the table:

CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang STRING,
  retweet_count INT,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
PARTITIONED BY (datehour INT)
LOCATION
  'hdfs://192.168.1.11:8020/user/flume/tweets';

1 ACCEPTED SOLUTION

Rising Star

It was actually a problem in the Twitter JSON.
When we get a tweet which is actually a retweet, Flume truncates it.

Problem solved 🙂
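
For anyone landing here with the same symptom, one possible workaround on the Hive side is to read the original tweet's text from retweeted_status.text, which the table above already declares. This is only a minimal sketch: it assumes that field holds the untruncated text for retweets and comes back NULL for ordinary tweets, neither of which is confirmed in this thread, and full_text is just an illustrative alias.

```sql
-- Sketch: prefer the original tweet's text when the row is a retweet.
-- Assumption: retweeted_status is NULL for non-retweets, so accessing
-- retweeted_status.text yields NULL and COALESCE falls back to text.
SELECT
  id,
  COALESCE(retweeted_status.text, text) AS full_text
FROM tweets
LIMIT 10;
```

If the assumption holds, retweets should show the complete text in full_text while ordinary tweets are unchanged.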

3 REPLIES

New Contributor

Hi, maybe you could try `text_general` instead of `string` for tweets. I did it with Solr and it's wor

Rising Star

@PY Paul-Arnaud,

Many thanks for your quick answer.
Unfortunately, that's not a data type that HiveQL recognizes... 😞
