Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

Hive is truncating strings :(

Rising Star

Hello friends,
I'm working with a Hive table that is populated with Twitter data via Flume / Oozie.
The problem is that Hive is truncating the tweet text field...
Can anybody please help me solve this issue?

Here's the table:

CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweeted_status STRUCT<
    text:STRING,
    user:STRUCT<screen_name:STRING,name:STRING>,
    retweet_count:INT>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  lang STRING,
  retweet_count INT,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:INT,
    time_zone:STRING>
)
PARTITIONED BY (datehour INT)
LOCATION
  'hdfs://192.168.1.11:8020/user/flume/tweets'
1 ACCEPTED SOLUTION

Rising Star

It was actually a problem in the Twitter JSON.
When we get a tweet which is actually a retweet, Flume truncates it.

Problem solved 🙂
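
For anyone landing here with the same symptom, a possible workaround is sketched below. It assumes the classic Twitter JSON layout, where a retweet's top-level text is "RT @user: " plus a shortened copy of the original, while the untruncated original is kept in retweeted_status.text. It also assumes the tweets table from the question and that retweeted_status is NULL for tweets that are not retweets. The partition value is hypothetical.

```sql
-- Sketch only: prefer the original tweet's text when the row is a retweet,
-- otherwise fall back to the top-level text column.
SELECT
  id,
  COALESCE(retweeted_status.text, text) AS full_text,
  retweeted_status.text IS NOT NULL     AS is_retweet
FROM tweets
WHERE datehour = 2015010112;  -- hypothetical partition value
```

If full_text still looks cut off for rows where is_retweet is false, the cause is probably somewhere else in the pipeline and this query will not help.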


3 REPLIES

New Member

Hi, maybe you could try `text_general` instead of `string` for tweets. I did it with Solr and it's working.

Rising Star

@PY Paul-Arnaud,

Many thanks for your quick answer.
Unfortunately that's not a data type that HiveQL recognizes... 😞
