Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

create hive table with this json format

avatar
Super Collaborator

I want to load a json record into hive

{"id":"790657073453424645","user_friends_count":{"int":121},"user_location":{"string":"Europa"},"user_description":{"string":"MITMACHEN \r\n\r\nIm Kampf gegen die EntDemokratisierung durch Freihandelsabkommen! \r\n\r\nStop TTIP - Stop TAFTA\r\n\r\nThe Fight against USA TTIP !"},"user_statuses_count":{"int":7561},"user_followers_count":{"int":1380},"user_name":{"string":"Freihandelsabkommen"},"user_screen_name":{"string":"Stop_TTIP"},"created_at":{"string":"2016-10-24T16:51:50Z"},"text":{"string":"RT @alikonkret: Da wurde die Meinungsmache im Kommentar versteckt um die scheinbare Neutralität zu wahren. #CETA #Wallonia https://t.co/ViA…"},"retweet_count":{"long":0},"retweeted":{"boolean":true},"in_reply_to_user_id":{"long":-1},"source":{"string":"<a href=\"http://www.tweetcaster.com\" rel=\"nofollow\">TweetCaster for Android</a>"},"in_reply_to_status_id":{"long":-1},"media_url_https":null,"expanded_url":null}

I only have the skeleton command

CREATE EXTERNAL TABLE tweetdata3(  
) ROW FORMAT DELIMITED Fields terminated by ',' STORED as textfile  location '/user/flume/tweets';
6 REPLIES 6

avatar
New Contributor

You would need the JSON SerDe driver for Hive in order to make your JSON data to Hive tables. The module is available below: https://github.com/rcongiu/Hive-JSON-Serde

avatar
Super Collaborator

followed the steps but getting all NULLS .

I compiled the serde and copied the json-serde-1.3.8-SNAPSHOT.jar file to the $FLUME_HOME/lib folder.

hive> CREATE EXTERNAL TABLE tweetdata3 (
    >   id string,
    >   person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>)
    > ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    > LOCATION '/user/flume/tweets';
OK
Time taken: 0.197 seconds
hive> desc tweetdata3;
OK
id                      string                  from deserializer
person                  struct<email:string,first_name:string,last_name:string,location:struct<address:string,city:string,state:string,zipcode:string>,text:string,url:string>  from deserializer
Time taken: 0.266 seconds, Fetched: 2 row(s)
hive>
    > SELECT id, person.first_name, person.last_name, person.email,
    > person.location.address, person.location.city, person.location.state,
    > person.location.zipcode, person.text, person.url
    > FROM tweetdata3 LIMIT 5;
OK
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.282 seconds, Fetched: 2 row(s)
hive> SELECT id, person.first_name, person.last_name, person.email,
    > person.location.address, person.location.city, person.location.state,
    > person.location.zipcode, person.text, person.url
    > FROM tweetdata3 LIMIT 5;
OK
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.063 seconds, Fetched: 2 row(s)

avatar
Super Collaborator

I am confused , shouldn't the twitter data be same for everyone ? I am looking at the links you have mentioned here and the twitter data is different everywhere ?

looking at my data above please advise if its the right twitter record and if not why I am getting this format ?

avatar
New Contributor

avatar
New Contributor