Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

create hive table with this json format

Master Collaborator

I want to load a json record into hive

{"id":"790657073453424645","user_friends_count":{"int":121},"user_location":{"string":"Europa"},"user_description":{"string":"MITMACHEN \r\n\r\nIm Kampf gegen die EntDemokratisierung durch Freihandelsabkommen! \r\n\r\nStop TTIP - Stop TAFTA\r\n\r\nThe Fight against USA TTIP !"},"user_statuses_count":{"int":7561},"user_followers_count":{"int":1380},"user_name":{"string":"Freihandelsabkommen"},"user_screen_name":{"string":"Stop_TTIP"},"created_at":{"string":"2016-10-24T16:51:50Z"},"text":{"string":"RT @alikonkret: Da wurde die Meinungsmache im Kommentar versteckt um die scheinbare Neutralität zu wahren. #CETA #Wallonia https://t.co/ViA…"},"retweet_count":{"long":0},"retweeted":{"boolean":true},"in_reply_to_user_id":{"long":-1},"source":{"string":"<a href=\"http://www.tweetcaster.com\" rel=\"nofollow\">TweetCaster for Android</a>"},"in_reply_to_status_id":{"long":-1},"media_url_https":null,"expanded_url":null}

I only have the skeleton command

CREATE EXTERNAL TABLE tweetdata3(  
) ROW FORMAT DELIMITED Fields terminated by ',' STORED as textfile  location '/user/flume/tweets';
6 REPLIES 6

New Contributor

You would need the JSON SerDe driver for Hive in order to make your JSON data to Hive tables. The module is available below: https://github.com/rcongiu/Hive-JSON-Serde

Master Collaborator

followed the steps but getting all NULLS .

I compiled the serde and copied the json-serde-1.3.8-SNAPSHOT.jar file to the $FLUME_HOME/lib folder.

hive> CREATE EXTERNAL TABLE tweetdata3 (
    >   id string,
    >   person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>)
    > ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    > LOCATION '/user/flume/tweets';
OK
Time taken: 0.197 seconds
hive> desc tweetdata3;
OK
id                      string                  from deserializer
person                  struct<email:string,first_name:string,last_name:string,location:struct<address:string,city:string,state:string,zipcode:string>,text:string,url:string>  from deserializer
Time taken: 0.266 seconds, Fetched: 2 row(s)
hive>
    > SELECT id, person.first_name, person.last_name, person.email,
    > person.location.address, person.location.city, person.location.state,
    > person.location.zipcode, person.text, person.url
    > FROM tweetdata3 LIMIT 5;
OK
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.282 seconds, Fetched: 2 row(s)
hive> SELECT id, person.first_name, person.last_name, person.email,
    > person.location.address, person.location.city, person.location.state,
    > person.location.zipcode, person.text, person.url
    > FROM tweetdata3 LIMIT 5;
OK
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
790657073453424645      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.063 seconds, Fetched: 2 row(s)

Master Collaborator

I am confused , shouldn't the twitter data be same for everyone ? I am looking at the links you have mentioned here and the twitter data is different everywhere ?

looking at my data above please advise if its the right twitter record and if not why I am getting this format ?

New Contributor

New Contributor
Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.