Created 11-02-2016 09:14 PM
I want to load a json record into hive
{"id":"790657073453424645","user_friends_count":{"int":121},"user_location":{"string":"Europa"},"user_description":{"string":"MITMACHEN \r\n\r\nIm Kampf gegen die EntDemokratisierung durch Freihandelsabkommen! \r\n\r\nStop TTIP - Stop TAFTA\r\n\r\nThe Fight against USA TTIP !"},"user_statuses_count":{"int":7561},"user_followers_count":{"int":1380},"user_name":{"string":"Freihandelsabkommen"},"user_screen_name":{"string":"Stop_TTIP"},"created_at":{"string":"2016-10-24T16:51:50Z"},"text":{"string":"RT @alikonkret: Da wurde die Meinungsmache im Kommentar versteckt um die scheinbare Neutralität zu wahren. #CETA #Wallonia https://t.co/ViA…"},"retweet_count":{"long":0},"retweeted":{"boolean":true},"in_reply_to_user_id":{"long":-1},"source":{"string":"<a href=\"http://www.tweetcaster.com\" rel=\"nofollow\">TweetCaster for Android</a>"},"in_reply_to_status_id":{"long":-1},"media_url_https":null,"expanded_url":null}
I only have the skeleton command
CREATE EXTERNAL TABLE tweetdata3( ) ROW FORMAT DELIMITED Fields terminated by ',' STORED as textfile location '/user/flume/tweets';
Created 11-02-2016 11:33 PM
You would need the JSON SerDe driver for Hive in order to make your JSON data to Hive tables. The module is available below: https://github.com/rcongiu/Hive-JSON-Serde
Created 11-03-2016 01:58 PM
followed the steps but getting all NULLS .
I compiled the serde and copied the json-serde-1.3.8-SNAPSHOT.jar file to the $FLUME_HOME/lib folder.
hive> CREATE EXTERNAL TABLE tweetdata3 (
> id string,
> person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>)
> ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
> LOCATION '/user/flume/tweets';
OK
Time taken: 0.197 seconds
hive> desc tweetdata3;
OK
id string from deserializer
person struct<email:string,first_name:string,last_name:string,location:struct<address:string,city:string,state:string,zipcode:string>,text:string,url:string> from deserializer
Time taken: 0.266 seconds, Fetched: 2 row(s)
hive>
> SELECT id, person.first_name, person.last_name, person.email,
> person.location.address, person.location.city, person.location.state,
> person.location.zipcode, person.text, person.url
> FROM tweetdata3 LIMIT 5;
OK
790657073453424645 NULL NULL NULL NULL NULL NULL NULL NULL NULL
790657073453424645 NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.282 seconds, Fetched: 2 row(s)
hive> SELECT id, person.first_name, person.last_name, person.email,
> person.location.address, person.location.city, person.location.state,
> person.location.zipcode, person.text, person.url
> FROM tweetdata3 LIMIT 5;
OK
790657073453424645 NULL NULL NULL NULL NULL NULL NULL NULL NULL
790657073453424645 NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 0.063 seconds, Fetched: 2 row(s)
Created 11-03-2016 12:03 AM
Created 11-03-2016 02:12 PM
I am confused , shouldn't the twitter data be same for everyone ? I am looking at the links you have mentioned here and the twitter data is different everywhere ?
looking at my data above please advise if its the right twitter record and if not why I am getting this format ?
Created 01-14-2019 11:00 AM
check out this article
https://medium.com/datadriveninvestor/analyzing-twitter-feeds-using-hive-7e074025f295
Created 01-14-2019 03:30 PM
Check out this article , to map JSON to hive columns
https://medium.com/datadriveninvestor/analyzing-twitter-feeds-using-hive-7e074025f295