Created 09-04-2018 03:25 PM
Hi Team,
I am trying to load below json data into hive table(using hdp 2.6.5 sandbox) ,however hive is only returning the first row , please suggest.
{"user":{"userlocation":"Cinderford, Gloucestershire","id":230231618,"name":"Aimee","screenname":"Aimee_Cottle","geoenabled":true},"tweetmessage":"Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I recover!","createddate":"2013-06-20T12:08:14","geolocation":null}
{"user":{"userlocation":"Garena ID : NuraBlazee","id":635239939,"name":"Axyraf.","screenname":"Asyraf_Fauzi","geoenabled":false},"tweetmessage":"RT @abhigyantweets: Can't stop a natural disaster but so many lives in U'khand wouldn't have been lost if there was disaster preparedness. É","createddate":"2013-06-20T12:08:16","geolocation":null}
{"user":{"userlocation":"Gemert,Netherlands","id":21418083,"name":"Ad van Steenbruggen","screenname":"torment00","geoenabled":true},"tweetmessage":"? Listening to 'The Constant' by 'Anthrax' from 'Worship Music","createddate":"2013-06-20T12:08:20","geolocation":null}
Table structure-:
set hive.support.sql11.reserved.keywords=false;
CREATE EXTERNAL TABLE tweets ( user STRUCT <userlocation:STRING,id:INT,name:STRING,screenname:STRING,geoenabled:BOOLEAN>, tweetmessage STRING, createddate STRING, geolocation STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/hive/data/';
query-:
set hive.cli.print.header=true; hive> select user.screenname, user.name,tweetmessage,createddate from tweets ; OK
Aimee_Cottle Aimee Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I re cover! 2013-06-20T12:08:14
Kindly let me know , is there any way my query can fetch all the rows from the source
Thanks in Advance
Jatin
Created 09-05-2018 02:36 AM
Not able to recreate the same scenario on my end.
Here is what I have tried is:
Json file:
{"user":{"userlocation":"Cinderford, Gloucestershire","id":230231618,"name":"Aimee","screenname":"Aimee_Cottle","geoenabled":true},"tweetmessage":"Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I recover!","createddate":"2013-06-20T12:08:14","geolocation":null}
{"user":{"userlocation":"Garena ID : NuraBlazee","id":635239939,"name":"Axyraf.","screenname":"Asyraf_Fauzi","geoenabled":false},"tweetmessage":"RT @abhigyantweets: Can't stop a natural disaster but so many lives in U'khand wouldn't have been lost if there was disaster preparedness. É","createddate":"2013-06-20T12:08:16","geolocation":null}
{"user":{"userlocation":"Gemert,Netherlands","id":21418083,"name":"Ad van Steenbruggen","screenname":"torment00","geoenabled":true},"tweetmessage":"? Listening to 'The Constant' by 'Anthrax' from 'Worship Music","createddate":"2013-06-20T12:08:20","geolocation":null}
HiveDDL:-
Create external table tweets( `user` struct<userlocation:string,id:string,name:string,screenname:string,geoenabled:string>,tweetmessage string,createddate string,geolocation string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/hive/data/';
hive> select * from tweets;
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+--------------+--+
| user | tweetmessage | createddate | geolocation |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+--------------+--+
| {"userlocation":"Cinderford, Gloucestershire","id":"230231618","name":"Aimee","screenname":"Aimee_Cottle","geoenabled":"true"} | Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I recover! | 2013-06-20T12:08:14 | NULL |
| {"userlocation":"Garena ID : NuraBlazee","id":"635239939","name":"Axyraf.","screenname":"Asyraf_Fauzi","geoenabled":"false"} | RT @abhigyantweets: Can't stop a natural disaster but so many lives in U'khand wouldn't have been lost if there was disaster preparedness. É | 2013-06-20T12:08:16 | NULL |
| {"userlocation":"Gemert,Netherlands","id":"21418083","name":"Ad van Steenbruggen","screenname":"torment00","geoenabled":"true"} | ? Listening to 'The Constant' by 'Anthrax' from 'Worship Music | 2013-06-20T12:08:20 | NULL |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+--------------+--+
Selecting specific field:
select `user`.userlocation from tweets; +------------------------------+--+ | userlocation | +------------------------------+--+ | Cinderford, Gloucestershire | | Garena ID : NuraBlazee | | Gemert,Netherlands | +------------------------------+--+
As I'm able to get all the three rows data, Once try with the above hive ddl statement and check are u able to get the data or not.
-
If the Answer helped to resolve your issue, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.
Created 09-05-2018 03:00 AM
Hi Shu,
Thank you for your reply , I have copy and pasted your inputs and ddls ,however I am still getting the same 1 row count. I am using the HDP 2.6.5 sandbox.
root@sandbox-hdp data]# sandbox-version == Sandbox Information == Platform: hdp-security Build date: 06-18-2018 Ambari version: 2.6.2.0-155 Hadoop version: Hadoop 2.7.3.2.6.5.0-292 OS: CentOS Linux release 7.5.1804 (Core)
I have no clue ,why this is happening.
Created 09-05-2018 03:04 AM
@shu
Are you using the same sandbox as me ?