Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

hive is unable to read multiple json data

avatar
Explorer

Hi Team,

I am trying to load below json data into hive table(using hdp 2.6.5 sandbox) ,however hive is only returning the first row , please suggest.

{"user":{"userlocation":"Cinderford, Gloucestershire","id":230231618,"name":"Aimee","screenname":"Aimee_Cottle","geoenabled":true},"tweetmessage":"Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I recover!","createddate":"2013-06-20T12:08:14","geolocation":null}

{"user":{"userlocation":"Garena ID : NuraBlazee","id":635239939,"name":"Axyraf.","screenname":"Asyraf_Fauzi","geoenabled":false},"tweetmessage":"RT @abhigyantweets: Can't stop a natural disaster but so many lives in U'khand wouldn't have been lost if there was disaster preparedness. É","createddate":"2013-06-20T12:08:16","geolocation":null}

{"user":{"userlocation":"Gemert,Netherlands","id":21418083,"name":"Ad van Steenbruggen","screenname":"torment00","geoenabled":true},"tweetmessage":"? Listening to 'The Constant' by 'Anthrax' from 'Worship Music","createddate":"2013-06-20T12:08:20","geolocation":null}

Table structure-:

set hive.support.sql11.reserved.keywords=false;

CREATE EXTERNAL TABLE tweets ( user STRUCT <userlocation:STRING,id:INT,name:STRING,screenname:STRING,geoenabled:BOOLEAN>, tweetmessage STRING, createddate STRING, geolocation STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/hive/data/';

query-:

set hive.cli.print.header=true; hive> select user.screenname, user.name,tweetmessage,createddate from tweets ; OK

Aimee_Cottle Aimee Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I re cover! 2013-06-20T12:08:14

Kindly let me know , is there any way my query can fetch all the rows from the source

Thanks in Advance

Jatin

3 REPLIES 3

avatar
Master Guru

@Jatinmaya Choudhury

Not able to recreate the same scenario on my end.

Here is what I have tried is:

Json file:

{"user":{"userlocation":"Cinderford, Gloucestershire","id":230231618,"name":"Aimee","screenname":"Aimee_Cottle","geoenabled":true},"tweetmessage":"Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I recover!","createddate":"2013-06-20T12:08:14","geolocation":null}
{"user":{"userlocation":"Garena ID : NuraBlazee","id":635239939,"name":"Axyraf.","screenname":"Asyraf_Fauzi","geoenabled":false},"tweetmessage":"RT @abhigyantweets: Can't stop a natural disaster but so many lives in U'khand wouldn't have been lost if there was disaster preparedness. É","createddate":"2013-06-20T12:08:16","geolocation":null}
{"user":{"userlocation":"Gemert,Netherlands","id":21418083,"name":"Ad van Steenbruggen","screenname":"torment00","geoenabled":true},"tweetmessage":"? Listening to 'The Constant' by 'Anthrax' from 'Worship Music","createddate":"2013-06-20T12:08:20","geolocation":null}

HiveDDL:-

Create external table tweets(
`user` struct<userlocation:string,id:string,name:string,screenname:string,geoenabled:string>,tweetmessage string,createddate string,geolocation string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/hive/data/';
hive> select * from tweets;
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+--------------+--+
|                                                               user                                                               |                                                                 tweetmessage                                                                  |     createddate      | geolocation  |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+--------------+--+
| {"userlocation":"Cinderford, Gloucestershire","id":"230231618","name":"Aimee","screenname":"Aimee_Cottle","geoenabled":"true"}   | Gastroenteritis has pretty much killed me this week 😞 off work for a few days whilst I recover!                                              | 2013-06-20T12:08:14  | NULL         |
| {"userlocation":"Garena ID : NuraBlazee","id":"635239939","name":"Axyraf.","screenname":"Asyraf_Fauzi","geoenabled":"false"}     | RT @abhigyantweets: Can't stop a natural disaster but so many lives in U'khand wouldn't have been lost if there was disaster preparedness. É  | 2013-06-20T12:08:16  | NULL         |
| {"userlocation":"Gemert,Netherlands","id":"21418083","name":"Ad van Steenbruggen","screenname":"torment00","geoenabled":"true"}  | ? Listening to 'The Constant' by 'Anthrax' from 'Worship Music                                                                                | 2013-06-20T12:08:20  | NULL         |
+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+--------------+--+

Selecting specific field:

select `user`.userlocation from tweets;
+------------------------------+--+
|         userlocation         |
+------------------------------+--+
| Cinderford, Gloucestershire  |
| Garena ID : NuraBlazee       |
| Gemert,Netherlands           |
+------------------------------+--+

As I'm able to get all the three rows data, Once try with the above hive ddl statement and check are u able to get the data or not.

-

If the Answer helped to resolve your issue, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

avatar
Explorer

Hi Shu,

Thank you for your reply , I have copy and pasted your inputs and ddls ,however I am still getting the same 1 row count. I am using the HDP 2.6.5 sandbox.

root@sandbox-hdp data]# sandbox-version                                                                          
== Sandbox Information ==                                                                                         
Platform: hdp-security                                                                                            
Build date: 06-18-2018                                                                                            
Ambari version: 2.6.2.0-155                                                                                       
Hadoop version: Hadoop 2.7.3.2.6.5.0-292                                                                          
OS: CentOS Linux release 7.5.1804 (Core)                                                                          

I have no clue ,why this is happening.

avatar
Explorer

@shu

Are you using the same sandbox as me ?