Support Questions

Find answers, ask questions, and share your expertise

Error while creating hive table on top of json data

avatar
Contributor

Hi ,

I am trying to create a hive table on top of son data file but getting the below error.Below are my json file,hive ddl and error.

Json file::::::::

{ "purchaseid": { "ticketnumber": "23546852222", "location": "vizag", "Travelerhistory": { "trav": { "fname": "ramu", "lname": "gogi", "travelingarea": { "destination": { "stationid": "KAJKL", "stationname": "hyd" } }, "food": { "foodpref": [{ "foodcode": "CK567", "foodcodeSegment": "NOVEG" }, { "foodcode": "MM98", "foodcodeSegment": "VEG" } ] } } } } }

Hive DDL::

add jar /home/**********/json-serde-1.3.7-jar-with-dependencies.jar; CREATE external TABLE ds1414( ticketnumber string, location string, Travelerhistory ARRAY<struct<trav :struct<fname:string,lname:string, travelingarea :ARRAY<struct<destination :struct<stationid:string,stationname:string>>>, food :ARRAY<struct<foodpref :struct<foodcode:string,foodcodeSegment:string>>> >>> ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/***********/json2/' ;

Error:::

select * from ds1414;

OK

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream@205df5dc; line: 1, column: 0]) at [Source: java.io.ByteArrayInputStream@205df5dc; line: 1, column: 3] Time taken: 0.212 seconds

8 REPLIES 8

avatar
Master Guru

@srini

Try with the below create table statement

hive> Create table ds1414(
purchaseid struct<ticketnumber:string,location:string,
  Travelerhistory:struct<
    trav:struct<fname:string,lname:string,
        travelingarea:struct< destination :struct<stationid:string,stationname:string>>,
    food :struct<foodpref:array<struct<foodcode:string,foodcodeSegment:string>>>
    >
  >
>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/***********/json2/';

Examples of Accessing data from hive table as follows

hive> select * from ds1414;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|                                                                                                                                            ds1414.purchaseid                                                                                                                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| {"ticketnumber":"23546852222","location":"vizag","travelerhistory":{"trav":{"fname":"ramu","lname":"gogi","travelingarea":{"destination":{"stationid":"KAJKL","stationname":"hyd"}},"food":{"foodpref":[{"foodcode":"CK567","foodcodesegment":"NOVEG"},{"foodcode":"MM98","foodcodesegment":"VEG"}]}}}}  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
hive> select purchaseid.ticketnumber from ds1414;
+---------------+--+
| ticketnumber  |
+---------------+--+
| 23546852222   |
+---------------+--+
hive> select ds1414.purchaseid.ticketnumber,purchaseid.location from ds1414;
+---------------+-----------+--+
| ticketnumber  | location  |
+---------------+-----------+--+
| 23546852222   | vizag     |
+---------------+-----------+--+
hive>  select purchaseid.Travelerhistory.trav.food.foodpref from ds1414;
+-----------------------------------------------------------------------------------------------+--+
|                                           foodpref                                            |
+-----------------------------------------------------------------------------------------------+--+
| [{"foodcode":"CK567","foodcodesegment":"NOVEG"},{"foodcode":"MM98","foodcodesegment":"VEG"}]  |
+-----------------------------------------------------------------------------------------------+--+
hive> select purchaseid.Travelerhistory.trav.food.foodpref.foodcode from ds1414;
+-------------------+--+
|     foodcode      |
+-------------------+--+
| ["CK567","MM98"]  |
+-------------------+--+
hive> select purchaseid.Travelerhistory.trav.food.foodpref.foodcode[0] from ds1414;
+--------+--+
|  _c0   |
+--------+--+
| CK567  |
+--------+--+

Let us know if you are having any questions..

avatar
Master Guru

@srini

Could you make sure you are having valid json message in "/user/***********/json2/" this path and keep all the json message in one line(no new line)

(or)

just create a new file in your local and then put that file into your hadoop directory

bash$ vi sample_json.txt //create a new file with the below sample message in it.<br>{ "purchaseid": { "ticketnumber": "23546852222", "location": "vizag", "Travelerhistory": { "trav": { "fname": "ramu", "lname": "gogi", "travelingarea": { "destination": { "stationid": "KAJKL", "stationname": "hyd" } }, "food": { "foodpref": [{ "foodcode": "CK567", "foodcodeSegment": "NOVEG" }, { "foodcode": "MM98", "foodcodeSegment": "VEG" } ] } } } } }
bash$ hadoop fs -rm -r /user/***********/json2/ //for testing purposes we are deleting all the files in the dirbash$ hadoop fs -put -f sample_json.txt /user/***********/json2/ //put the newly created file into the hdfs dir

-

if you are still having issue attach screenshot of your input data in /user/***********/json2/ this path.

avatar
Master Guru
@srini

Please use the below DDL

Create EXTERNAL TABLE SRGMSBI1417.json14_07_05_01( 
purchaseid struct
<ticid:string,ticlocation:string,custnum :string, 
Travleinfo:struct
<Trav:struct
<fname:string,mname:string,
freq:struct
<fre:struct
<frequencynumber:string,frequnecystatus:string>>, 
food :struct
<foodpref:array<struct<foodcode:string,foodcodeSegment:string>>>, 
Seats :struct<Seat:array<struct<seatberth:string,loc:string>>>, stations :struct<station:array<struct<sationname:string,trainnum:string>>>>>, 
Comments :struct<Comment:array<struct<commentno:string,desc:string,passengerseat :struct<intele :string>,passengerloc :struct<intele :string>>>>>) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/user/srgmsbi1417/json14_07_05_01';

i have tested some columns

hive> select purchaseid.ticid from default.json limit 10;
+--------+--+
| ticid  |
+--------+--+
| 1496   |
+--------+--+
hive> select purchaseid.Comments.comment.commentno from default.json limit 10;
+-----------------+--+
|    commentno    |
+-----------------+--+
| ["1","5","12"]  |
+-----------------+--+
hive> select purchaseid.Comments.comment[0].commentno from default.json limit 10;
+------------+--+
| commentno  |
+------------+--+
| 1          |
+------------+--+

**tip: use some sort of json formatter while creating ddl for so that we can understand easily where to end the struct or array.

-

If the Answer helped to resolve your issue, Click on Accept button below to accept the answer, That would be great help to Community users to find solution quickly for these kind of issues.

avatar
Master Guru
@srini

use this ddl

Create EXTERNAL TABLE SRGMSBI1417.json14_07_05_01( 
purchaseid struct
<ticid:string,ticlocation:string,custnum :string, 
Travleinfo:struct
<Trav:struct
<fname:string,lname:string,mname:string,
freq:struct
<fre:struct
<frequencynumber:string,frequnecystatus:string>>, 
food :struct
<foodpref:array<struct<foodcode:string,foodcodeSegment:string>>>, 
Seats :struct<Seat:array<struct<seatberth:string,loc:string>>>, stations :struct<station:array<struct<sationname:string,trainnum:string>>>>>, 
Comments :struct<Comment:array<struct<commentno:string,desc:string,passengerseat :struct<intele :string>,passengerloc :struct<intele :string>>>>>) 
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/user/srgmsbi1417/json14_07_05_01';

And don't keep array type for fre because it's struct not the array, that's the reason why you are getting exceptions.

avatar
Contributor

hi @Shu

still i am getting the same error,please find the attachment of errorerror.png

avatar
Contributor

@Shu

Thanks for the help.

i have json file as below format , i am creating ddl which works well,but when i do a select query it throwing error.

json file:::

{"purchaseid": {"ticid": "1496","ticlocation": "vizag","custnum": "222","Travleinfo": {"Trav": {"fname": "ramu","mname": "g","freq": {"fre": {"frequencynumber": "9","frequnecystatus": "na"}},"food": {"foodpref": [{"foodcode": "9","foodcodeSegment": "chic"},{"foodcode": "22","foodcodeSegment": "veg"},{"foodcode": "36","foodcodeSegment": "idl"}] },"Seats": { "Seat": [{"seatberth": "upper","loc": "s15"},{"seatberth": "lower","loc": "s215"},{"seatberth": "upper","loc": "s3"}] },"stations": { "station": [{"sationname": "vizag","trainnum": "c197"},{"sationname": "hyd","trainnum": "hyd187"},{"sationname": "wrgl","trainnum": "wr1822"}]}}},"Comments": {"comment": [{"commentno": "1","desc": "journey","passengerseat": { "intele": "09" },"passengerloc": { "intele": "s15" }},{"commentno": "5","desc": " food","passengerseat": { "intele": "09" },"passengerloc": { "intele": "s15" }},{"commentno": "12","desc": " service","passengerseat": { "intele": "09" },"passengerloc": { "intele": "s15" } }]}}}

DDL:::

Create EXTERNAL TABLE SRGMSBI1417.json14_07_05_01( purchaseid struct<ticid:string,ticlocation:string,custnum :string, Travleinfo:struct< trav:struct<fname:string,lname:string,mname:string, freq :struct<fre:array<struct<frequencynumber:string,frequnecystatus:string>>>, food :struct<foodpref:array<struct<foodcode:string,foodcodeSegment:string>>>, Seats :struct<Seat:array<struct<seatberth:string,loc:string>>>, stations :struct<station:array<struct<sationname:string,trainnum:string>>> >>, Comments :struct<Comment:array<struct<commentno:string,desc:string,passengerseat :struct<intele :string>,passengerloc :struct<intele :string> >>> > ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/srgmsbi1417/json14_07_05_01';

error:::

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array

avatar
Contributor

Hi @Shu

Thanks for the help.

i have json file as below format , i am creating ddl which works well,but when i do a select query it throwing error.

json file:::

{"purchaseid": {"ticid": "1496","ticlocation": "vizag","custnum": "222","Travleinfo": {"Trav": {"fname": "ramu","mname": "g","freq": {"fre": {"frequencynumber": "9","frequnecystatus": "na"}},"food": {"foodpref": [{"foodcode": "9","foodcodeSegment": "chic"},{"foodcode": "22","foodcodeSegment": "veg"},{"foodcode": "36","foodcodeSegment": "idl"}] },"Seats": { "Seat": [{"seatberth": "upper","loc": "s15"},{"seatberth": "lower","loc": "s215"},{"seatberth": "upper","loc": "s3"}] },"stations": { "station": [{"sationname": "vizag","trainnum": "c197"},{"sationname": "hyd","trainnum": "hyd187"},{"sationname": "wrgl","trainnum": "wr1822"}]}}},"Comments": {"comment": [{"commentno": "1","desc": "journey","passengerseat": { "intele": "09" },"passengerloc": { "intele": "s15" }},{"commentno": "5","desc": " food","passengerseat": { "intele": "09" },"passengerloc": { "intele": "s15" }},{"commentno": "12","desc": " service","passengerseat": { "intele": "09" },"passengerloc": { "intele": "s15" } }]}}}

DDL:::

Create EXTERNAL TABLE SRGMSBI1417.json14_07_05_01( purchaseid struct<ticid:string,ticlocation:string,custnum :string, Travleinfo:struct< trav:struct<fname:string,lname:string,mname:string, freq :struct<fre:array<struct<frequencynumber:string,frequnecystatus:string>>>, food :struct<foodpref:array<struct<foodcode:string,foodcodeSegment:string>>>, Seats :struct<Seat:array<struct<seatberth:string,loc:string>>>, stations :struct<station:array<struct<sationname:string,trainnum:string>>> >>, Comments :struct<Comment:array<struct<commentno:string,desc:string,passengerseat :struct<intele :string>,passengerloc :struct<intele :string> >>> > ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/srgmsbi1417/json14_07_05_01';

error:::

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start of Array

avatar

i have this type of data anyone plzz help me to define the schema in hive

{
"md5": "bcgjcjdshhdfhkfd",
"dat": "15/08/2000",
"data": '[{"job_id":"589d0147aac2ed8665ce94d6","environment_id":100,"environment_description":"Windows732bit","size":109056,"type":"PE32executable(GUI)Intel80386, forMSWindows","type_short":["peexe"],"target_url":null,"state":"SUCCESS","error_type":null,"error_origin":null,"submit_name":"svchost.exe","md5":"7657fcb7d772448a6d8504e4b20168b8","sha1":"84c7201f7e59cb416280fd69a2e7f2e349ec8242","sha256":"54bc950d46a0d1aa72048a17c8275743209e6c17bdacfc4cb9601c9ce3ec9a71","sha512":"786addd2a793bd4123625b22dc717d193246442ac97f1c3f4a763ec794b48e68051cd41097c0e9f7367e6914534f36eafccb109ab03dc793d68bf1522e7884e2","ssdeep":"1536: xfZZnAEjEIZvumULmj4wrraK5dZ4Ltta9Km\/ec3DtAL6bmZ4bXSjrAE+fySPoqRr: xnnAQVG\/LytaKItS\/fiLKS+f5Aq7is","imphash":"093a51e0b7dcb2466b7edfd78d191aa0","av_detect":91,"vx_family":"Gen: Variant.Kazy","url_analysis":false,"analysis_start_time":"2017-02-10T00: 54: 57-06: 00","threat_score":100,"interesting":false,"threat_level":2,"verdict":"malicious","certificates":[],"domains":["stromoliks.com","promoliks.com"],"classification_tags":[],"compromised_hosts":[],"hosts":[],"total_network_connections":0,"total_processes":2,"total_signatures":37,"extracted_files":[],"processes":[],"file_metadata":null,"tags":[],"mitre_attcks":[]}]'
}