Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive JSON SerDE org.apache.hadoop.hive.serde2.SerDeException: DelimitedJSONSerDe cannot deserialize

avatar

Hi Team,
Below are my installation details
Hive 1.1.0-cdh5.11.2
Hadoop 2.6.0-cdh5.11.2

I want to load JSON data into Hive table for further analysis thus I followed belwo steps,


1. created Hive table using serde - successful
CREATE EXTERNAL TABLE IF NOT EXISTS testtable (name String, age int, createdon Date) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.DelimitedJSONSerDe' WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") LOCATION '/user/testuser/test';

 

2. Loaded JSON data into Hive table from local file - successful
load data local inpath '/home/testuser/test.json' into table testtable;

 

However, when I try to query this data using Selct command I get below error,
hive> select * from testtable;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: DelimitedJSONSerDe cannot deserialize.

 

Please help me understand the root cause of this error. Also, do suggest if this is the right way to load and query JSON data.

2 REPLIES 2

avatar
Expert Contributor

How does the data looks like? I think the json has to be in one row (so cant contain newlines) and you have to have one json per line. At least I had a similar issue when I wanted to load a data via external table, where the json contained one big list with many dict elements.

avatar

HI TomasTF,

I tried 3 ways,


1. Data without newline : sample below 

{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13},{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}


2. Data with new line and comma
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13},
{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}


3. Data with new line and without comma
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13}
{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}


all three failed with same error

Next, I deleted the table and recreated with Double


CREATE EXTERNAL TABLE IF NOT EXISTS testtable (Id string,Name string,Reseller string,Category string,Price Double,Discount Double,Profit_percent Double) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location "/user/testuser/test";


with Data in below format (new line without comma)
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13}
{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}


Now Select Query is not fetching result without error.

However, am still now sure why Decimal was throwing error.