Created on 09-12-2018 02:21 AM - edited 09-16-2022 06:41 AM
Hi Team,
Below are my installation details
Hive 1.1.0-cdh5.11.2
Hadoop 2.6.0-cdh5.11.2
I want to load JSON data into Hive table for further analysis thus I followed belwo steps,
1. created Hive table using serde - successful
CREATE EXTERNAL TABLE IF NOT EXISTS testtable (name String, age int, createdon Date) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.DelimitedJSONSerDe' WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") LOCATION '/user/testuser/test';
2. Loaded JSON data into Hive table from local file - successful
load data local inpath '/home/testuser/test.json' into table testtable;
However, when I try to query this data using Selct command I get below error,
hive> select * from testtable;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: DelimitedJSONSerDe cannot deserialize.
Please help me understand the root cause of this error. Also, do suggest if this is the right way to load and query JSON data.
Created 09-12-2018 10:28 AM
How does the data looks like? I think the json has to be in one row (so cant contain newlines) and you have to have one json per line. At least I had a similar issue when I wanted to load a data via external table, where the json contained one big list with many dict elements.
Created 09-20-2018 09:20 PM
HI TomasTF,
I tried 3 ways,
1. Data without newline : sample below
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13},{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}
2. Data with new line and comma
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13},
{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}
3. Data with new line and without comma
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13}
{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}
all three failed with same error
Next, I deleted the table and recreated with Double
CREATE EXTERNAL TABLE IF NOT EXISTS testtable (Id string,Name string,Reseller string,Category string,Price Double,Discount Double,Profit_percent Double) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' location "/user/testuser/test";
with Data in below format (new line without comma)
{"id":"P121", "name":"Round Tee", "reseller":"Puma", "category":"Top Wear", "price":2379.06, "discount":22.57, "profit_percent":20.13}
{"id":"P122", "name":"pant", "reseller":"Reebok", "category":"bottom", "price":1234.99, "discount":99.99, "profit_percent":67.12}
Now Select Query is not fetching result without error.
However, am still now sure why Decimal was throwing error.