Created 11-19-2015 08:19 PM
I am using https://github.com/rcongiu/Hive-JSON-Serde to query the JSON data via hive.
As part of testing, I am using an external table to query the JSON plain text file in HDFS.
i am able to query the data from hive using select, However when i do select * from JSON_EXTERNAL_TABLE limit 1, the output is an Invalid JSON though the message in HDFS is a valid JSON. Is this an expected one ?
Created 11-19-2015 08:20 PM
JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?
Created 11-19-2015 08:20 PM
JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?
Created 11-20-2015 12:06 PM
@gopal, Yes. The input JSON is all in single line.
Enabled debugging, to see if the select * on hive console is using a different Serde than the one defined in the Table. however it looks thats not the case,
Enabled the debug option in hive cli
serialization.lib=org.openx.data.jsonserde.JsonSerDe,
Created 08-22-2016 05:16 AM
What if my Json record is spread across multiple lines?
Created 11-19-2015 08:27 PM
Created 11-20-2015 11:59 AM
@Ancil McBarnett, i am using the 2nd one here. Which is quite popular and well tested as it appers
Created 11-19-2015 08:55 PM
load the JSON as string with column name "json"
SELECT get_json_object(json, '$.id') AS ID,get_json_object(json, '$.person.last_name') AS LastName,get_json_object(json, '$.person.first_name') AS FirstName,get_json_object(json, '$.person.email') AS email,get_json_object(json, '$.person.location.address') AS Address,get_json_object(json, '$.person.location.city') AS City,get_json_object(json, '$.person.location.state') AS State,get_json_object(json, '$.person.location.zipcode') AS Zip,get_json_object(json, '$.person.text') AS Text,get_json_object(json, '$.person.url') AS URLFROM HBASE_JSON_TABLE;ORSELECT id, lastName, firstName, email, city, state, text, url FROM hbase_json_table A LATERAL VIEW json_tuple(A.json, 'id', 'person') B AS id, personLATERAL VIEW json_tuple(person, 'last_name', 'first_name', 'email', 'text', 'url', 'location') C as lastName, firstName, email, text, url, locLATERAL VIEW json_tuple(loc, 'city', 'state') D AS city, state;
OR
CREATE EXTERNAL TABLE json_serde_table ( id string, person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/tmp/json/'; SELECT id, person.first_name, person.last_name, person.email, person.location.address, person.location.city, person.location.state, person.location.zipcode, person.text, person.url FROM json_serde_table LIMIT 5;
Created 11-20-2015 11:58 AM
@Artem Ervits. Have used the last create external statement . If i do select * from json_serde_table ; i get a JSON output.but not in a valid format.
Created 11-20-2015 01:39 PM
can you paste the output
Created 10-24-2016 05:05 PM
hi Artem
I used your method but I am getting error can you help please?
CREATE EXTERNAL TABLE tweetdata3(created_at STRING, text STRING, person STRUCT< screen_name:STRING, name:STRING, locations:STRING, description:STRING, created_at:STRING, followers_count:INT, url:STRING> ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/flume/tweets'; hive> > > select person.name,person.locations, person.created_at, text from tweetdata3; OK Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null') at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2] Time taken: 0.274 seconds hive>