Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Querying JSON data using Hive

avatar

I am using https://github.com/rcongiu/Hive-JSON-Serde to query the JSON data via hive.

As part of testing, I am using an external table to query the JSON plain text file in HDFS.

i am able to query the data from hive using select, However when i do select * from JSON_EXTERNAL_TABLE limit 1, the output is an Invalid JSON though the message in HDFS is a valid JSON. Is this an expected one ?

1 ACCEPTED SOLUTION

avatar
Expert Contributor

JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?

View solution in original post

16 REPLIES 16

avatar
Expert Contributor

JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?

avatar

@gopal, Yes. The input JSON is all in single line.

Enabled debugging, to see if the select * on hive console is using a different Serde than the one defined in the Table. however it looks thats not the case,

Enabled the debug option in hive cli

serialization.lib=org.openx.data.jsonserde.JsonSerDe,

avatar

What if my Json record is spread across multiple lines?

avatar
Maybe use a different SERDE?
  1. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' or
  2. ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' from https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master

avatar

@Ancil McBarnett, i am using the 2nd one here. Which is quite popular and well tested as it appers

avatar
Master Mentor

load the JSON as string with column name "json"

SELECT get_json_object(json, '$.id') AS ID,get_json_object(json, '$.person.last_name') AS LastName,get_json_object(json, '$.person.first_name') AS FirstName,get_json_object(json, '$.person.email') AS email,get_json_object(json, '$.person.location.address') AS Address,get_json_object(json, '$.person.location.city') AS City,get_json_object(json, '$.person.location.state') AS State,get_json_object(json, '$.person.location.zipcode') AS Zip,get_json_object(json, '$.person.text') AS Text,get_json_object(json, '$.person.url') AS URLFROM HBASE_JSON_TABLE;ORSELECT id, lastName, firstName, email, city, state, text, url FROM hbase_json_table A LATERAL VIEW json_tuple(A.json, 'id', 'person') B AS id, personLATERAL VIEW json_tuple(person, 'last_name', 'first_name', 'email', 'text', 'url', 'location') C as lastName, firstName, email, text, url, locLATERAL VIEW json_tuple(loc, 'city', 'state') D AS city, state;

OR

CREATE EXTERNAL TABLE json_serde_table (
  id string,
  person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/json/';
SELECT id, person.first_name, person.last_name, person.email, 
person.location.address, person.location.city, person.location.state, 
person.location.zipcode, person.text, person.url
FROM json_serde_table LIMIT 5;

avatar

@Artem Ervits. Have used the last create external statement . If i do select * from json_serde_table ; i get a JSON output.but not in a valid format.

avatar
Master Mentor

can you paste the output

avatar
Super Collaborator

hi Artem

I used your method but I am getting error can you help please?

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive>
    >
    > select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>