Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Querying JSON data using Hive

Solved Go to solution

Querying JSON data using Hive

I am using https://github.com/rcongiu/Hive-JSON-Serde to query the JSON data via hive.

As part of testing, I am using an external table to query the JSON plain text file in HDFS.

i am able to query the data from hive using select, However when i do select * from JSON_EXTERNAL_TABLE limit 1, the output is an Invalid JSON though the message in HDFS is a valid JSON. Is this an expected one ?

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Querying JSON data using Hive

Rising Star

JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?

View solution in original post

16 REPLIES 16
Highlighted

Re: Querying JSON data using Hive

Rising Star

JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?

View solution in original post

Highlighted

Re: Querying JSON data using Hive

@gopal, Yes. The input JSON is all in single line.

Enabled debugging, to see if the select * on hive console is using a different Serde than the one defined in the Table. however it looks thats not the case,

Enabled the debug option in hive cli

serialization.lib=org.openx.data.jsonserde.JsonSerDe,

Highlighted

Re: Querying JSON data using Hive

New Contributor

What if my Json record is spread across multiple lines?

Highlighted

Re: Querying JSON data using Hive

Maybe use a different SERDE?
  1. ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' or
  2. ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' from https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master
Highlighted

Re: Querying JSON data using Hive

@Ancil McBarnett, i am using the 2nd one here. Which is quite popular and well tested as it appers

Highlighted

Re: Querying JSON data using Hive

Mentor

load the JSON as string with column name "json"

SELECT get_json_object(json, '$.id') AS ID,get_json_object(json, '$.person.last_name') AS LastName,get_json_object(json, '$.person.first_name') AS FirstName,get_json_object(json, '$.person.email') AS email,get_json_object(json, '$.person.location.address') AS Address,get_json_object(json, '$.person.location.city') AS City,get_json_object(json, '$.person.location.state') AS State,get_json_object(json, '$.person.location.zipcode') AS Zip,get_json_object(json, '$.person.text') AS Text,get_json_object(json, '$.person.url') AS URLFROM HBASE_JSON_TABLE;ORSELECT id, lastName, firstName, email, city, state, text, url FROM hbase_json_table A LATERAL VIEW json_tuple(A.json, 'id', 'person') B AS id, personLATERAL VIEW json_tuple(person, 'last_name', 'first_name', 'email', 'text', 'url', 'location') C as lastName, firstName, email, text, url, locLATERAL VIEW json_tuple(loc, 'city', 'state') D AS city, state;

OR

CREATE EXTERNAL TABLE json_serde_table (
  id string,
  person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/json/';
SELECT id, person.first_name, person.last_name, person.email, 
person.location.address, person.location.city, person.location.state, 
person.location.zipcode, person.text, person.url
FROM json_serde_table LIMIT 5;
Highlighted

Re: Querying JSON data using Hive

@Artem Ervits. Have used the last create external statement . If i do select * from json_serde_table ; i get a JSON output.but not in a valid format.

Highlighted

Re: Querying JSON data using Hive

Mentor

can you paste the output

Highlighted

Re: Querying JSON data using Hive

Master Collaborator

hi Artem

I used your method but I am getting error can you help please?

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive>
    >
    > select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>



Don't have an account?
Coming from Hortonworks? Activate your account here