Support Questions

Jagatheeshr · ‎11-19-2015

I am using https://github.com/rcongiu/Hive-JSON-Serde to query the JSON data via hive.

As part of testing, I am using an external table to query the JSON plain text file in HDFS.

i am able to query the data from hive using select, However when i do select * from JSON_EXTERNAL_TABLE limit 1, the output is an Invalid JSON though the message in HDFS is a valid JSON. Is this an expected one ?

gopalv · ‎11-19-2015

JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?

View solution in original post

gopalv · ‎11-19-2015

JSON Serde works on 1 line at a time, with each line being parsed independently. Is your JSON encoded to fit into 1 line for each record in your stream?

Jagatheeshr · ‎11-20-2015

@gopal, Yes. The input JSON is all in single line.

Enabled debugging, to see if the select * on hive console is using a different Serde than the one defined in the Table. however it looks thats not the case,

Enabled the debug option in hive cli

serialization.lib=org.openx.data.jsonserde.JsonSerDe,

sreecharan_akir · ‎08-22-2016

What if my Json record is spread across multiple lines?

amcbarnett · ‎11-19-2015

Maybe use a different SERDE?

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' or
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' from https://github.com/sheetaldolas/Hive-JSON-Serde/tree/master

Jagatheeshr · ‎11-20-2015

@Ancil McBarnett, i am using the 2nd one here. Which is quite popular and well tested as it appers

aervits · ‎11-19-2015

load the JSON as string with column name "json"

SELECT get_json_object(json, '$.id') AS ID,get_json_object(json, '$.person.last_name') AS LastName,get_json_object(json, '$.person.first_name') AS FirstName,get_json_object(json, '$.person.email') AS email,get_json_object(json, '$.person.location.address') AS Address,get_json_object(json, '$.person.location.city') AS City,get_json_object(json, '$.person.location.state') AS State,get_json_object(json, '$.person.location.zipcode') AS Zip,get_json_object(json, '$.person.text') AS Text,get_json_object(json, '$.person.url') AS URLFROM HBASE_JSON_TABLE;ORSELECT id, lastName, firstName, email, city, state, text, url FROM hbase_json_table A LATERAL VIEW json_tuple(A.json, 'id', 'person') B AS id, personLATERAL VIEW json_tuple(person, 'last_name', 'first_name', 'email', 'text', 'url', 'location') C as lastName, firstName, email, text, url, locLATERAL VIEW json_tuple(loc, 'city', 'state') D AS city, state;

OR

CREATE EXTERNAL TABLE json_serde_table (
  id string,
  person struct<email:string, first_name:string, last_name:string, location:struct<address:string, city:string, state:string, zipcode:string>, text:string, url:string>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/json/';
SELECT id, person.first_name, person.last_name, person.email, 
person.location.address, person.location.city, person.location.state, 
person.location.zipcode, person.text, person.url
FROM json_serde_table LIMIT 5;

Jagatheeshr · ‎11-20-2015

@Artem Ervits. Have used the last create external statement . If i do select * from json_serde_table ; i get a JSON output.but not in a valid format.

aervits · ‎11-20-2015

can you paste the output

aliyesami · ‎10-24-2016

hi Artem

I used your method but I am getting error can you help please?

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive>
    >
    > select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>

Cloudera Community

Support Questions

Querying JSON data using Hive

Querying JSON Data in Hive - returning null

Streaming/Query data to CDP Public Cloud Using Clo...

Hive insert query optimization

Parsing NameNode metrics data using Python urllib2...

Using Custom Data Connections in Cloudera Machine ...

Querying Data via SparkSQL with ODBC Tools

Using Data Sketches for Fast Analytics with Apache...

Ingesting Log data using MiNiFi & NiFi

Connect to CDP DataHub Hive using Cloudera ODBC Dr...

Hbase filter query using pyspark