Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Error Querying HIVE table with JSON Serde

avatar
Super Collaborator

Hi,

I have my JSON file with each record as below

{"repoType":1,"repo":"Dev_hadoop","reqUser":"tarapasa","evtTime":"2017-01-04 23:59:54.606","access":"READ_EXECUTE","resource":"/user/putarapasa","resType":"path","action":"read","result":1,"policy":-1,"reason":"/user/putarapasa","enforcer":"hadoop-acl","cliIP":"172.16.2.4","agentHost":"HDPDEV.COM","logType":"RangerAudit","id":"d3285a85-caf9-40bc-bb5d-c4208d0c4b8f-2314470","seq_num":3457350,"event_count":5,"event_dur_ms":4008,"tags":[]}

I created a HIVE table with below sql.

CREATE TABLE HDFS_log( repoType string, repo string, reqUser string, evtTime string, access string, resource string, resType string, action string, result string, policy string, reason string, enforcer string, cliIP string, agentHost string, logType string, id string, seq_num string, event_count string, event_dur_ms string, tags array<string> )

ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'

STORED AS TEXTFILE LOCATION '/user/putarapasa/HDFS Logs'

i copied some files in to the above folder and i was able to query the records without any issue.

when i copied many more files , i am getting an error like below.

it looks like one of the files contained a record where its missing one tags. what do you do in these scenarios.?

ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1493386875011_0001_2_00, diagnostics=[Task failed, taskId=task_1493386875011_0001_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"repoType":1,"repo":"NestlePurinaDev_hadoop","reqUser":"hbase","evtTime":"2016-12-27 09:49:00.951","access":"WRITE","resource":"/apps/hbase/data/data/hbase/namespace/2fdbb2aa9731bb723a48bfd157b60af2/recovered.edits/67.seqid","resType":"path","result":1,"po
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"repoType":1,"repo":"NestlePurinaDev_hadoop","reqUser":"hbase","evtTime":"2016-12-27 09:49:00.951","access":"WRITE","resource":"/apps/hbase/data/data/hbase/namespace/2fdbb2aa9731bb723a48bfd157b60af2/recovered.edits/67.seqid","resType":"path","result":1,"po
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"repoType":1,"repo":"NestlePurinaDev_hadoop","reqUser":"hbase","evtTime":"2016-12-27 09:49:00.951","access":"WRITE","resource":"/apps/hbase/data/data/hbase/namespace/2fdbb2aa9731bb723a48bfd157b60af2/recovered.edits/67.seqid","resType":"path","result":1,"po
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
	... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('r' (code 114)): was expecting a colon to separate field name and value
 at [Source: java.io.ByteArrayInputStream@40a1ead; line: 1, column: 263]
	at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:179)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:149)
	at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:113)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554)
	... 18 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('r' (code 114)): was expecting a colon to separate field name and value
 at [Source: java.io.ByteArrayInputStream@40a1ead; line: 1, column: 263]
	at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1432)
	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:385)
	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:306)
	at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:494)
	at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:172)
	... 21 more
2 REPLIES 2

avatar
New Contributor

It looks like the actual error in the stack trace is saying that there was a malformed/bad record (as you guessed - but the actual error might help you find the record):

Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('r' (code 114)): was expecting a colon to separate field name and value

Do you have the full line for the record among your files that starts like this? :

{"repoType":1,"repo":"NestlePurinaDev_hadoop","reqUser":"hbase","evtTime":"2016-12-27 09:49:00.951","access":"WRITE","resource":"/apps/hbase/data/data/hbase/namespace/2fdbb2aa9731bb723a48bfd157b60af2/recovered.edits/67.seqid","resType":"path","result":1,"po

If so, you can identify exactly which file contained your bad data.

Typically, when using hive, it is on the user to clean the data before loading it. Once it is inside hive, though, the idea is that it will try to make sure it writes out good data.

As an aside, the JSONSerDe is a part of HCatalog, and if you were using HCat to read and write data, it has an ability to specify a param called hcat.input.bad.record.threshold (defaulting to 0.0001f) that allows you to ignore "bad data" as long as it doesn't cross a certain threshold. (That, however is not in hive, and I would not recommend usage of HCat just to get around this - it's simpler to simply clean out the offending data and rerun.)

avatar

@Saikrishna Tarapareddy @Sushanth Sowmyan

One way to handle this is to use another JSONSerde that has support for skipping malformed records. I had exactly similar situation and worked around it by using org.openx.data.jsonserde.JsonSerDe. This SerDe has support for handling malformed JSON.

You need to create table with following TBLPROPERTIES.

'ignore.malformed.json'='true'

and ensure this Serde is available to Hive by using ADD JAR command before using this Serde or modifying your Hive JAR path.