05-02-2017 04:23 AM (1 Kudo)
It looks like the actual error in the stack trace is saying that there was a malformed/bad record (as you guessed), and the specific error may help you find the record:

Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('r' (code 114)): was expecting a colon to separate field name and value

Do you have, among your files, the full line for the record that starts like this?

{"repoType":1,"repo":"NestlePurinaDev_hadoop","reqUser":"hbase","evtTime":"2016-12-27 09:49:00.951","access":"WRITE","resource":"/apps/hbase/data/data/hbase/namespace/2fdbb2aa9731bb723a48bfd157b60af2/recovered.edits/67.seqid","resType":"path","result":1,"po

If so, you can identify exactly which file contained your bad data. Typically, when using Hive, it is on the user to clean the data before loading it. Once the data is inside Hive, though, Hive tries to make sure it writes out well-formed data.

As an aside, the JsonSerDe is part of HCatalog, and if you were using HCat to read and write data, it lets you specify a parameter called hcat.input.bad.record.threshold (defaulting to 0.0001f) that allows you to ignore "bad data" as long as the fraction of bad records does not cross that threshold. (That, however, is not available in Hive itself, and I would not recommend adopting HCat just to get around this; it is simpler to clean out the offending data and rerun.)
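If it helps with tracking down the offending file, here is a minimal sketch (not from the original answer) that scans a local copy of the JSON files and reports the first line in each file that fails to parse. The directory path and the one-record-per-line layout are assumptions; adjust them to your data.

import json
import sys
from pathlib import Path

# Hypothetical local copy of the JSON files; adjust the path to your data.
DATA_DIR = Path("./json_data")

for path in sorted(DATA_DIR.glob("*.json")):
    with path.open("r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                # Report file, line number, and the parser's message,
                # analogous to what the Jackson exception tells you.
                print(f"{path}:{lineno}: {e}", file=sys.stderr)
                break  # first bad record found; move on to the next file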
07-07-2016 07:23 PM
Yup. The "EXPORT ... FOR REPLICATION" command, which runs on the source cluster, was added only in Hive 1.2.0+. The change to IMPORT semantics that allows "import only if newer", which HiveDR uses to apply updates to a table on the destination cluster, was also added only in 1.2.0+. Thus, you will need Hive 1.2.0+ on both clusters.
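For reference, a minimal sketch of the pair of statements involved; the table name, export path, and replication event id below are made up for illustration:

-- On the source cluster (Hive 1.2.0+): export the table tagged with a
-- replication event id (hypothetical values shown).
EXPORT TABLE orders TO '/apps/hive/repl/orders' FOR REPLICATION('repl_event_42');

-- After copying the exported directory to the destination cluster (e.g. via
-- distcp), run the import there (also Hive 1.2.0+). The replication metadata
-- lets IMPORT apply the data only if it is newer than what the table holds.
IMPORT TABLE orders FROM '/apps/hive/repl/orders';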