05-02-2017 04:23 AM (1 Kudo)
It looks like the actual error in the stack trace is saying that there was a malformed/bad record (as you guessed), and the specific error may help you find the record:

Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('r' (code 114)): was expecting a colon to separate field name and value

Do you have, among your files, the full line for the record that starts like this?

{"repoType":1,"repo":"NestlePurinaDev_hadoop","reqUser":"hbase","evtTime":"2016-12-27 09:49:00.951","access":"WRITE","resource":"/apps/hbase/data/data/hbase/namespace/2fdbb2aa9731bb723a48bfd157b60af2/recovered.edits/67.seqid","resType":"path","result":1,"po

If so, you can identify exactly which file contained your bad data. Typically, when using Hive, it is on the user to clean the data before loading it. Once the data is inside Hive, though, Hive tries to make sure it writes out well-formed data.

As an aside, the JsonSerDe is part of HCatalog, and if you were using HCat to read and write data, it lets you specify a parameter called hcat.input.bad.record.threshold (defaulting to 0.0001f) that allows you to ignore "bad data" as long as the fraction of bad records does not cross that threshold. (That, however, is not available in Hive itself, and I would not recommend adopting HCat just to get around this; it is simpler to clean out the offending data and rerun.)
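If it helps with tracking down the offending file, here is a minimal sketch (not from the original answer) that scans a local copy of the JSON files and reports the first line in each file that fails to parse. The directory path and the one-record-per-line layout are assumptions; adjust them to your data.

import json
import sys
from pathlib import Path

# Hypothetical local copy of the JSON files; adjust the path to your data.
DATA_DIR = Path("./json_data")

for path in sorted(DATA_DIR.glob("*.json")):
    with path.open("r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                # Report file, line number, and the parser's message,
                # analogous to what the Jackson exception tells you.
                print(f"{path}:{lineno}: {e}", file=sys.stderr)
                break  # first bad record found; move on to the next file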
07-07-2016 07:23 PM
Yup. The "EXPORT ... FOR REPLICATION" command, which runs on the source cluster, was added only in Hive 1.2.0+. The change to IMPORT semantics that allows "import only if newer", which HiveDR uses to apply updates to a table on the destination cluster, was also added only in 1.2.0+. Thus, you will need Hive 1.2.0+ on both clusters.
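For reference, a minimal sketch of the pair of statements involved; the table name, export path, and replication event id below are made up for illustration:

-- On the source cluster (Hive 1.2.0+): export the table tagged with a
-- replication event id (hypothetical values shown).
EXPORT TABLE orders TO '/apps/hive/repl/orders' FOR REPLICATION('repl_event_42');

-- After copying the exported directory to the destination cluster (e.g. via
-- distcp), run the import there (also Hive 1.2.0+). The replication metadata
-- lets IMPORT apply the data only if it is newer than what the table holds.
IMPORT TABLE orders FROM '/apps/hive/repl/orders';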