01-08-2016 01:49 PM
I have many XML files and no control over how they are generated. Cloudera Search was working fine for well-formed XML. However, during bulk indexing almost all the mappers fail, which causes the job to fail with the following exception.
Caused by: org.kitesdk.morphline.api.MorphlineRuntimeExceptio
After careful research, I understood that these characters are not allowed in XML version 1.0 but are allowed in 1.1. How can I tell the morphline code to use XML 1.1 so that those characters are read properly?
I tried this after looking into the class below
But I got the error
Error: java.lang.IllegalArgumentException: Unknown configuration option XML_VERSION
Also, would the above option be the best solution, or are there other options?
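For context on the XML 1.1 question: in the XML specifications, the version is taken from the declaration at the top of each document rather than from a parser option, and XML 1.1 permits most control characters only as character references (for example `&#x1;`), not as raw bytes. The sketch below illustrates this with the JDK's built-in JAXP parser. It is an assumption that the parser used inside the morphline behaves the same way, and `XmlVersionDemo` and `parses` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class XmlVersionDemo {

    // Returns true if the JDK's default DOM parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // SAXParseException for malformed input; other failures also count as "no".
            return false;
        }
    }

    public static void main(String[] args) {
        // Same body, different declared version; only the declaration changes.
        String v10 = "<?xml version=\"1.0\"?><a>&#x1;</a>";
        String v11 = "<?xml version=\"1.1\"?><a>&#x1;</a>";
        System.out.println("declared 1.0 parses: " + parses(v10));
        System.out.println("declared 1.1 parses: " + parses(v11));
    }
}
```

If the files cannot be regenerated with a 1.1 declaration and escaped control characters, cleaning the input before parsing is likely the more practical route.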
01-08-2016 03:56 PM
01-08-2016 04:21 PM
02-04-2016 09:41 PM
This issue is still not resolved. I tried removing the control characters, and now I am getting an error with surrogate characters. There should be a way to tell the XML parser to use UTF-16 encoding and accept all these characters, rather than replacing each one. The xquery command does not expose that as a configurable option. Can anyone help resolve this?
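One way to handle control characters and surrogates in a single pass is to filter the decoded text by code point against the `Char` production of XML 1.0, instead of replacing individual characters one at a time. The following is a minimal sketch of such a pre-processing step, not part of the morphline API. `Xml10Sanitizer` and `stripInvalidXml10` are hypothetical names, and the sketch assumes the file content has already been decoded into a Java `String`.

```java
public class Xml10Sanitizer {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point that XML 1.0 does not allow.
    // Iterating by code point keeps valid surrogate pairs together
    // and discards unpaired surrogates.
    static String stripInvalidXml10(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "a\u0001b\uD83Dc";
        System.out.println(stripInvalidXml10(dirty));
    }
}
```

This only handles raw characters. Invalid character references written out in the file, such as `&#x1;` in an XML 1.0 document, would need a separate text substitution before parsing.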