Created 01-08-2016 01:49 PM
Hi,
I have many XML files and no control over how they are generated. Cloudera Search was working fine for well-formed XML. However, during bulk indexing almost all the mappers fail, which causes the job to fail with the following exception:
Caused by: org.kitesdk.morphline.api.MorphlineRuntimeException: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 3)) at [row,col {unknown-source}]: [13220,0]
After some research, I understand that these characters are not allowed in XML 1.0 but are allowed in XML 1.1. How can I tell the morphline to use XML 1.1 so that those characters are read properly?
I tried the following after looking at the class below:
http://www.saxonica.com/html/documentation/javadoc/net/sf/saxon/lib/FeatureKeys.html#XML_VERSION
xquery {
features: {"XML_VERSION":"1.1"}
}
But I got this error:
Error: java.lang.IllegalArgumentException: Unknown configuration option XML_VERSION
Also, would the above option be the best solution, or are there other options?
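One other option, independent of the parser's XML version setting, would be to strip the characters that XML 1.0 forbids before the parser sees the input. The following is only a minimal sketch of that idea: the class and method names (`XmlCharFilter`, `stripIllegalXml10`) are hypothetical and not part of Kite or Saxon, and it assumes the document content is available as a `String` at some step before the `xquery` command runs (for example in a preprocessing pass over the files or in a custom morphline command).

```java
// Hypothetical helper, not part of Kite Morphlines or Saxon.
final class XmlCharFilter {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXml10(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of the input with every code point that XML 1.0
    // does not allow (such as the CTRL-CHAR with code 3) removed.
    static String stripIllegalXml10(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlCharFilter::isLegalXml10)
             .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Calling `XmlCharFilter.stripIllegalXml10(rawXml)` on the raw text would drop the offending control characters while leaving tabs, newlines, and ordinary text untouched. Note that this silently changes the content, so it only makes sense if the control characters carry no meaning in the data.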