Since later versions of NiFi no longer report the detailed reason an XML FlowFile fails validation against a schema (just a generic "Validation Failed"), I have been trying to build my own validator using the ExecuteGroovyScript processor. I am new to NiFi and Groovy, but I am an experienced .NET developer in C#.
Below is my Groovy script in the processor:
import groovy.xml.XmlUtil
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import javax.xml.XMLConstants
def flowFile = session.get()
if (!flowFile) return
def xmlContent = flowFile.read().getText("UTF-8")
def schemaFile = new File("My_XML_Schema.xsd")
def schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
def schema = schemaFactory.newSchema(schemaFile)
def validator = schema.newValidator()
def validationErrors = new StringBuilder()
validator.setErrorHandler(new org.xml.sax.helpers.DefaultHandler() {
    @Override
    void warning(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
        validationErrors.append("Warning: ${e.message}\n")
    }
    @Override
    void error(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
        validationErrors.append("Error: ${e.message}\n")
    }
    @Override
    void fatalError(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
        validationErrors.append("Fatal Error: ${e.message}\n")
    }
})
try {
    validator.validate(new StreamSource(new StringReader(xmlContent)))
    session.transfer(flowFile, REL_SUCCESS)
} catch (org.xml.sax.SAXParseException e) {
    validationErrors.append("Validation failed: ${e.message}\n")
    flowFile = session.putAttribute(flowFile, "xml.validation.errors", validationErrors.toString())
    session.transfer(flowFile, REL_FAILURE)
}
When I test it, the FlowFile always routes to Success. Even if I add elements or attributes to the XML that are not in the schema, the script still treats it as valid. If I test the same XML and XSD with the ValidateXML processor, it correctly marks the FlowFile as valid or invalid.
I have also tried parsing the FlowFile content with XmlSlurper and then passing the output of XmlUtil.serialize to the new StringReader, and I get the same result: always valid, even when it is not.
Has anyone successfully validated XML against a schema with a Groovy script and recorded the detailed validation errors?
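One thing that may explain the behavior described above: per the javax.xml.validation.Validator documentation, once a custom ErrorHandler is set, validate() only aborts if the handler itself throws. The error() override in the script records the message and returns, so validate() finishes normally and the catch block is never reached. The following is a minimal standalone sketch of that effect, runnable outside NiFi. The inline schema and document are hypothetical stand-ins for My_XML_Schema.xsd and the FlowFile content, and the `errors` list is an illustrative name, not part of the original script.

```groovy
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXParseException

// Hypothetical schema: <root> may contain exactly one <name> element.
def xsd = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

// Hypothetical document with an <extra> element the schema does not allow.
def xml = '<root><name>a</name><extra>b</extra></root>'

def schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(xsd)))
def validator = schema.newValidator()

// Collect every problem instead of stopping at the first one.
def errors = []
validator.setErrorHandler(new ErrorHandler() {
    void warning(SAXParseException e) {
        errors << "Warning (line ${e.lineNumber}): ${e.message}".toString()
    }
    void error(SAXParseException e) {
        // Recording without rethrowing lets validation continue.
        errors << "Error (line ${e.lineNumber}): ${e.message}".toString()
    }
    void fatalError(SAXParseException e) {
        // Fatal errors (e.g. malformed XML) are recorded, then rethrown to stop parsing.
        errors << "Fatal (line ${e.lineNumber}): ${e.message}".toString()
        throw e
    }
})

def threw = false
try {
    validator.validate(new StreamSource(new StringReader(xml)))
} catch (SAXParseException ignored) {
    // Only reached when the handler rethrows; the message is already recorded.
    threw = true
}

// The outcome has to be decided from the collected messages,
// not from whether validate() threw.
def valid = errors.isEmpty()
println "valid=${valid}, threw=${threw}"
errors.each { println it }
```

If this is indeed the cause, the equivalent change in the processor script would be to check whether validationErrors is empty after validate() returns, and in the non-empty case set the xml.validation.errors attribute and transfer to REL_FAILURE instead of REL_SUCCESS.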