Groovy script to validate XML against schema (XSD) always returns a valid XML even when it is invalid

Since later versions of NiFi no longer report the detailed reason an XML FlowFile fails validation against a schema (just a generic "Validation Failed"), I have been trying to build my own validator using the ExecuteGroovyScript processor. I am new to NiFi and Groovy, but I am an experienced .NET developer in C#.

Below is the Groovy script in the processor:

import groovy.xml.XmlUtil
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import javax.xml.XMLConstants

def flowFile = session.get()
if (!flowFile) return

def xmlContent = flowFile.read().getText("UTF-8")

def schemaFile = new File("My_XML_Schema.xsd")

def schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
def schema = schemaFactory.newSchema(schemaFile)
def validator = schema.newValidator()

def validationErrors = new StringBuilder()
validator.setErrorHandler(new org.xml.sax.helpers.DefaultHandler() {
    @Override
    void warning(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
        validationErrors.append("Warning: ${e.message}\n")
    }
    @Override
    void error(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
        validationErrors.append("Error: ${e.message}\n")
    }
    @Override
    void fatalError(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
        validationErrors.append("Fatal Error: ${e.message}\n")
    }
})

try {
    validator.validate(new StreamSource(new StringReader(xmlContent)))
    session.transfer(flowFile, REL_SUCCESS)
} catch (org.xml.sax.SAXParseException e) {
    validationErrors.append("Validation failed: ${e.message}\n")
    flowFile = session.putAttribute(flowFile, "xml.validation.errors", validationErrors.toString())
    session.transfer(flowFile, REL_FAILURE)
}

When I test it, the FlowFile always routes to Success. Even if I put elements or attributes in the XML file that are not in the schema, the script treats it as valid. If I test the same XML and XSD with the ValidateXML processor, it correctly marks the file as valid or invalid.
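One possible explanation, offered as a sketch rather than a confirmed diagnosis of the NiFi setup: in javax.xml.validation, once a custom ErrorHandler is installed, schema violations are passed to error() and validate() only aborts if the handler itself throws. A handler that just records the message lets validate() return normally, so a try/catch around it never sees the problem. The standalone script below (no NiFi involved) illustrates this. The schema, the XML, and the names recorded and threw are made up for the illustration.

```groovy
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXException
import org.xml.sax.SAXParseException

// Hypothetical schema: <note> may contain only a single <to> element.
def xsd = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

// Invalid on purpose: <extra> is not declared in the schema.
def xml = '<note><to>Bob</to><extra>oops</extra></note>'

def schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(xsd)))
def validator = schema.newValidator()

def recorded = []
// Same shape as the handler in the question: record the message, never rethrow.
validator.errorHandler = [
    warning   : { SAXParseException e -> recorded << "Warning: ${e.message}".toString() },
    error     : { SAXParseException e -> recorded << "Error: ${e.message}".toString() },
    fatalError: { SAXParseException e -> recorded << "Fatal Error: ${e.message}".toString() }
] as ErrorHandler

def threw = false
try {
    validator.validate(new StreamSource(new StringReader(xml)))
} catch (SAXException e) {
    threw = true
}

// With a non-throwing handler the violation shows up in the list, not as an exception.
println "validate() threw: ${threw}"
println "errors recorded:  ${recorded.size()}"
recorded.each { println it }
```

If this is what is happening in the processor, the error text is being collected in validationErrors but the routing decision never looks at it.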

I have also tried parsing the FlowFile into XmlSlurper and then using XmlUtil.serialize in the new StringReader and get the same results. Always valid even if it is not.

Has anyone successfully validated XML against a schema with a Groovy script and recorded the detailed validation errors?
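A minimal sketch of one way this could be approached, not a tested NiFi solution: collect every problem the validator reports, then decide success or failure from whether the collection is empty rather than from whether validate() threw. The helper name collectValidationErrors and the inline schema are hypothetical. The sketch also assumes the XSD is self-contained (no relative xs:include or xs:import), since it is loaded from text.

```groovy
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXException
import org.xml.sax.SAXParseException

// Hypothetical helper: one message per problem found, or an empty list if the XML is valid.
List<String> collectValidationErrors(String xsdText, String xmlText) {
    def errors = []
    def describe = { String level, SAXParseException e ->
        "${level} at line ${e.lineNumber}, column ${e.columnNumber}: ${e.message}".toString()
    }
    def validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsdText)))
            .newValidator()
    validator.errorHandler = [
        warning   : { SAXParseException e -> },   // warnings are ignored in this sketch
        error     : { SAXParseException e -> errors << describe('Error', e) },
        fatalError: { SAXParseException e -> errors << describe('Fatal error', e) }
    ] as ErrorHandler
    try {
        validator.validate(new StreamSource(new StringReader(xmlText)))
    } catch (SAXException e) {
        // The parser may still abort after a fatal error such as malformed XML.
        if (errors.isEmpty()) errors << "Validation failed: ${e.message}".toString()
    }
    return errors
}

// Made-up schema and documents for a quick check outside NiFi.
def xsd = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

def clean = collectValidationErrors(xsd, '<note><to>Bob</to></note>')
def problems = collectValidationErrors(xsd, '<note><to>Bob</to><extra/></note>')
println "valid document:   ${clean.size()} problem(s)"
println "invalid document: ${problems.size()} problem(s)"
problems.each { println it }

// In ExecuteGroovyScript the result might be wired up roughly like this (untested):
//   def errors = collectValidationErrors(new File("My_XML_Schema.xsd").text, xmlContent)
//   if (errors) {
//       flowFile = session.putAttribute(flowFile, "xml.validation.errors", errors.join("\n"))
//       session.transfer(flowFile, REL_FAILURE)
//   } else {
//       session.transfer(flowFile, REL_SUCCESS)
//   }
```

Because the handler does not rethrow, the validator keeps going after the first violation, so a single pass can report several problems along with their line and column positions.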

Hi @MattWho @ArtiW @gtorres Do you have any insights here? Thanks!


Regards,

Diana Torres,
Senior Community Moderator

