Member since: 05-19-2025
Posts: 6
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
161 | 06-06-2025 04:51 AM |
06-06-2025 05:06 AM
Since the later versions of NiFi no longer provide the detailed reason an XML FlowFile fails validation against a schema (just a generic Validation Failed), I have been trying to create my own using the ExecuteGroovyScript processor. I am new to NiFi and Groovy, but I am an experienced .NET developer in C#. Below is my Groovy script in the processor:

    import groovy.xml.XmlUtil
    import javax.xml.transform.stream.StreamSource
    import javax.xml.validation.SchemaFactory
    import javax.xml.XMLConstants

    def flowFile = session.get()
    if (!flowFile) return

    def xmlContent = flowFile.read().getText("UTF-8")
    def schemaFile = new File("My_XML_Schema.xsd")
    def schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    def schema = schemaFactory.newSchema(schemaFile)
    def validator = schema.newValidator()
    def validationErrors = new StringBuilder()

    validator.setErrorHandler(new org.xml.sax.helpers.DefaultHandler() {
        @Override
        void warning(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("Warning: ${e.message}\n")
        }
        @Override
        void error(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("Error: ${e.message}\n")
        }
        @Override
        void fatalError(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("Fatal Error: ${e.message}\n")
        }
    })

    try {
        validator.validate(new StreamSource(new StringReader(xmlContent)))
        session.transfer(flowFile, REL_SUCCESS)
    } catch (org.xml.sax.SAXParseException e) {
        validationErrors.append("Validation failed: ${e.message}\n")
        flowFile = session.putAttribute(flowFile, "xml.validation.errors", validationErrors.toString())
        session.transfer(flowFile, REL_FAILURE)
    }

When I test it, it always routes to Success. Even if I put elements or attributes in the XML file that are not in the schema, it always produces a Valid response. If I test the same XML and XSD with the ValidateXML processor, it properly marks the file as Valid or Invalid.

I have also tried parsing the FlowFile with XmlSlurper and then passing XmlUtil.serialize to the new StringReader, and I get the same results: always valid, even when it is not. Has anyone successfully been able to validate XML against a schema with a Groovy script and record the detailed validation errors?
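One possible explanation, not confirmed anywhere in this thread: the error handler in the script above records each problem and returns normally, so `validator.validate(...)` completes without throwing for ordinary schema violations and the `catch` block is never reached. A minimal sketch of one way around that is to decide the route from the collected errors rather than from an exception. The helper name `collectValidationErrors` and the idea of passing the XSD as text are assumptions made for illustration, and warnings are ignored here.

```groovy
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.ErrorHandler
import org.xml.sax.SAXException
import org.xml.sax.SAXParseException

// Hypothetical helper (not part of NiFi): validates xmlText against xsdText and
// returns one message per problem found. An empty list means the XML is valid.
List<String> collectValidationErrors(String xmlText, String xsdText) {
    List<String> problems = []
    def factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    def schema = factory.newSchema(new StreamSource(new StringReader(xsdText)))
    def validator = schema.newValidator()

    // Returning normally from error() lets validation continue, so every
    // schema violation is recorded instead of only the first one.
    validator.setErrorHandler(new ErrorHandler() {
        void warning(SAXParseException e) {
            // Warnings are ignored in this sketch.
        }
        void error(SAXParseException e) {
            problems.add('Error (line ' + e.lineNumber + '): ' + e.message)
        }
        void fatalError(SAXParseException e) {
            problems.add('Fatal error (line ' + e.lineNumber + '): ' + e.message)
        }
    })

    try {
        validator.validate(new StreamSource(new StringReader(xmlText)))
    } catch (SAXException e) {
        // Usually already recorded by fatalError(); keep the message if it was not.
        if (problems.isEmpty()) {
            problems.add('Validation failed: ' + e.message)
        }
    }
    return problems
}

// Inside ExecuteGroovyScript this might be wired up roughly as follows
// (the schema path is a placeholder):
//   def errors = collectValidationErrors(xmlContent, new File('/path/to/My_XML_Schema.xsd').text)
//   if (errors) {
//       flowFile = session.putAttribute(flowFile, 'xml.validation.errors', errors.join('\n'))
//       session.transfer(flowFile, REL_FAILURE)
//   } else {
//       session.transfer(flowFile, REL_SUCCESS)
//   }
```

The routing decision is the only behavioural change from the posted script: the FlowFile goes to failure whenever the list is non-empty, whether or not an exception was thrown.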
Labels: Cloudera DataFlow (CDF)
06-06-2025 04:51 AM
It turns out the issue is caused by the NVARCHAR(MAX) value: the processor and JDBC driver cannot seem to handle it properly because the string being built with STUFF is too large. I ended up returning all rows into an ExecuteGroovyScript processor and then concatenating all the validation errors into a single string.
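The concatenation step could look something like the sketch below. It assumes the rows reach the script as a JSON array of records, each with a `Comments` field (the column name used in the original query); the helper name `buildValidationHtml` and that input shape are illustrative assumptions, not details given in the post.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: turns a JSON array of rows into one HTML string,
// wrapping each row's Comments value in <p>...</p>.
String buildValidationHtml(String rowsJson) {
    def rows = new JsonSlurper().parseText(rowsJson)
    return rows
        .findAll { row -> row.Comments != null }   // skip rows without a comment
        .collect { row -> '<p>' + row.Comments.toString().replace("'", '') + '</p>' }
        .join('\n')
}
```

Stripping single quotes mirrors the `REPLACE(v1.Comments, '''', '')` in the SQL; drop that call if the quotes should be kept.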
05-21-2025 06:12 AM
I am having an issue using the ExecuteSQLRecord processor and writing to JSON. For some reason the result contains a NULL value for one of my fields in the FlowFile. However, if I execute the query directly in SQL Server, I get all my results with no NULL fields. I checked and there are no special characters, single quotes are escaped, and nothing looks out of the ordinary.

SQL Select Statement:

    Select
        emailTo = REPLACE(e.Email_Recipients, ';', ','),
        emailFrom = e.Email_Sender,
        emailSubject = e.Email_Subject,
        emailMessage = CASE
            WHEN c.ValidationCount = 0 THEN REPLACE(e.Email_Message, '{0}','${filename}')
            WHEN c.ValidationCount = 1 THEN CONCAT(REPLACE(e.Email_Message, '{0}','${filename}'), '<p>There is 1 validation error in this file</p>', v.Validations)
            ELSE CONCAT(REPLACE(e.Email_Message, '{0}','${filename}'), '<p>There are ', c.ValidationCount, ' validation errors in this file</p>', v.Validations)
        END,
        v.Validations
    FROM [${DataCollection}].[dbo].[T_SystemEmailNotifications] e
    Join (
        SELECT ve.File_ID,
            STUFF(
                (
                    SELECT CONCAT(' <p>', REPLACE(v1.Comments, '''', ''), '</p>') AS [text()]
                    FROM [dbo].[T_FileValidationLog] v1
                    WHERE ve.File_ID = v1.File_ID
                    ORDER BY ve.File_ID
                    FOR XML PATH (''), TYPE
                ).value('text()[1]','nvarchar(max)'), 1, 1, '') [Validations]
        FROM [${DataCollection}].[dbo].[T_FileValidationLog] ve
        WHERE [File_ID] = ${FileID}
        GROUP BY ve.File_ID
    ) v ON ${FileID} = v.File_ID
    Join (
        SELECT File_ID, Count(*) As ValidationCount
        FROM [${DataCollection}].[dbo].[T_FileValidationLog]
        WHERE [File_ID] = ${FileID}
        GROUP BY File_ID
    ) c ON v.File_ID = c.File_ID
    Where e.Email_Type = '${EmailType}'

The results returned to JSON are:

    [ {
      "emailTo" : "recipient1@somewhere.com,recipient2@somewhere.com",
      "emailFrom" : "DoNotReply@somewhere.com",
      "emailSubject" : "Correct Email Subject",
      "emailMessage" : "<p>The following file was unable to be processed and was rejected:</p><p>filename.xml</p><p>There are 40 validation errors in this file</p>",
      "Validations" : null
    } ]

However, if I run the query using the FlowFile attribute values directly in SQL Server, I get the full results, including the Validations field.

Results for Validations field from SQL Server:

    <p>Validation Error: The NOTREPORTED attribute is not declared. - Line: 193</p>
    <p>Validation Error: The required attribute NotReported is missing. - Line: 193</p>
    <p>Validation Error: The NOTREPORTED attribute is not declared. - Line: 195</p>
    <p>Validation Error: The required attribute NotReported is missing. - Line: 195</p>
    <p>Validation Error: The NOTREPORTED attribute is not declared. - Line: 197</p>
    <p>Validation Error: The required attribute NotReported is missing. - Line: 197</p>

I only included the first few rows of Validations, as there is no need for long content. The results flow into an EvaluateJsonPath processor to add the values as attributes, and then on to a PutEmail processor with a mime type of text/html. PutEmail works fine with the results, but the Validations are missing because they are returned as NULL from the ExecuteSQLRecord processor. The only characters I can see that could cause an issue are the colon, period, and hyphen, which I believe should be OK. Thank you for any insight.
Labels: Cloudera DataFlow (CDF)
05-20-2025 06:52 AM
Matt, I figured it out. It was not so much the UnpackContent processor as how I was bringing in the FlowFile itself. I was using ListFile and passing its output to the UnpackContent processor. I switched to a GetFile processor with a File Filter of .*\.zip and the unzip worked perfectly. So it looks like I will need to get the file with GetFile, unzip it, and archive the file with PutFile.
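A likely reason for the difference: ListFile emits FlowFiles that carry only attributes about each file, with no content (it is designed to be followed by FetchFile), whereas GetFile reads the file's bytes into the FlowFile. The sketch below is plain Groovy and independent of NiFi. It only illustrates that a zip reader handed empty content finds no entries, which would be consistent with the "does not contain any entries" error; whether UnpackContent fails in exactly this way is an assumption. The helper name `listZipEntries` is made up for the example.

```groovy
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream

// Hypothetical helper: returns the entry names found in the given zip bytes.
List<String> listZipEntries(byte[] content) {
    List<String> names = []
    new ZipInputStream(new ByteArrayInputStream(content)).withStream { zin ->
        ZipEntry entry
        // getNextEntry() returns null once there are no more entries,
        // which happens immediately when the content is empty.
        while ((entry = zin.getNextEntry()) != null) {
            names.add(entry.getName())
        }
    }
    return names
}

// Content as it would look coming straight from a listing step: zero bytes.
println listZipEntries(new byte[0])
```

Keeping ListFile and adding a FetchFile step before UnpackContent would be the usual alternative to switching to GetFile.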
05-19-2025 06:23 AM
These are WinZip files we get from a government website, where they are set up for public use and reference. Some contain a directory structure and some do not. I have verified that files do exist in the zip files. I am unable to attach a zip file because the .zip extension is not supported on the Cloudera community site: "The file type (.zip) is not supported. Valid file types are: .docx, .xlsx, .pptx, .pdf, .txt, .csv, .png, jpg, .jpeg, .gif, docx, xlsx, pptx, pdf, txt, csv, png, jpeg, gif." Accessing the link on the government site requires credentials, so posting the link would not help either. Even if I create a WinZip file with a single file in it, I still get the same error. I also tried CompressContent using the same mime type I set in the UpdateAttribute. While I do not get an error, the FlowFile output is a list of files with the same name as the zip file. It does not matter whether the Update Filename property is set to True or False.
05-19-2025 04:48 AM
Hello, I am attempting to unzip a series of simple .zip files in my NiFi flow. All of my flow executes with no issues except when I try to unpack. There is nothing special about these zip files (no passwords, etc.). But no matter which way I try to unpack them, I get an error saying that the zip file does not contain any entries. I have verified that the zip files do contain entries: I can easily unzip them with WinZip, and also with the old custom C# application that we are replacing with NiFi. I have tried adjusting the UnpackContent settings and have even added an UpdateAttribute to set the application/zip mime type, but I still keep getting the failure. I have also tried the CompressContent processor and still get these errors on simple WinZip files. I am attaching my flow, my processor settings, and the error message.
Labels: Cloudera DataFlow (CDF)