Member since: 05-19-2025
Posts: 9
Kudos Received: 0
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 214 | 10-24-2025 06:49 AM |
| | 151 | 10-24-2025 04:45 AM |
| | 341 | 06-06-2025 04:51 AM |
10-24-2025
06:49 AM
We are running the following:

Cloudera Flow Management (CFM) 2.1.7.1001
1.26.0.2.1.7.1001-5 built 07/03/2024 09:38:52 EDT
Powered by Apache NiFi 1.26.0

I have 4 CRON driven processors that drive their respective process flows. 3 of the processors execute at 6pm on weekdays (0 0 18 ? * MON-FRI) with no issues. This one executes at 5pm as expected, but also executes at 4:07pm, which is strange.

This is on a single core in our development environment. We are not using clusters. Also, since this is our development core, only I have access at this time, and I did not manually execute the processor.

I have since realized that our old Windows Service process is still executing in development and that the 4:07pm emails came from it, not from the NiFi CDF process we are developing and testing. So this is resolved and is in no way an issue with Cloudera or NiFi.
10-24-2025
04:54 AM
I have set up an ExecuteSQLRecord processor to get a list of files that were processed daily at 5pm and send this list via email. The process flow works as expected, except for the CRON schedule. My CRON expression is as follows: 0 0 17 * * ? I believe this is correct to execute daily at 17:00 (5pm). However, the process flow executes daily at both 16:07 (4:07pm) and 17:00 (5pm). I am scratching my head trying to figure out why there is an odd execution at 4:07pm. Does anyone have any insight into where I can check to see why this is happening?
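For reference, NiFi's CRON driven scheduling uses Quartz-style expressions, where the first field is seconds rather than minutes. The sketch below is only an illustration of how the fields in the two expressions from these posts line up. The class and method names (`CronFields`, `label`) are hypothetical, and it only labels the fields; it does not validate or schedule anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CronFields {
    // Field order of a Quartz-style cron expression; the seventh field (year) is optional.
    private static final String[] NAMES = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    /** Splits an expression on whitespace and pairs each field with its name. */
    static Map<String, String> label(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException("expected 6 or 7 fields, got " + parts.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Daily at 17:00:00
        label("0 0 17 * * ?").forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println();
        // Weekdays at 18:00:00
        label("0 0 18 ? * MON-FRI").forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Read this way, `0 0 17 * * ?` has a single hours value (17) and a fixed minute and second of 0, so on its own it gives no reason for a run at 16:07.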
Tags:
- cron
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)
10-24-2025
04:45 AM
I found the solution. While the code did validate the XML against the schema, I needed to add a conditional statement to transfer to failure:

    import groovy.xml.XmlSlurper
    import groovy.xml.XmlUtil
    import javax.xml.transform.stream.StreamSource
    import javax.xml.validation.SchemaFactory
    import javax.xml.XMLConstants
    import org.xml.sax.ErrorHandler
    import org.xml.sax.SAXParseException

    def flowFile = session.get()
    if (!flowFile) return

    // Read the FlowFile content and parse it, dropping any non-word characters before the first '<'
    def fileXML = flowFile.read().getText("UTF-8")
    def xmlContent = new XmlSlurper().parseText(fileXML.trim().replaceFirst("^([\\W]+)<","<").replaceAll('\'','\'\''))

    // The schema path comes from the XMLSchema FlowFile attribute
    def schemaFile = flowFile.getAttribute('XMLSchema')
    def schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    def schema = schemaFactory.newSchema(new File(schemaFile))
    def validator = schema.newValidator()
    def validationErrors = new StringBuilder()

    // Collect every problem instead of stopping at the first one
    validator.setErrorHandler(new ErrorHandler() {
        @Override
        void warning(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("<p>Warning: ${e.message}</p>")
        }
        @Override
        void error(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("<p>Error: ${e.message}</p>")
        }
        @Override
        void fatalError(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("<p>Fatal Error: ${e.message}</p>")
        }
    })

    try {
        validator.validate(new StreamSource(new StringReader(XmlUtil.serialize(xmlContent))))
        // The handler above does not throw, so check what it collected
        if (validationErrors.length() > 0) {
            flowFile = session.putAttribute(flowFile, "xml.validation.errors", validationErrors.toString())
            session.transfer(flowFile, REL_FAILURE)
        } else {
            session.transfer(flowFile, REL_SUCCESS)
        }
    } catch (org.xml.sax.SAXParseException e) {
        validationErrors.append("Validation failed: ${e.message}\n")
        flowFile = session.putAttribute(flowFile, "xml.validation.errors", validationErrors.toString())
        session.transfer(flowFile, REL_FAILURE)
    }
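The reason the conditional matters can be shown outside NiFi. The following is a minimal sketch in plain Java (the same JAXP classes the Groovy script calls), assuming a tiny hypothetical inline schema; `XsdCheck`, `XSD`, and `validate` are illustrative names and are not part of the original flow. When the error handler records problems instead of throwing, `validate()` returns normally even for an invalid document, so the caller has to inspect the collected list.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class XsdCheck {
    // Hypothetical schema: a <file> element with one required "name" attribute.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='file'><xs:complexType>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    /** Returns every problem found; an empty list means the document is valid. */
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                // Recording without throwing lets validation continue past the first problem.
                @Override public void warning(SAXParseException e) { problems.add("Warning: " + e.getMessage()); }
                @Override public void error(SAXParseException e) { problems.add("Error: " + e.getMessage()); }
                // Fatal errors (for example malformed XML) are rethrown and recorded in the catch below.
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            // Returns normally for schema violations because error() above does not throw.
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            problems.add("Validation failed: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<file name='a.xml'/>"));
        System.out.println(validate("<file NAME='a.xml'/>"));
    }
}
```

In a NiFi script the equivalent decision would be to route to failure whenever the returned list is non-empty, which is what the `validationErrors.length() > 0` check above does.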
06-06-2025
05:06 AM
Since the later versions of NiFi no longer provide the detailed reason an XML FlowFile fails validation against a schema (just a generic Validation Failed), I have been trying to create my own using the ExecuteGroovyScript processor. I am new to NiFi and Groovy, but I am an experienced .NET developer in C#. Below is my Groovy script in the processor:

    import groovy.xml.XmlUtil
    import javax.xml.transform.stream.StreamSource
    import javax.xml.validation.SchemaFactory
    import javax.xml.XMLConstants

    def flowFile = session.get()
    if (!flowFile) return

    def xmlContent = flowFile.read().getText("UTF-8")
    def schemaFile = new File("My_XML_Schema.xsd")
    def schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    def schema = schemaFactory.newSchema(schemaFile)
    def validator = schema.newValidator()
    def validationErrors = new StringBuilder()

    validator.setErrorHandler(new org.xml.sax.helpers.DefaultHandler() {
        @Override
        void warning(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("Warning: ${e.message}\n")
        }
        @Override
        void error(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("Error: ${e.message}\n")
        }
        @Override
        void fatalError(org.xml.sax.SAXParseException e) throws org.xml.sax.SAXException {
            validationErrors.append("Fatal Error: ${e.message}\n")
        }
    })

    try {
        validator.validate(new StreamSource(new StringReader(xmlContent)))
        session.transfer(flowFile, REL_SUCCESS)
    } catch (org.xml.sax.SAXParseException e) {
        validationErrors.append("Validation failed: ${e.message}\n")
        flowFile = session.putAttribute(flowFile, "xml.validation.errors", validationErrors.toString())
        session.transfer(flowFile, REL_FAILURE)
    }

When I test it, it always routes to Success. Even if I put elements or attributes in the XML file that are not in the schema, it always produces a Valid response. If I test the XML and XSD with the ValidateXML processor, it does properly mark the file as Valid or Invalid.

I have also tried parsing the FlowFile into XmlSlurper and then using XmlUtil.serialize in the new StringReader, and I get the same results: always valid even when it is not. Has anyone successfully been able to validate XML against a schema with a Groovy script and record the detailed validation errors?
Labels:
- Cloudera DataFlow (CDF)
06-06-2025
04:51 AM
It turns out the issue is caused by the NVARCHAR(MAX) value: the processor and JDBC driver do not seem to handle it properly because the string being built with STUFF is too large. I ended up returning all rows to an ExecuteGroovyScript processor and then concatenating all the validation errors into a single string there.
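The concatenation step itself is small. As a rough sketch only, assuming each returned row contributes one comment string and that the goal is the same `<p>...</p>` markup the SQL was producing, it could look like the following. It is written in plain Java, and `ValidationHtml` and `join` are hypothetical names, not the script actually used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ValidationHtml {
    /** Wraps each comment in a paragraph tag and joins them, skipping null or blank rows. */
    static String join(List<String> comments) {
        return comments.stream()
            .filter(c -> c != null && !c.trim().isEmpty())
            .map(c -> "<p>" + c.trim() + "</p>")
            .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "Validation Error: The NOTREPORTED attribute is not declared. - Line: 193",
            "Validation Error: The required attribute NotReported is missing. - Line: 193");
        System.out.println(join(rows));
    }
}
```

Building the string in the script keeps the query result to ordinary per-row values, so the large NVARCHAR(MAX) value never has to pass through the driver.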
05-21-2025
06:12 AM
I am having an issue using the ExecuteSQLRecord processor and writing to JSON. For some reason the processor returns a NULL value for one of my fields in the FlowFile results. However, if I execute the query directly in SQL Server, I get all my results with no NULL fields. I checked and there are no special characters, single quotes are escaped, and nothing looks to be out of the ordinary.

SQL Select Statement:

    Select emailTo = REPLACE(e.Email_Recipients, ';', ','),
           emailFrom = e.Email_Sender,
           emailSubject = e.Email_Subject,
           emailMessage = CASE
               WHEN c.ValidationCount = 0 THEN REPLACE(e.Email_Message, '{0}','${filename}')
               WHEN c.ValidationCount = 1 THEN CONCAT(REPLACE(e.Email_Message, '{0}','${filename}'), '<p>There is 1 validation error in this file</p>', v.Validations)
               ELSE CONCAT(REPLACE(e.Email_Message, '{0}','${filename}'), '<p>There are ', c.ValidationCount, ' validation errors in this file</p>', v.Validations)
           END,
           v.Validations
    FROM [${DataCollection}].[dbo].[T_SystemEmailNotifications] e
    Join (
        SELECT ve.File_ID,
               STUFF(
                   (
                       SELECT CONCAT(' <p>', REPLACE(v1.Comments, '''', ''), '</p>') AS [text()]
                       FROM [dbo].[T_FileValidationLog] v1
                       WHERE ve.File_ID = v1.File_ID
                       ORDER BY ve.File_ID
                       FOR XML PATH (''), TYPE
                   ).value('text()[1]','nvarchar(max)'), 1, 1, '') [Validations]
        FROM [${DataCollection}].[dbo].[T_FileValidationLog] ve
        WHERE [File_ID] = ${FileID}
        GROUP BY ve.File_ID
    ) v ON ${FileID} = v.File_ID
    Join (
        SELECT File_ID, Count(*) As ValidationCount
        FROM [${DataCollection}].[dbo].[T_FileValidationLog]
        WHERE [File_ID] = ${FileID}
        GROUP BY File_ID
    ) c ON v.File_ID = c.File_ID
    Where e.Email_Type = '${EmailType}'

The results returned to JSON are:

    [ {
      "emailTo" : "recipient1@somewhere.com,recipient2@somewhere.com",
      "emailFrom" : "DoNotReply@somewhere.com",
      "emailSubject" : "Correct Email Subject",
      "emailMessage" : "<p>The following file was unable to be processed and was rejected:</p><p>filename.xml</p><p>There are 40 validation errors in this file</p>",
      "Validations" : null
    } ]

However, if I run the query using the FlowFile attribute values directly in SQL Server, I get the full results, including the Validations field.

Results for Validations field from SQL Server:

    <p>Validation Error: The NOTREPORTED attribute is not declared. - Line: 193</p>
    <p>Validation Error: The required attribute NotReported is missing. - Line: 193</p>
    <p>Validation Error: The NOTREPORTED attribute is not declared. - Line: 195</p>
    <p>Validation Error: The required attribute NotReported is missing. - Line: 195</p>
    <p>Validation Error: The NOTREPORTED attribute is not declared. - Line: 197</p>
    <p>Validation Error: The required attribute NotReported is missing. - Line: 197</p>

I only included the first few rows of Validations, as there is no need for long content. The results will flow into an EvaluateJSONPath processor to add the values as attributes, and then to a PutEmail processor with a mime type of text/html. The PutEmail is working fine with the results, but the Validations are not there, as they are returned as NULL from the ExecuteSQLRecord processor. Note that emailMessage is also missing the validation text, even though it concatenates v.Validations. The only characters I can see that could cause an issue are the : . - characters, which I believe should be OK. Thank you for any insight.
Labels:
- Cloudera DataFlow (CDF)
05-20-2025
06:52 AM
Matt, I figured it out. It was not so much the UnpackContent processor as how I was bringing in the FlowFile itself. I was using ListFile and passing its output straight to the UnpackContent processor. I switched to a GetFile processor with a File Filter of .*\.zip and the unzip worked perfectly. So it looks like I will need to get the file with GetFile, unzip it, and archive the file with PutFile.
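A likely explanation, stated here as an assumption since the thread does not confirm it: ListFile emits FlowFiles that carry only file attributes and no content (FetchFile is normally used to pull the bytes in), so UnpackContent was being handed an empty payload. The sketch below uses only `java.util.zip` to show that empty content yields zero entries without raising an error, which matches the "does not contain any entries" symptom. `ZipEntryCount`, `countEntries`, and `sampleZip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntryCount {
    /** Counts the entries a streaming zip reader can find in the given bytes. */
    static int countEntries(byte[] content) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(content))) {
            int count = 0;
            while (zip.getNextEntry() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small in-memory zip with a single text entry, used as a stand-in for a real file. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("empty content: " + countEntries(new byte[0]) + " entries");
        System.out.println("real zip:      " + countEntries(sampleZip()) + " entries");
    }
}
```

If that assumption holds, ListFile followed by FetchFile before UnpackContent would be another way to get the content into the FlowFile, alongside the GetFile approach described above.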
05-19-2025
06:23 AM
These are WinZip files we get from a government website that are set up for public use and reference. Some may contain a directory structure and some may not. I have verified that files do exist in the zip files. I am unable to attach a zip file because the .zip extension is not supported on the Cloudera community site: "The file type (.zip) is not supported. Valid file types are: .docx, .xlsx, .pptx, .pdf, .txt, .csv, .png, jpg, .jpeg, .gif, docx, xlsx, pptx, pdf, txt, csv, png, jpeg, gif." Accessing the link on the government site requires credentials, so posting the link will not be helpful either. Even if I create a WinZip file with a single file in it, I still get the same error. I also tried CompressContent using the same mime type I set in the UpdateAttribute. While I do not get an error, the FlowFile output is a list of files with the same name as the zip file. It does not matter whether I have the Update Filename property set to True or False.
05-19-2025
04:48 AM
Hello, I am attempting to unzip a series of simple .zip files in my NiFi flow. All of my flow executes with no issues except when I try to unpack. There is nothing special about these zip files (i.e. no passwords, etc.). But no matter which way I try to unpack the zip files, I get the error that the zip file does not contain any entries. I have verified that the zip files do contain entries. I can easily unzip them with WinZip. I can also easily unzip them with our old custom C# application that we are replacing with NiFi. I have tried adjusting the UnpackContent settings and have even added an UpdateAttribute to set the application/zip mime type as well. I still keep getting the failure. I have also tried the CompressContent processor and still get these errors on simple WinZip files. I am attaching my flow, my processor settings, and the error message.
Labels:
- Cloudera DataFlow (CDF)