Dynamically Assign an XSD File

Expert Contributor

I have about 25-30 XML message types and each message type has its own XSD.  I need to validate each message against its respective XSD.

When using the ValidateXML processor, is there any way to dynamically assign the appropriate XSD to a flow file based on an attribute value?  I don't see the purpose/benefit of using so-called variables when said variables aren't even variable--they are STATIC!

Why does this processor ONLY use variable_registry variables and not attribute values like every other processor in NiFi?  

 


 

1 ACCEPTED SOLUTION

Super Guru

@SAMSAL @ChuckE ,

 

I believe parsing the schema for each flowfile that goes through the processor would be too expensive. Because of that, the schema is parsed only once when the processor is scheduled and used for every flowfile. That's why the attribute values cannot be used for this property.

 

Having a schema hashmap<filename, parsed_schema> internally could be an interesting idea so that the processor would parse the schema onTrigger only once for every schema file name and reuse it  afterwards. Obviously memory usage could be a problem if you have too many schemas, but I don't think this is likely to happen. This doesn't happen currently, but it would be a nice feature request IMO.
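
A minimal sketch of the caching idea above, written as plain Java rather than as NiFi processor code. It assumes the standard javax.xml.validation API; the class name SchemaCacheSketch and its methods are hypothetical and are not part of ValidateXML.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of NiFi): keeps one compiled Schema per XSD path.
final class SchemaCacheSketch {
    private final Map<String, Schema> cache = new ConcurrentHashMap<>();

    // Compiles the XSD at xsdPath on first use and returns the cached Schema afterwards.
    Schema schemaFor(String xsdPath) {
        return cache.computeIfAbsent(xsdPath, path -> {
            try {
                SchemaFactory factory =
                        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                return factory.newSchema(new File(path));
            } catch (SAXException e) {
                throw new IllegalArgumentException("Cannot parse schema " + path, e);
            }
        });
    }

    // Returns true when the XML text validates against the XSD at xsdPath.
    boolean isValid(String xsdPath, String xml) {
        try {
            // A Schema can be shared between threads; a Validator cannot, so create one per call.
            schemaFor(xsdPath).newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is malformed or violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each distinct path is parsed once, so with 25-30 schemas the map stays small. A real implementation would also need some way to notice when an XSD file changes on disk, which this sketch ignores.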

 

Currently, you can either do that with a scripting processor or use RouteOnAttribute to send each message to a ValidateXML processor with the correct schema.

 

Cheers,

André

 


5 REPLIES

Super Guru

Hi,

 

To answer your question, it depends:

1- If the attribute value on the XML flowfile can be used to derive the XSD file name/path, then you can use Expression Language to construct the XSD file name from the attribute value when assigning the "Schema File" property. For example, if all XSDs have the filename format [SomeID]_XSD_File.xsd and the attribute "Id_Attribute" holds the ID, then the "Schema File" property can be set to: ${Id_Attribute:append('_XSD_File.xsd')}

 

2- If the XSD file name can't be derived from the attribute value and you have to apply some if-else conditions based on the attribute, then you can use the ExecuteScript processor to do that and set a new flowfile attribute with the XSD schema file path. To learn how to use ExecuteScript to add a new attribute based on custom code, check the "Recipe: Add an attribute to a flow file" under: https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922

 

3- If you don't want to use the ExecuteScript processor, you can use the UpdateAttribute processor by creating different Rules, Conditions & Actions to set a schema file attribute that can be used in the ValidateXML "Schema File" property. To learn how to create rules, conditions and actions, check the following tutorial:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-update-attribute-nar/1.5.0/or...
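
The if-else lookup described in options 2 and 3 might look like the sketch below, whether it ends up inside a script or is expressed as UpdateAttribute rules. It is plain Java with no NiFi API calls. The message type names, the XSD file names and the class name are made up for illustration, and the base directory is only an assumed location.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical lookup from a message-type attribute value to an XSD file path.
final class SchemaPathLookup {
    // Assumed base directory; adjust to wherever the XSD files actually live.
    private static final String BASE_DIR = "/opt/nifi/schemas/xsd/";

    // Returns the schema path for a known message type, or empty when the type is unknown.
    static Optional<String> schemaPathFor(String messageType) {
        if (messageType == null) {
            return Optional.empty();
        }
        switch (messageType.trim().toLowerCase(Locale.ROOT)) {
            case "ord":
            case "order":
                return Optional.of(BASE_DIR + "orders.xsd");
            case "invoice":
                return Optional.of(BASE_DIR + "invoice_v2.xsd");
            default:
                return Optional.empty(); // caller decides how to route unknown types
        }
    }

    public static void main(String[] args) {
        System.out.println(schemaPathFor("ORD"));     // Optional[/opt/nifi/schemas/xsd/orders.xsd]
        System.out.println(schemaPathFor("unknown")); // Optional.empty
    }
}
```

Returning an empty Optional for unknown types keeps the "no schema found" case explicit, so the flow can send those messages to a failure route instead of validating them against the wrong XSD.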

 

Hope that helps. If it does, please accept the solution.

Thanks

 

Expert Contributor

Thank you SAMSAL for the reply.

Ordinarily you would be correct; however, the ValidateXML processor does things differently.

If my flowfile has an attribute named "schema.name" and I use the following expression language:

${schema.name:prepend('/opt/nifi/schemas/xsd/'):append('.xsd')}

 

...then I get the following error.  It seems the ValidateXML processor doesn't actually support dynamic run-time assignment of variables.   Even using the variable registry doesn't solve the problem because the path/filename variable needs to resolve at design time.

 

Perform Validation.  

Component is invalid: 'Schema File' validated against '/opt/nifi/schemas/xsd/.xsd' is invalid because
The specified resource(s) do not exist or could not be accessed: [/opt/nifi/schemas/xsd/.xsd]
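
The path in the error is what the expression produces when schema.name has no value, which is consistent with the property being checked before any flowfile exists. The sketch below only mimics the string handling of the expression in plain Java. It is not NiFi's Expression Language implementation, and the class and method names are hypothetical.

```java
// Mimics ${schema.name:prepend('/opt/nifi/schemas/xsd/'):append('.xsd')}
// using plain string concatenation.
final class SchemaFileExpressionSketch {
    static String resolve(String schemaName) {
        // Assumption: a missing attribute behaves like an empty string here.
        String value = (schemaName == null) ? "" : schemaName;
        return "/opt/nifi/schemas/xsd/" + value + ".xsd";
    }

    public static void main(String[] args) {
        System.out.println(resolve("order")); // /opt/nifi/schemas/xsd/order.xsd
        System.out.println(resolve(null));    // /opt/nifi/schemas/xsd/.xsd
    }
}
```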

 

Hopefully there is something I'm missing, otherwise I'll have to use the ExecuteScript to build my own validation routine.

 

 



 


Super Guru

Hmmm, you are right! The ValidateXML processor only accepts variables from the variable registry, as noted in the Schema File property description. @MattWho, @araujo, do you know why this is the case for ValidateXML and what the proper solution would be?

The only way I can think of in this case is updating the variable in the variable registry through the API to assign the proper value to the schema.name variable:

https://stackoverflow.com/questions/52010827/how-to-change-nifi-variable-registry-using-rest-api

 


Expert Contributor

Thanks @SAMSAL and @araujo for the responses.  RouteOnAttribute is what I am using at present, but it gets unwieldy after just a couple of route options.  Looks like I'm just gonna need to build a custom validator using the ExecuteScript processor.  Hopefully that scales.