Could TransformXml work with several XSLTs?

Explorer

If I have one XML file as input and want to convert it into 3 different JSON files with the TransformXml processor (using 3 different XSLTs from one directory), could this be done with only one processor, using different attributes in the XSLT file name property? Or will I need 3 identical processors that differ only in the XSLT file path property?

Super Guru

If you want to apply all 3 XSLTs to each input XML file (especially if the order of the XSLTs is important), your best bet will be 3 instances of TransformXml in a row, each applying one of the files. Otherwise it might be complicated to associate the incoming XML file(s) with attribute(s) for the XSLT files.

Explorer
@Matt Burgess

Thank you for the quick reply! In my case the order of the XSLTs doesn't matter, so would it be efficient to fork the flow into 3 different TransformXml processors?

Super Guru

If you fork them, how will you be able to apply all 3 to a single file? It sounds to me more like a pipeline: the first TransformXml can be working on the second file while the second TransformXml is working on the first file (which has already gone through the first TransformXml). If you want concurrency, you can increase the number of concurrent tasks for each TransformXml, but I doubt that will make much of a difference if your downstream flow has a bottleneck. If you want parallel execution and you have a NiFi cluster, you can send your input XML to a Remote Process Group that points to an Input Port on the same cluster. That will distribute your input across the nodes in the cluster, and each node can run the TransformXml pipeline in parallel.
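To illustrate the pipelining point above, here is a minimal Python sketch (not NiFi code) in which each stage is a thread connected to the next by a queue, standing in for three TransformXml processors in a row with one concurrent task each. The `run_pipeline` and `stage` helpers and the stage names are hypothetical, and the "transform" only records the stage name instead of running a stylesheet. Because each stage pulls from its own queue, stage 1 can start on the second file while stage 2 is still working on the first.

```python
import queue
import threading

SENTINEL = None  # marks the end of the input stream


def stage(name, inbox, outbox):
    """One pipeline stage: take items from inbox, 'transform' them, pass them on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(item + [name])  # stand-in for applying this stage's XSLT


def run_pipeline(files, stage_names):
    """Push files through the stages in order; each stage runs in its own thread."""
    queues = [queue.Queue() for _ in range(len(stage_names) + 1)]
    threads = [
        threading.Thread(target=stage, args=(name, queues[i], queues[i + 1]))
        for i, name in enumerate(stage_names)
    ]
    for t in threads:
        t.start()
    for f in files:
        queues[0].put([f])  # each item records the file name plus the stages it passed
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(["a.xml", "b.xml"], ["xslt1", "xslt2", "xslt3"])` returns each file tagged with all three stages, in input order, because every stage is a single FIFO worker.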

Contributor

@Matt Burgess I just spoke with @Emmanouil Petsanis. To clarify, we have Z XSLTs and want to apply all Z of them to each incoming XML file independently, so a single incoming XML file would lead to Z outgoing JSON files: [xslt1(xml), xslt2(xml),..., xsltZ(xml)]. We can hard-code this by having Z TransformXml processors:
[screenshot of the flow]

But it's kind of tedious, and we have multiple flows with multiple XSLTs.

Is there a way to either:

  • Have TransformXml take a list of XSLTs?
  • Use ListFile on the XSLT directory and then somehow generate Z copies of the XML, each paired with a different XSLT? (I know this isn't really how NiFi is designed to work.) ExecuteScript is the best I could come up with, but I thought there might be a native way.
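Outside NiFi, the fan-out described in the second option (one XML in, one copy per stylesheet out) boils down to something like the following Python sketch. `fan_out` and the `xslt.current` attribute name are hypothetical, the `.xsl`/`.xslt` extension filter is an assumption, and the sketch only pairs the XML with each stylesheet path rather than running any XSLT. Each resulting copy could then go to a single TransformXml whose XSLT file name property reads the path from that attribute.

```python
from pathlib import Path


def fan_out(xml_content, xslt_dir):
    """Return one (content, attributes) pair per stylesheet in xslt_dir.

    Every copy carries the same original XML; only the hypothetical
    'xslt.current' attribute differs, so each copy can be transformed
    independently of the others.
    """
    stylesheets = sorted(
        p for p in Path(xslt_dir).iterdir()
        if p.is_file() and p.suffix in {".xsl", ".xslt"}  # assumed extensions
    )
    return [(xml_content, {"xslt.current": str(p)}) for p in stylesheets]
```

Sorting the paths is only there to make the output deterministic; the order does not matter when each copy is transformed independently.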

Thanks,

Seb

Super Guru

If you have to apply all Z transforms to each incoming XML file, branching still won't help; you more likely want a loop. The hard part will be associating the list of XSLT files with the incoming XML flow files. Do you expect the number, location, etc. of the XSLT files to change often? If not, you could pre-populate a DistributedMapCache with the list of XSLT files and use FetchDistributedMapCache after FetchFile to get the list into an attribute. If that's overkill, you could use ExecuteScript to do the same thing, which might be better because you could add the count as an attribute as well.
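A minimal sketch of the logic such an ExecuteScript step might implement, written as plain Python rather than against the ExecuteScript API: list the stylesheets in one directory and build a comma-separated list attribute plus a count attribute. The function name and the `.xsl`/`.xslt` filter are assumptions for illustration; the attribute names `xslt.list` and `xslt.count` follow the ones used in the next paragraph. Paths that themselves contain commas would break this list format.

```python
from pathlib import Path


def xslt_attributes(xslt_dir):
    """Build the xslt.list / xslt.count attributes for one XML flow file."""
    paths = sorted(
        str(p) for p in Path(xslt_dir).iterdir()
        if p.is_file() and p.suffix in {".xsl", ".xslt"}  # assumed extensions
    )
    return {
        "xslt.list": ",".join(paths),   # comma-separated, as the loop expects
        "xslt.count": str(len(paths)),  # attributes are strings; the loop uses toNumber()
    }
```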

Once you have each XML flow file with an attribute containing a comma-separated list of XSLT file paths and a count attribute, you can create a loop using UpdateAttribute to decrement the count and to pop the next filename from the list. Assuming your list attribute is called xslt.list, the count attribute is called xslt.count, and your "current XSLT file" attribute is xslt.current, your UpdateAttribute properties might be:

xslt.current = ${xslt.list:substringBefore(',')}
xslt.list = ${xslt.list:substringAfter(',')}
xslt.count = ${xslt.count:toNumber():minus(1)}
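To make these three expressions concrete, here is a small Python model of them. It assumes NiFi's behavior that `substringBefore` and `substringAfter` return the whole subject when the delimiter is absent, and that all three properties are evaluated against the incoming attributes (so `xslt.current` sees the list before it is shortened). The helper names are hypothetical.

```python
def substring_before(subject, delim):
    """Model of NiFi's substringBefore: whole subject if delim is absent."""
    idx = subject.find(delim)
    return subject if idx == -1 else subject[:idx]


def substring_after(subject, delim):
    """Model of NiFi's substringAfter: whole subject if delim is absent."""
    idx = subject.find(delim)
    return subject if idx == -1 else subject[idx + len(delim):]


def update_attribute(attrs):
    """Apply the three UpdateAttribute properties, all reading the incoming values."""
    return {
        **attrs,
        "xslt.current": substring_before(attrs["xslt.list"], ","),
        "xslt.list": substring_after(attrs["xslt.list"], ","),
        "xslt.count": str(int(attrs["xslt.count"]) - 1),
    }
```

On the last element (`xslt.list` is `c.xsl`, with no comma left), `xslt.current` becomes `c.xsl` and the list stays `c.xsl`, which is harmless because the count drops to 0 and the loop exits.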

A RouteOnAttribute processor can be used as the loop condition, checking for

 ${xslt.count:toNumber():gt(0)}

sending those that pass to the above UpdateAttribute (so that xslt.current is set before the first transform), then to TransformXml (using ${xslt.current} as the XSLT file name property value), then back to RouteOnAttribute. Once the loop condition fails, you can route those flow files to the downstream flow. The result should be that each file gets all Z transforms applied, then moves on.
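The whole loop can be simulated in a few lines of Python. This is a sketch, not NiFi code: `run_loop` and the `transform` callback are hypothetical stand-ins for the processors, and the sketch runs the UpdateAttribute step before TransformXml so that `xslt.current` is already populated on the first pass. Since TransformXml replaces the flow file content, each transform in this loop receives the previous transform's output.

```python
def substring_before(s, d):
    i = s.find(d)
    return s if i == -1 else s[:i]  # whole string if the delimiter is absent


def substring_after(s, d):
    i = s.find(d)
    return s if i == -1 else s[i + len(d):]  # whole string if the delimiter is absent


def run_loop(content, attrs, transform):
    """Simulate RouteOnAttribute -> UpdateAttribute -> TransformXml -> RouteOnAttribute."""
    attrs = dict(attrs)
    while int(attrs["xslt.count"]) > 0:  # RouteOnAttribute: ${xslt.count:toNumber():gt(0)}
        old = attrs  # UpdateAttribute: every expression reads the incoming values
        attrs = {
            **old,
            "xslt.current": substring_before(old["xslt.list"], ","),
            "xslt.list": substring_after(old["xslt.list"], ","),
            "xslt.count": str(int(old["xslt.count"]) - 1),
        }
        content = transform(content, attrs["xslt.current"])  # TransformXml with ${xslt.current}
    return content, attrs  # loop condition failed: continue downstream
```

With a stand-in transform such as `lambda c, x: c + [x]` and an initial list of `a.xsl,b.xsl,c.xsl` with a count of 3, the content ends up recording `a.xsl`, `b.xsl`, `c.xsl` in that order.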

Of course this is just a workaround. I've written NIFI-3667 to cover the improvement to TransformXml to allow multiple stylesheet files.

Contributor

Legend! This is perfect. You even went the extra step of submitting the improvement ticket! Can't upvote this enough.
