Could TransformXML work with several xslts ?

New Contributor

I have one XML file as input and want to convert it into 3 different JSON outputs with the TransformXML processor, using 3 different XSLTs from one directory. Can this be done with a single processor that takes the XSLT file name from an attribute (in the XSLT file name property), or do I need 3 copies of the processor that differ only in the XSLT file name in the property path?

Re: Could TransformXML work with several xslts ?

If you want to apply all 3 XSLTs to each input XML file (especially if the order of the XSLTs is important), your best bet will be 3 instances of TransformXml in a row, each applying one of the files. Otherwise it might be complicated to associate the incoming XML file(s) with attribute(s) for the XSLT files.

Re: Could TransformXML work with several xslts ?

New Contributor
@Matt Burgess

Thank you for your instant reply! In my case the order of the XSLTs doesn't matter, so is it efficient to fork the flow into 3 different TransformXML processors?

Re: Could TransformXML work with several xslts ?

If you fork them, how will you be able to apply all 3 to a single file? It sounds to me like more of a pipeline, as the first TransformXml can be working on the second file as the second TransformXml is working on the first file (which has already gone through the first TransformXml). If you want concurrency you can increase the number of concurrent tasks for each TransformXml, but I doubt that will make much of a difference if your downstream flow has a bottleneck. If you want parallel execution and you have a NiFi cluster, you can send your input XML to a Remote Process Group that corresponds to an Input Port on the same cluster. That will distribute your input to each node in the cluster, and each can run the TransformXml pipeline in parallel.

Re: Could TransformXML work with several xslts ?

Contributor

@Matt Burgess I just spoke with @Emmanouil Petsanis. To clarify, we have Z XSLTs and want to apply all Z to each incoming XML file independently. So a single incoming XML file would lead to Z outgoing JSON files, like [xslt1(xml), xslt2(xml), ..., xsltZ(xml)]. We can hard-code this by just having Z TransformXML processors: [screenshot: 14268-screen-shot-2017-03-31-at-123336-pm.png]

But it's kind of tedious, and we have multiple flows with multiple XSLTs.

Is there a way to either:

  • Have TransformXML take a list of XSLTs?
  • Use a list file on the XSLT dir and then somehow generate Z copies of the XML, each with a different XSLT? (I know this isn't really how NiFi is designed to work.) ExecuteScript is the best I could come up with, but I thought there might be a native way.

Thanks,

Seb

Re: Could TransformXML work with several xslts ?

If you have to apply all Z transforms to each incoming XML file, branching still won't help; you more likely want a loop. The hard part will be getting the list of XSLT files associated with the incoming XML flow files. Do you expect the number/location/etc. of XSLT files to change often? If not, you could pre-populate a DistributedMapCache with the list of XSLT files, and use FetchDistributedMapCache after FetchFile to get the list into an attribute. If that's overkill, you could use ExecuteScript to do the same thing, which might be better because you could add the count as an attribute as well.
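As a rough illustration of what the ExecuteScript step would need to compute, here is a minimal Python sketch that builds the list and count values from a directory. The function name build_xslt_attributes and the choice of the .xsl/.xslt extensions are assumptions made for this example, and the NiFi session and flow file handling that a real ExecuteScript body needs is deliberately left out.

```python
from pathlib import Path


def build_xslt_attributes(xslt_dir):
    """Return the two attribute values for the stylesheets found in xslt_dir.

    Hypothetical helper: only the attribute names xslt.list and xslt.count
    come from this thread; everything else is illustrative.
    """
    # Sort so the order of the list is stable between runs.
    paths = sorted(
        p for p in Path(xslt_dir).iterdir()
        if p.is_file() and p.suffix.lower() in {".xsl", ".xslt"}
    )
    return {
        # Comma-separated, so this breaks if a path itself contains a comma.
        "xslt.list": ",".join(str(p) for p in paths),
        # Flow file attributes are strings, so the count is stored as one.
        "xslt.count": str(len(paths)),
    }
```

In a real flow these two values would be written onto each XML flow file as attributes before it enters the loop.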

Once you have each XML flow file with an attribute containing a comma-separated list of XSLT file paths and a count variable, you can create a loop using UpdateAttribute to decrement the count and to pop the next filename from the list. Assuming your list attribute is called xslt.list and the count attribute is called xslt.count and your "current xslt file" attribute is xslt.current, your UpdateAttribute might be:

xslt.current = ${xslt.list:substringBefore(',')}
xslt.list = ${xslt.list:substringAfter(',')}
xslt.count = ${xslt.count:toNumber():minus(1)}

A RouteOnAttribute processor can be used as the loop condition, checking for

 ${xslt.count:toNumber():gt(0)}

sending those that pass to the TransformXml (using ${xslt.current} as the XSLT File property value), then the above UpdateAttribute, then back to RouteOnAttribute. Once the loop condition fails, you can route those to the downstream flow. The result should be that each file gets all Z transforms applied, then moves on.
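To sanity-check the bookkeeping, the loop can be simulated outside NiFi. The Python sketch below is not NiFi code: the helpers imitate substringBefore and substringAfter on the assumption that both return the whole subject when the delimiter is missing, and it assumes UpdateAttribute evaluates all three rules against the incoming attribute values. It also assumes one particular ordering (pop the next file, transform, then test the count), since how xslt.current gets its first value is left open above, and it assumes a non-empty list.

```python
def substring_before(subject, delim):
    # Imitates substringBefore: text before the first delimiter,
    # or the whole subject if the delimiter is absent (assumed behavior).
    head, sep, _ = subject.partition(delim)
    return head if sep else subject


def substring_after(subject, delim):
    # Imitates substringAfter: text after the first delimiter,
    # or the whole subject if the delimiter is absent (assumed behavior).
    _, sep, tail = subject.partition(delim)
    return tail if sep else subject


def update_attribute(attrs):
    # All three rules read the attributes as they were before this update.
    return {
        **attrs,
        "xslt.current": substring_before(attrs["xslt.list"], ","),
        "xslt.list": substring_after(attrs["xslt.list"], ","),
        "xslt.count": str(int(attrs["xslt.count"]) - 1),
    }


def loop_condition(attrs):
    # Stand-in for the RouteOnAttribute check ${xslt.count:toNumber():gt(0)}.
    return int(attrs["xslt.count"]) > 0


def run_loop(attrs):
    """Return the stylesheets selected on each pass and the final attributes."""
    applied = []
    while True:
        attrs = update_attribute(attrs)
        # Stand-in for TransformXml reading ${xslt.current}.
        applied.append(attrs["xslt.current"])
        if not loop_condition(attrs):
            break
    return applied, attrs


applied, final = run_loop({"xslt.list": "a.xsl,b.xsl,c.xsl", "xslt.count": "3"})
print(applied)
```

With this ordering each of the three stylesheets is selected exactly once and xslt.count ends at 0. If the count is tested before the first pop instead, xslt.current has to be seeded upstream, otherwise the last stylesheet is never applied.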

Of course this is just a workaround. I've written NIFI-3667 to cover the improvement to TransformXml to allow multiple stylesheet files.

Re: Could TransformXML work with several xslts ?

Contributor

Legend! This is perfect. Even gone the extra step of submitting the improvement ticket! Can't upvote this enough
