NiFi: Extract attribute value from XML using EvaluateXPath

Expert Contributor

I created a workflow in NiFi 1.5.0 that reads an XML file from HDFS. After splitting the file into separate <Transaction> elements, I want to read an attribute's value and then act on that value.

My original XML looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Log>
  <Transaction Type="1" TrainingModeFlag="true">
    <StoreID>240041</StoreID>
     ...
  </Transaction>
</Log>

My workflow splits the XML with SplitXML at depth 1, so after this processor I have this sub-XML:

<?xml version="1.0" encoding="UTF-8"?>
<Transaction Type="1" TrainingModeFlag="true">
  <StoreID>240041</StoreID>
   ...
</Transaction>

I want to extract the value of the Type attribute of the Transaction element, but it doesn't work for me. Here is my EvaluateXPath processor configuration:

[Screenshot: EvaluateXPath processor configuration]

My new content attribute shows "Empty string set" instead of the value 1:


[Screenshot: resulting flow file attributes]

Using the XPath //@Type works, but I need the exact path, as the Type attribute can occur in sub-elements.

Can someone help?


Master Guru

I can't reproduce this. I used GenerateFlowFile with your input XML (with two Transactions added) -> SplitXML (depth 1) and got the same sub-XML you did. Then I used the same settings for EvaluateXPath, and my content attribute has the correct value of 1. The only way I could get "Empty string set" was by using /Transaction/@type as the XPath (note the wrong case: type instead of Type). Is it possible there's a typo or a case mismatch between your input XML and the XPath?
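To check these expressions outside NiFi, the sketch below evaluates them with the JDK's built-in XPath 1.0 engine (javax.xml.xpath) against the split document. It is only an approximation: EvaluateXPath may use a different XPath engine, the elided "..." content is left out, and the <LineItem Type="2"/> child is hypothetical, added only to show why //@Type is not specific enough.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TransactionTypeDemo {
    // The split <Transaction> document from the question, with the elided
    // content dropped. <LineItem Type="2"/> is a HYPOTHETICAL sub-element
    // that also carries a Type attribute.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<Transaction Type=\"1\" TrainingModeFlag=\"true\">"
        + "<StoreID>240041</StoreID>"
        + "<LineItem Type=\"2\"/>"
        + "</Transaction>";

    // Parses the document and returns the XPath 1.0 string value of the expression.
    static String eval(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: only the Type attribute of the root <Transaction>.
        System.out.println(eval(XML, "/Transaction/@Type"));          // 1
        System.out.println(eval(XML, "string(/Transaction/@Type)"));  // 1
        // Attribute names are case-sensitive, so @type matches nothing.
        System.out.println("[" + eval(XML, "/Transaction/@type") + "]");  // []
        // //@Type selects every Type attribute in the document.
        System.out.println(eval(XML, "count(//@Type)"));              // 2
    }
}
```

The absolute path pins the attribute to the root element, whereas //@Type selects every Type attribute in the flow file and its string value is simply that of the first one in document order.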

Expert Contributor

Thanks for your fast answer! I checked the settings, and there are definitely no upper/lower case problems. I just noticed that the NiFi version is 1.4, not 1.5. Is there a known problem with this processor?

Expert Contributor

I also tried generating the XML with the GenerateFlowFile processor, but the problem remains (I thought it might be related to the XML I read from HDFS, but that doesn't seem to be the case).

Master Guru

Maybe try it without the string() function around the expression? I'm not sure that's the cause, though, since I used the same settings and it worked.

Expert Contributor

Here is my workflow so far:

[Screenshot: workflow]

New Contributor

Same problem here: valid XPath expressions produce empty strings in the EvaluateXPath processor. In the attached screenshots, only //@categoryId and //@UniqueType seem to work. I am using NiFi 1.7.1 with JDK 1.8.0_181. Any insights would be appreciated!

[Screenshot: example XML file]

[Screenshot: EvaluateXPath properties]

[Screenshot: flow file attributes]

New Contributor

Thanks to the second poster for identifying the root cause in the NiFi Jira.

For people researching this today: the cause was the default namespace declared on the root element (the plain 'xmlns' attribute, without a ':prefix' suffix). In the second poster's case, the XML started with:


<data xmlns="http://www.media-saturn.com/msx" xmlns: ...


The `/data/item//uniqueID` path he was searching for actually lives in the "http://www.media-saturn.com/msx" namespace, which means he needed to specify that namespace as part of his XPath expression.

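The pathless "//@UniqueType" search worked because it names no elements, only an attribute, and a default namespace does not apply to attribute names: an unprefixed attribute is in no namespace, so the expression matches no matter which namespace its element belongs to.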
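The default-namespace effect can be reproduced with the JDK's XPath 1.0 engine on a namespace-aware parse (an assumption about how EvaluateXPath parses flow files, consistent with the behaviour described in this thread). The document is a hypothetical minimal stand-in: only the xmlns value and the UniqueType/ProdID names come from the thread, while the element nesting and the value 4 are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DefaultNamespaceDemo {
    // HYPOTHETICAL minimal document: only the default namespace on <data>
    // is taken from the thread.
    static final String XML =
        "<data xmlns=\"http://www.media-saturn.com/msx\">"
        + "<item><uniqueID UniqueType=\"ProdID\">4</uniqueID></item>"
        + "</data>";

    // Parses namespace-aware (assumed to match EvaluateXPath) and returns
    // the XPath 1.0 string value of the expression.
    static String eval(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // The root element is in the msx namespace, not in "no namespace".
        System.out.println(eval("namespace-uri(/*)"));  // http://www.media-saturn.com/msx
        // Unprefixed element names only match no-namespace elements: empty.
        System.out.println("[" + eval("/data/item/uniqueID") + "]");  // []
        // No element names in the path, and the unprefixed attribute is in
        // no namespace, so this matches.
        System.out.println(eval("//@UniqueType"));  // ProdID
        // Namespace-agnostic workaround: match elements by local-name().
        System.out.println(eval(
            "/*[local-name()='data']/*[local-name()='item']/*[local-name()='uniqueID']"));  // 4
    }
}
```

The local-name() form sidesteps namespaces entirely; it is verbose, but it works in any XPath 1.0 engine.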


I'm using NiFi 2.0.0-M4 today, and I'm pleased to report that it appears to support the XPath 3.0/3.1 URIQualifiedName notation, where the namespace is specified inline in the query. It isn't particularly elegant, but it works: you wrap the namespace URI in curly braces and prefix it with a capital 'Q', namely:
     Q{http://www.media-saturn.com/msx}<single-level selector>

To fix his expression "/data/item//uniqueID[UniqueType='ProdID']/text()", which currently returns "Empty string set" for the dynamic property 'ProdID4', you would use:


/Q{http://www.media-saturn.com/msx}data/Q{http://www.media-saturn.com/msx}item//uniqueID[UniqueType='ProdID']/text()


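I originally suspected that the second namespace reference (on 'item') was not required, since once you are navigating down the 'data' path of the correct namespace you are unlikely to jump to another one. That is not how XPath works, though: each element step is matched on its own, and every unprefixed element inside <data> inherits the default namespace (unless a descendant redeclares xmlns), so each of those steps needs the Q{...} reference. By the same logic, the expression above may also need it on 'uniqueID' (and on 'UniqueType', if that is a child element rather than an attribute). Attributes behave differently: a default namespace does not apply to unprefixed attribute names, so those are in no namespace and are written without a namespace reference.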
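The JDK's XPath 1.0 engine does not understand the Q{...} notation; the XPath 1.0 counterpart, usable outside NiFi, is to bind a prefix to the namespace URI through a NamespaceContext. The sketch below uses a hypothetical minimal document and an arbitrary prefix m to test both questions: whether a step like 'item' needs the namespace, and how attribute tests behave.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixBindingDemo {
    static final String MSX = "http://www.media-saturn.com/msx";

    // HYPOTHETICAL document; only the namespace URI comes from the thread.
    static final String XML =
        "<data xmlns=\"" + MSX + "\">"
        + "<item><uniqueID UniqueType=\"ProdID\">4</uniqueID></item>"
        + "</data>";

    // Binds the arbitrary prefix "m" to the msx namespace URI.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "m".equals(prefix) ? MSX : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return MSX.equals(namespaceURI) ? "m" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return MSX.equals(namespaceURI)
                ? Collections.singletonList("m").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Parses namespace-aware and evaluates with the "m" prefix bound.
    static String eval(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return (String) xpath.evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Every element step names the namespace: matches.
        System.out.println(eval("/m:data/m:item/m:uniqueID"));  // 4
        // 'item' without the namespace does not match, even under m:data.
        System.out.println("[" + eval("/m:data/item/m:uniqueID") + "]");  // []
        // Unprefixed attribute test in a predicate: matches.
        System.out.println(eval("/m:data/m:item/m:uniqueID[@UniqueType='ProdID']"));  // 4
        // Putting the attribute in the namespace makes the predicate fail.
        System.out.println("[" + eval("/m:data/m:item/m:uniqueID[@m:UniqueType='ProdID']") + "]");  // []
    }
}
```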

As an aside:
[1] It would be nice if the NiFi documentation specified which XPath version the processor implements.
[2] Even better would be a drop-down in the processor that lets a developer select the desired XPath version.