Error with Evaluate XPath in v. 1.7

I have a small XML file from which I am extracting XPath values into attributes using EvaluateXPath 1.7.0. The XML is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<sports-content path-id="general/l.sportsml.com.general/advisory/xt.pa.json.20180730123447-heartbeat">
  <sports-metadata doc-id="xt.pa.json.20180730123447-heartbeat" date-time="2018-07-30T12:34:47+00:00" language="en-US" document-class="advisory" fixture-key="heartbeat" fixture-name="Heartbeat">
    <sports-title/>
    <sports-content-codes>
      <sports-content-code code-type="publisher" code-key="padatasports.com" code-name="The Press Association"/>
      <sports-content-code code-type="distributor" code-key="xmlteam.com" code-name="XML Team Solutions, Inc."/>
      <sports-content-code code-type="sport" code-key="15000000" code-name="General"/>
      <sports-content-code code-type="league" code-key="l.sportsml.com.general" code-source="xmlteam.com" code-name="General"/>
    </sports-content-codes>
  </sports-metadata>
</sports-content>

The 3 attributes' XPath expressions are: string(/sports-content/sports-metadata/@date-time), string(/sports-content/sports-metadata/@doc-id), string(/sports-content/@path-id)
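To check the three expressions outside the flow, here is a minimal sketch that uses the JDK's standard `javax.xml.xpath` API, not NiFi itself. The class name `XPathAttributeDemo`, the method `evaluate`, and the abbreviated `SAMPLE_XML` (content codes omitted) are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathAttributeDemo {
    // Abbreviated copy of the sample document (the content codes are omitted).
    public static final String SAMPLE_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<sports-content path-id=\"general/l.sportsml.com.general/advisory/xt.pa.json.20180730123447-heartbeat\">\n"
        + "  <sports-metadata doc-id=\"xt.pa.json.20180730123447-heartbeat\""
        + " date-time=\"2018-07-30T12:34:47+00:00\" language=\"en-US\"/>\n"
        + "</sports-content>\n";

    // Evaluates one XPath expression against an XML string and returns the string result.
    public static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "string(/sports-content/sports-metadata/@date-time)",
            "string(/sports-content/sports-metadata/@doc-id)",
            "string(/sports-content/@path-id)"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + evaluate(expression, SAMPLE_XML));
        }
    }
}
```

If the same expressions return the expected values here but the flow still logs an error, that suggests the expressions are fine and the problem lies in the content being parsed.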

The attributes are extracted properly, but I am getting this error:

javax.xml.xpath.XPathExpressionException: Failure converting a node of class javax.xml.transform.sax.SAXSource: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog

Thinking that there might be unseen characters at the beginning or end of the file, I looked at it in a hex viewer and found a control character at the end. I added a ReplaceText processor with the search pattern

(?s)^[^\<]*(<.*</sports-content>).*$

being replaced by $1
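The effect of this search-and-replace can be tried in isolation with `java.util.regex`. This is only a sketch: the class name `ReplaceTextRegexDemo`, the method `strip`, the shortened sample document, and the stray characters (`\uFEFF` in front, `\u001A` at the end) are assumptions, since the post does not say which control character was found.

```java
import java.util.regex.Pattern;

public class ReplaceTextRegexDemo {
    // The search pattern from the post; (?s) lets '.' match line breaks as well.
    public static final Pattern WRAPPER =
        Pattern.compile("(?s)^[^\\<]*(<.*</sports-content>).*$");

    // Keeps only capture group 1: everything from the first '<'
    // through the last closing </sports-content> tag.
    public static String strip(String content) {
        return WRAPPER.matcher(content).replaceFirst("$1");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<sports-content path-id=\"x\">\n"
            + "</sports-content>";
        // Hypothetical stray characters before and after the document.
        String dirty = "\uFEFF" + xml + "\u001A\n";
        System.out.println(strip(dirty).equals(xml)); // prints "true"
    }
}
```

Content that does not contain a closing `</sports-content>` tag does not match the pattern and is returned unchanged.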

This worked to remove the control character but it did not get rid of the error message. As I said, I am able to get the attribute values I need, but I do not want to have my log filled with error messages, one for each flowfile.
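One way to test whether a given payload trips the parser at all is to feed it to the JDK's DOM parser directly. The sketch below is not NiFi's own code path; the class name `PrologCheckDemo` and the method `parseError` are hypothetical, and the exact message text depends on the parser implementation in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class PrologCheckDemo {
    // Returns null if the text parses as XML, otherwise the parser's error message.
    public static String parseError(String text) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><sports-content/>";
        // A clean document parses without an error message.
        System.out.println("clean:        " + parseError(xml));
        // Any character before the XML declaration makes the parser reject the input.
        System.out.println("leading junk: " + parseError("x" + xml));
    }
}
```

A failure reported at line 1, column 1 points at the very start of whatever the parser was given, so it may be worth checking the first bytes of the content as well as the end.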

1 REPLY

I failed to format the ReplaceText expression as code. It should be:

(?s)^[^\<]*(<.*</sports-content>).*$