Morphline XSLT not working as described

Explorer

I'm having trouble getting XSLT to work with Morphlines. I'm using the xslt command as described in the documentation

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/xslt

but it does not seem to pass on the elements and attributes as this description says it should:

 

"...For each item in the query result sequence, the morphline command converts the item to a record and pipes that record to the next morphline command. For an attribute node the attribute's XPath string value is filled into the record field named after the attribute name. For an element node the attributes and children of the element are treated as follows: The XPath string value of the attribute or child is filled into the record field named after the child's name..."

 

Below are a simple XML file, my morphline script, and the output of running it in TRACE mode with the standalone test program

https://github.com/cloudera/cdk/blob/master/cdk-morphlines/cdk-morphlines-core/src/test/java/com/clo...

 

The same problem also happens in my full Cloudera Search deployment.

 

When you look at "output record:" from the script logger, you can see that only the Child1 element is passed through, with none of its attributes and without the sub-child Child1_1 element.

 

If the above description of how XSLT output is processed within Morphlines is accurate, why aren't all elements and attributes being passed through?

 

cat test.xml
<RootNode>
    <Child1 A="a" B="B">
        <Child1_1 C="c"/>
    </Child1>
</RootNode>

 

cat /tmp/morphlog.txt
1479 [main] TRACE com.cloudera.cdk.morphline.saxon.XSLTBuilder$XSLT  - beforeProcess: {_attachment_body=[java.io.BufferedInputStream@5de3eba1]}
1529 [main] TRACE com.cloudera.cdk.morphline.saxon.XSLTBuilder$XSLT  - XSLT input document: <RootNode>
      <Child1 A="a" B="B">
            <Child1_1 C="c"/>
      </Child1>
</RootNode>

1553 [main] TRACE com.cloudera.cdk.morphline.stdlib.GenerateUUIDBuilder$GenerateUUID  - beforeProcess: {Child1=[
        
    ]}
1556 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogInfoBuilder$LogInfo  - beforeProcess: {Child1=[
        
    ], id=[c7a9acc1-a82b-4fa1-b491-603d5f570401]}
1556 [main] INFO  com.cloudera.cdk.morphline.stdlib.LogInfoBuilder$LogInfo  - output record: [{Child1=[
        
    ], id=[c7a9acc1-a82b-4fa1-b491-603d5f570401]}]


cat morphlines.conf
morphlines : [
  {
    id : morphtest
    importCommands : ["com.cloudera.**"]
    commands : [
      {
        xslt {
          fragments : [
            {
              fragmentPath : "/"
              queryString : """
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
"""
            }
          ]
        }
      }
      { generateUUID { field : id } }
      { logInfo { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

1 ACCEPTED SOLUTION

Super Collaborator
There's no need to change any input data. Rather you need to change the xsl transform to spit out whatever output is expected. The XSL transform needs to produce data that is shaped as expected by the rules that govern the conversion of saxon xslt output to morphline records, as described in the docs - http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/xslt

Wolfgang.


7 REPLIES

Explorer

Since the sample XSLT was not working for me, I tried crafting a more customized XSLT that extracts attributes as elements. I have a little more success with this, but there appears to be a bug in how Morphlines handles the XSLT results. As you can see from the output, the attributes of each child element are merged together instead of being treated as distinct items (see Child1_1=[cde] and Child1_2=[xbb] in the log output below).

 

At this point, XML processing appears to be broken in Morphlines 😞

 

cat morphlines.conf
morphlines : [
  {
    id : morphtest
    importCommands : ["com.cloudera.**"]
    commands : [
      {
        xslt {
          fragments : [
            {
              fragmentPath : "/"
              queryString : """
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" >
    <!-- <xsl:output method="xml" indent="yes" /> -->
    <xsl:template match="//*[name() != 'RootNode']">
        <xsl:element name="{name()}">
            <xsl:for-each select="@*">
                <xsl:element name="{name()}">
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:for-each>
            <xsl:apply-templates select="*|text()"/>
        </xsl:element>
    </xsl:template>
</xsl:stylesheet>
        """
            }
          ]
        }
      }
      { generateUUID { field : id } }
      { logInfo { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]


cat test2.xml
<?xml version="1.0"?>
<RootNode>
    <Child1 A="a" B="B">
        <Child1_1 C="c" D="d" E="e" />
        <Child1_2 X="x" B="bb" />
    </Child1>
</RootNode>

 

cat /tmp/morphlog.txt
1964 [main] TRACE com.cloudera.cdk.morphline.saxon.XSLTBuilder$XSLT  - XSLT input document:
<RootNode>
      <Child1 A="a" B="B">
            <Child1_1 C="c" D="d" E="e"/>
            <Child1_2 X="x" B="bb"/>
      </Child1>
</RootNode>


1981 [main] TRACE com.cloudera.cdk.morphline.stdlib.GenerateUUIDBuilder$GenerateUUID  - beforeProcess: {A=[a], B=[B], Child1_1=[cde], Child1_2=[xbb]}
1983 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogInfoBuilder$LogInfo  - beforeProcess: {A=[a], B=[B], Child1_1=[cde], Child1_2=[xbb], id=[28827c81-28ca-41b1-9411-b8697d765ac5]}
1984 [main] INFO  com.cloudera.cdk.morphline.stdlib.LogInfoBuilder$LogInfo  - output record: [{A=[a], B=[B], Child1_1=[cde], Child1_2=[xbb], id=[28827c81-28ca-41b1-9411-b8697d765ac5]}]

 

Notice that when I run the same XML and XSL through the standalone Saxon processor, it correctly creates elements from the attributes. This leads me to think that the merging happens when the Morphlines pipeline converts the Saxon output into a record.

 

java -cp /opt/saxon/SaxonHE9-5-1-2/saxon9he.jar net.sf.saxon.Transform -s:test2.xml -xsl:test2.xsl
<?xml version="1.0" encoding="UTF-8"?>
    <Child1><A>a</A><B>B</B>
        <Child1_1><C>c</C><D>d</D><E>e</E></Child1_1>
        <Child1_2><X>x</X><B>bb</B></Child1_2>
    </Child1>

Super Collaborator
Again, the output of the xslt morphline command is not an XML document, element, or attribute. Rather, the output is XPath string values that are filled into record fields.

Wolfgang.
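
The point about XPath string values can be illustrated outside of Morphlines. The following is a minimal Python sketch, not the morphline implementation: it assumes that joining an element's descendant text nodes with the standard library's ElementTree is a fair stand-in for the XPath string value. The helper name string_value is hypothetical. It uses the Child1 element from test.xml and the Child1_1 element from the standalone Saxon output shown earlier in the thread.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """Approximate the XPath string value of an element:
    the concatenation of all descendant text nodes, in document order.
    Attribute values are not part of an element's string value."""
    return "".join(elem.itertext())


# Child1 as it appears in test.xml: attributes only, no text content.
child1 = ET.fromstring(
    '<Child1 A="a" B="B">\n'
    '        <Child1_1 C="c"/>\n'
    '    </Child1>'
)

# Child1_1 as produced by the second stylesheet (attributes turned into elements).
child1_1 = ET.fromstring("<Child1_1><C>c</C><D>d</D><E>e</E></Child1_1>")

# Only the whitespace between the tags survives, as in the first log.
print(repr(string_value(child1)))

# The text of C, D and E is concatenated, as in the second log.
print(string_value(child1_1))  # prints "cde"
```

Under this reading, the whitespace-only Child1 field in the first log and the cde value in the second log are what the documented rule would produce, rather than a merge bug.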

Explorer
Sorry, but I did not see your previous response (I never got an email notification) and only noticed it now.

It seems you are suggesting that I alter the incoming data, forcing dummy data into elements that have only attributes and no element value. Is this correct?

If so, this will be a problem. For example, see the FIXML spec, which carries data in attributes rather than element values:

http://fixwiki.org/fixwiki/FPL:FIXML_Syntax#FIXML_4.4_Schema_Version

This means I would need an extra ETL step to prepare the data for Morphlines 😞

Super Collaborator
There's no need to change any input data. Rather you need to change the xsl transform to spit out whatever output is expected. The XSL transform needs to produce data that is shaped as expected by the rules that govern the conversion of saxon xslt output to morphline records, as described in the docs - http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/xslt

Wolfgang.

Explorer
For my second XSLT example, I now understand what you mean about the XPath string result: the Child1_1 and Child1_2 fields hold the concatenation of their children's text. But why do I not also see C=[c], D=[d], E=[e], X=[x], B=[bb] in the output record? The doc says: "The XPath string value of the attribute or child is filled into the record field named after the child's name." I take "or child" to mean a child element, so a child element's text would be associated with that child element's name, and I should see fields like C=[c] in the output record.
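
One possible reading of the quoted rule, consistent with the logged record, is that it applies only one level deep: fields are created for the attributes and direct children of the result item, while grandchildren contribute only to their parent's string value. The Python sketch below models that reading. It is an assumption about the behaviour, not the Morphlines source, and to_record is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The standalone Saxon output from the earlier post.
SAXON_OUTPUT = """<Child1><A>a</A><B>B</B>
    <Child1_1><C>c</C><D>d</D><E>e</E></Child1_1>
    <Child1_2><X>x</X><B>bb</B></Child1_2>
</Child1>"""


def to_record(item):
    """Model of the documented conversion for one element item:
    each attribute and each direct child becomes a field whose value
    is its string value. Grandchildren never get fields of their own."""
    record = {}
    for name, value in item.attrib.items():
        record.setdefault(name, []).append(value)
    for child in item:
        # String value approximated as the concatenated descendant text.
        record.setdefault(child.tag, []).append("".join(child.itertext()))
    return record


record = to_record(ET.fromstring(SAXON_OUTPUT))
print(record)
```

With this model the record contains A, B, Child1_1 and Child1_2 only, which matches the fields in the log apart from the generated id. C, D, E and X are grandchildren of the item, so they show up only inside the cde and xbb values.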

Explorer

I've taken a different approach and created an XSLT that extracts the attributes from each element in the hierarchy and transforms them into a single-level collection of elements. This now seems to give me the results I expect.
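
The stylesheet itself was not posted, so the following is only a sketch of the flattened shape described above, written in Python rather than XSLT. The wrapper element name record and the ElementName_AttributeName naming scheme are assumptions made here so that the B attribute of Child1 and the B attribute of Child1_2 do not collide; flatten_attributes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SOURCE = """<RootNode>
    <Child1 A="a" B="B">
        <Child1_1 C="c" D="d" E="e" />
        <Child1_2 X="x" B="bb" />
    </Child1>
</RootNode>"""


def flatten_attributes(root):
    """Build a single-level element whose children each carry one
    attribute value taken from anywhere in the source hierarchy."""
    flat = ET.Element("record")  # wrapper name is an assumption
    for elem in root.iter():
        for attr_name, attr_value in elem.attrib.items():
            # Prefix with the owning element's name to keep fields distinct.
            child = ET.SubElement(flat, f"{elem.tag}_{attr_name}")
            child.text = attr_value
    return flat


flat = flatten_attributes(ET.fromstring(SOURCE))

# Every value is now a direct child, so a one-level conversion
# yields one field per original attribute.
fields = {child.tag: [child.text] for child in flat}
print(fields)
```

Because every value sits directly under the top-level element, nothing is left to be concatenated into a parent's string value.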

Super Collaborator
Cool!