Expert Contributor
Posts: 162
Registered: ‎07-29-2013
Accepted Solution

morphline config is partially executed without any error and a command in the pipeline is lost

Hi, I've run into something strange. Here is a config that works:

# Copyright 2013 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]
    
    commands : [                    
      { 
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/tweets/tweet/@text" # each item in result sequence becomes a morphline record
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }    
    ]
  }
]

Here is its partial output:

1008 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery  - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet one"
1015 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug  - beforeProcess: {id=[123], text=[sample tweet one]}
1015 [main] DEBUG com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug  - output record: [{id=[123], text=[sample tweet one]}]
38023 [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery  - XQuery result sequence item #2 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet two"
38025 [main] TRACE com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug  - beforeProcess: {id=[123], text=[sample tweet two]}
38025 [main] DEBUG com.cloudera.cdk.morphline.stdlib.LogDebugBuilder$LogDebug  - output record: [{id=[123], text=[sample tweet two]}]

 

 

Here is the "bad" config; only the xquery command is executed:

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]

    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId/text()"
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

The logs clearly show that the first config runs without any problems; it is taken from SaxonMorphlineTest.

The second config also executes without errors, but logDebug produces no output.

 

Here is the output:

431  [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery  - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyTextImpl with value:someValueIWantToGet
collector.getRecords()[]

Here is what I see in IDEA.

Why does IDEA highlight "xquery" in the good config? I'm on Linux, and I don't see any "bad chars" in a text editor.

[Screenshot: Выделение_076.png]

 

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: morphline config is partially executed without any error and a command in the pipeline is lost

Based on the TRACE log, your XQuery generates the following output:

"foo"

which isn't meaningful for a morphline (e.g. into which morphline field should the "foo" string be stored, and what if you want to return values for multiple fields or multiple records?). Thus, in your case the xquery command returns false and logDebug is never executed, hence you see no logDebug output.

Note that the output of an XQuery must be shaped to conform to the format described here: http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/xquery

Example XQuery output (the wrapper element name is illustrative):

<record>
  <myFoo>foo</myFoo>
  <myBar>bar</myBar>
</record>

This will generate a morphline record with a myFoo field that contains "foo", as well as a myBar field that contains "bar".
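
As a minimal sketch of that shape applied to the query from the question: the bare text node can be wrapped in a named element so that there is a field name to store it under. The wrapper name `record` is a hypothetical placeholder, and the sketch assumes (following the myFoo/myBar example) that each child element name becomes a field name. It has not been run against this data.

```
# Sketch only: would replace the queryString of the failing config.
# "record" is a placeholder wrapper; "ssoId" becomes the field name.
queryString : """
  for $attrs in /collectorEvent/attributes/etpEventCollectorAttributes
  return
  <record>
    <ssoId>{$attrs/ssoId/text()}</ssoId>
  </record>
"""
```

If the command behaves as described, the emitted record should then carry an ssoId field instead of an unnamed string, and the following logDebug command gets a record to process.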

Wolfgang.

Expert Contributor
Posts: 162
Registered: ‎07-29-2013

Re: morphline config is partially executed without any error and a command in the pipeline is lost


Hi, it looks like there is a forum bug: your reply created a new, separate thread.

I see the difference.

The example from the CDK tests returns: text="sample tweet one"

sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyAttributeImpl with value: text="sample tweet one"

And mine doesn't return someName=someValue, only someValue:

431  [main] TRACE com.cloudera.cdk.morphline.saxon.XQueryBuilder$XQuery  - XQuery result sequence item #1 is of class: net.sf.saxon.tree.tiny.TinyTextImpl with value:someValueIWantToGet

Does the Twitter-based example take the name "text" because that is the attribute name in the XML?

 

 

I've refactored my XQuery:

queryString : """
  for $entry in /collectorEvent/attributes/etpEventCollectorAttributes
  return
  <entry>
    {$entry/ssoId}
    {$entry/applicationId}
  </entry>
"""

Now it returns:

{applicationId=[123], ssoId=[someSSO_id]}

 

Thanks, it works!
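
For reference, the pieces from this thread can be assembled into one morphline config. This is only a sketch that combines the original "bad" config with the refactored query; nothing else is changed, and the comments are added for illustration.

```
morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]

    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              # Return one element per match; its child elements
              # (ssoId, applicationId) become the record's fields.
              queryString : """
                for $entry in /collectorEvent/attributes/etpEventCollectorAttributes
                return
                <entry>
                  {$entry/ssoId}
                  {$entry/applicationId}
                </entry>
              """
            }
          ]
        }
      }

      # Runs once for each record emitted by the xquery command.
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

With this shape the query returns elements with named children rather than a bare text node, which matches the output reported above.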

 

 
