Expert Contributor
Posts: 162
Registered: ‎07-29-2013
Accepted Solution

morphlines, kite, solr and flume

Hi, I want to capture an XML payload using Flume and use morphlines to put the parsed data into Solr.

Now I'm deeply confused: which one do I have to use, cdk-morphlines or Kite?

I have this config:

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]

    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId"
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

It runs with CDK but doesn't work in a Kite environment. I get an exception when I try to run the test:

org.kitesdk.morphline.api.MorphlineCompilationException: No command builder registered for name: xquery near: {
    # target/test-classes/morphlines/dummy-xml.conf: 8
    "xquery" : {
        # target/test-classes/morphlines/dummy-xml.conf: 9
        "fragments" : [
            # target/test-classes/morphlines/dummy-xml.conf: 10
            {
                # target/test-classes/morphlines/dummy-xml.conf: 12
                "queryString" : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId",
                # target/test-classes/morphlines/dummy-xml.conf: 11
                "fragmentPath" : "/"
            }
        ]
    }
}

Why? And what would work with Flume?

Cloudera Employee
Posts: 145
Registered: ‎08-21-2013

Re: morphlines, kite, solr and flume

This info is in section "version 0.10.0" at http://kitesdk.org/docs/current/release_notes.html

Wolfgang.

Expert Contributor
Posts: 162
Registered: ‎07-29-2013

Re: morphlines, kite, solr and flume

Hi!

I've gone through this guide:

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-...

I spent some time getting it to work from Cloudera Manager, and it really works.

What confuses me:

Here are the imports from the tutorial:

    # Import all morphline commands in these java packages and their subpackages.
    # Other commands that may be present on the classpath are not visible to this morphline.
    importCommands : ["com.cloudera.**", "org.apache.solr.**"]

And Kite has a different configuration in its examples... It looks even more complicated than the CDK example tutorial.

Cloudera Employee
Posts: 145
Registered: ‎08-21-2013

Re: morphlines, kite, solr and flume

CDH 4.x uses CDK, whereas CDH 5.x uses Kite. The difference is just in the package names.
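
As a rough sketch of what that package rename means for the config posted above: assuming the Kite command builders live under org.kitesdk.* (per the 0.10.0 release notes mentioned earlier) and the kite-morphlines-saxon module that provides xquery is on the classpath, only the importCommands pattern would need to change. The rest is copied unchanged from the original config.

```
morphlines : [
  {
    id : morphline1
    # Assumption: under Kite the commands are in org.kitesdk.* packages,
    # so the CDK-era pattern "com.cloudera.**" no longer matches any of them.
    importCommands : ["org.kitesdk.**"]

    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId"
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

If the morphline also uses Solr commands, as in the tutorial's import line, the "org.apache.solr.**" pattern would probably stay alongside "org.kitesdk.**".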

Expert Contributor
Posts: 162
Registered: ‎07-29-2013

Re: morphlines, kite, solr and flume

OK, so if I migrate to CDH 5, do I have to refactor my morphlines.conf?

This config works with CDK:

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**"]

    commands : [
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId"
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

and fails with Kite, throwing this exception:

org.kitesdk.morphline.api.MorphlineCompilationException: No command builder registered for name: xquery near: {
    # target/test-classes/morphlines/dummy-xml.conf: 8
    "xquery" : {
        # target/test-classes/morphlines/dummy-xml.conf: 9
        "fragments" : [
            # target/test-classes/morphlines/dummy-xml.conf: 10
            {
                # target/test-classes/morphlines/dummy-xml.conf: 12
                "queryString" : "/collectorEvent/attributes/etpEventCollectorAttributes/ssoId",
                # target/test-classes/morphlines/dummy-xml.conf: 11
                "fragmentPath" : "/"
            }
        ]
    }
}

Here is my Kite-based test:

import org.junit.Test
import org.kitesdk.morphline.api.AbstractMorphlineTest
import org.kitesdk.morphline.api.Record
import org.kitesdk.morphline.base.Fields

/**
 * User: sergey.sheypak
 * Date: 13.10.14
 * Time: 20:30
 */
class ParseDummyXmlTest extends AbstractMorphlineTest {

    @Test
    void testParseDummyXml(){
        morphline = createMorphline('morphlines/dummy-xml');
        def record = new Record()
        record.put(Fields.ATTACHMENT_BODY, readDummyXml());
        processAndVerifySuccess(record, null);
    }


    InputStream readDummyXml(){
        this.class.classLoader.getResourceAsStream('dummy.xml')
    }

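    // Runs the record through the morphline and checks that processing succeeds.
    // Note: `expected` is not compared yet.
    private void processAndVerifySuccess(Record input, Record expected) {
        collector.reset()
        startSession()
        assert morphline.process(input)
        collector.getFirstRecord() // throws if no record was collected
    }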
}

My Kite dependencies are:

        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-morphlines-all</artifactId>
            <version>0.17.0</version>
            <type>pom</type>
        </dependency>
        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-morphlines-core</artifactId>
            <type>test-jar</type>
            <scope>test</scope>
            <version>0.17.0</version>
        </dependency>
        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-morphlines-saxon</artifactId>
            <version>0.17.0</version>
        </dependency>

Cloudera Employee
Posts: 145
Registered: ‎08-21-2013

Re: morphlines, kite, solr and flume

Expert Contributor
Posts: 162
Registered: ‎07-29-2013

Re: morphlines, kite, solr and flume

Thanks for your patience, 
