hbase/solar indexing feature Lily HBase NRT Indexer Service

Explorer

Hi

 

I am a complete newbie to Cloudera Search. I went through the example described here: hbase indexer. All seems to be good: the data field is being indexed successfully in Apache Solr. However, when I add a simple step such as adding a timestamp field (see below), the indexer stops working. Any idea why this is happening? Any indication of where I can find the indexer's logs would be very much appreciated.

 

My ultimate goal is to store an XML document in the 'data' field and apply XPath so that only the elements whose values I want get indexed. Can anyone give me a clue how to achieve this? Are there any examples which use either XPath or XQuery?

 

kind regards,

 

akhettar

 

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }

            #{
            #  inputColumn : "data:item"
            #  outputField : "_attachment_body"
            #  type : "byte[]"
            #  source : value
            #}
          ]
        }
      }
      { addCurrentTime { field : ts } }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} }
      #{
      #  extractAvroPaths {
      #    paths : {
      #      data : /user_name
      #    }
      #  }
      #}

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

 

 

2 ACCEPTED SOLUTIONS

Accepted Solutions

Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Expert Contributor

The xquery command expects a byte[] rather than a string as input, and that input must be in the "_attachment_body" field rather than the "data" field. Try changing the extractHBaseCells mapping to use type : "byte[]" and outputField : "_attachment_body".

 

Also you need to change your xquery command to wrap your XML output into yet another XML element (e.g. "record").

 

For example, in order to generate a morphline record with a "myFoo" field that contains "foo", as well as a "myBar" field that contains "bar", your xquery command should be formulated such that it outputs an XML fragment like this:

<record>
  <myFoo>foo</myFoo>
  <myBar>bar</myBar>
</record>

 

Wolfgang.
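A minimal sketch of how the two suggestions above might combine, applied to the employees XML used later in this thread. It assumes that every field the query emits (here the illustrative "name" and "age") has been declared in schema.xml, and it has not been tested against a specific Kite or CDH version:

```hocon
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              # xquery reads raw bytes from the _attachment_body field
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }

      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : """
                (: each <record> element becomes one output record;
                   name and age are illustrative field names that
                   must exist in schema.xml :)
                for $employee in /employees/employee
                return
                  <record>
                    <name>{ $employee/name/text() }</name>
                    <age>{ $employee/age/text() }</age>
                  </record>
              """
            }
          ]
        }
      }

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

Following the convention described above, each <record> element returned by the query should become one morphline record, with each child element becoming a field of that record.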


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Expert Contributor

The "if" command, the "equals" command and indeed all morphline commands know nothing about HBase columns or HBase qualifiers, except for the extractHBaseCells command. Use extractHBaseCells to extract whatever HBase columns you want into whatever morphline record fields you want, then use "if", "equals" or similar commands to act on those morphline record fields (not on HBase columns or qualifiers directly).
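A hedged sketch of that pattern. The column "meta:type", the record field "docType" and the value "employee" are hypothetical names chosen for illustration; the point is that "if" and "equals" only ever see the docType record field produced by extractHBaseCells, never the HBase column itself:

```hocon
commands : [
  {
    extractHBaseCells {
      mappings : [
        {
          # hypothetical column, copied into the hypothetical record field docType
          inputColumn : "meta:type"
          outputField : "docType"
          type : string
          source : value
        }
      ]
    }
  }
  {
    if {
      # the condition tests the morphline record field, not the HBase column
      conditions : [
        { equals { docType : [employee] } }
      ]
      then : [
        { logDebug { format : "keeping record: {}", args : ["@{}"] } }
      ]
      else : [
        # records whose docType is not "employee" are not passed on
        { dropRecord {} }
      ]
    }
  }
]
```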


17 REPLIES

Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Expert Contributor
The Solr schema.xml config file needs to conform to the documents that you are trying to insert. Try adjusting schema.xml accordingly and tell Solr about it via the solrctl CLI.

Also see http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#sanitizeUnknownSolrFie...
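A minimal sketch of the sanitizeUnknownSolrFields command referenced above, which removes record fields that the Solr schema does not declare (such as an undeclared "ts") so that Solr does not reject the document. The collection name and ZooKeeper address are placeholders, not values from this thread; the command would typically go near the end of the commands list:

```hocon
{
  sanitizeUnknownSolrFields {
    # where to fetch schema.xml from; both values are placeholders
    solrLocator : {
      collection : collection1
      zkHost : "127.0.0.1:2181/solr"
    }
  }
}
```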

XPath and XQuery docs are here: http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#xquery

The log files of the Solr server, MapReduce tasks, etc. can be displayed in the Cloudera Manager GUI.

Wolfgang.


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Explorer

Many thanks, Wolfgang, for your input. As for the logs, I am not running Cloudera Manager: the VM requires at least 8 GB of RAM and it freezes my machine. I was hoping to be pointed to a log file on the file system that I could tail.

 

Regards,

 

Ayache


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Expert Contributor
Try /var/log/solr


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Explorer

Hi

 

Thanks for the hint. /var/log/solr/solr.out is for the Solr server, but I was interested in the hbase-indexer server. Here is the path for the benefit of all: tail -f /var/log/hbase-solr/hbase-solr-hbase.out
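A related hedged tip: the morphlines in this thread print their output with logTrace, which only appears if the morphline logger is set to TRACE. Assuming the hbase-solr log runs at the common INFO level (not confirmed in this thread), temporarily swapping in logInfo should make each output record show up in the file above:

```hocon
# temporary debugging aid (assumes the log level is INFO): logs every record
{ logInfo { format : "output record: {}", args : ["@{}"] } }
```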


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Explorer

Hi

 

I got the above morphline conf working: the 'ts' variable wasn't declared in schema.xml. I have experimented with some other commands, but the xquery command isn't working properly.

 

Here is the morphline conf:

 

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }

            #{
            #  inputColumn : "data:item"
            #  outputField : "_attachment_body"
            #  type : "byte[]"
            #  source : value
            #}
          ]
        }
      }

      { xquery {
          fragments : [
            {
              fragmentPath : "/"
              externalVariables : {
                myVariable : "hello world"
              }
              queryString : """
                declare variable $myVariable as xs:string external;
                (: Example xquery :)
                let $name := /employees/employee/name/text()
                return
                <ts> { $name } </ts>
              """
            }
          ]
        }
      }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} }
      #{
      #  extractAvroPaths {
      #    paths : {
      #      data : /user_name
      #    }
      #  }
      #}

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

 

 

morphline-hbase-mapper.xml

 

<?xml version="1.0"?>
<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

<!-- The relative or absolute path on the local file system to the morphline configuration file. -->
<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->
<param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>

<!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
<!-- <param name="morphlineId" value="morphline1"/> -->

</indexer>

 

 

The input to HBase is: put 'record', 'row12', 'data', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'

All I can see in the log is the following:

 

14/11/05 07:29:22 WARN morphline.LocalMorphlineResultToSolrMapper: Morphline /etc/hbase-solr/conf/morphlines.conf@null failed to process record: {data=[<employees><employee><name>ayache</name><age>29</age></employee></employees>]}

 

Any idea as to why it's failing to index the above input?

 

Many thanks

 

Ayache

 


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Explorer

Hi

 

I've debugged this further but unfortunately got nowhere. What version of the Kite SDK is the QuickStart VM running, if I may ask?

 

I've moved away from XQuery and started to experiment with XSLT instead, running the example shown in the morphline reference guide. Here is my morphline sample:

 

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [

      {
        xslt {
          fragments : [
            {
              fragmentPath : "/"
              parameters : {
                myVariable : "hello world"
              }
              queryString : """
                <!-- Example XSLT identity transformation -->
                <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

                  <xsl:template match="@*|node()">
                    <xsl:copy>
                      <xsl:apply-templates select="@*|node()"/>
                    </xsl:copy>
                  </xsl:template>

                </xsl:stylesheet>
              """
            }
          ]
        }
      }

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

 

If I push the following into HBase: put 'record', 'row41', 'data', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'

 

I would expect an index entry to be created containing the same XML that was ingested into HBase. I can see in the logs that the import is done, but when I search in the Solr web console I can't see any entry.

 

Can anyone help me at least run one of the XQuery or XSLT examples?

 

Thanks

 

Ayache

 

 


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Explorer

Many thanks, Wolfgang, for the suggestions; they really did the trick. For future reference, are these mentioned anywhere in the reference guide?


Re: hbase/solar indexing feature Lily HBase NRT Indexer Service

Expert Contributor

It’s mentioned in the ref guide for the next upcoming kite version per https://github.com/kite-sdk/kite/blob/master/kite-morphlines/src/site/confluence/morphlinesReference...
