Created on 11-04-2014 12:19 AM - edited 09-16-2022 02:11 AM
Hi
I am a complete newbie to Cloudera Search. I went through the example described here: hbase indexer. All seems to be good; the data field is being indexed successfully in Apache Solr. However, when I add a simple step like adding a timestamp field (see below), the indexer stops working. Any idea as to why this is happening? Any indication of where I can find the indexer's logs would be very much appreciated.
My ultimate goal is to store an XML document in the 'data' field and apply XPath to index only the elements whose values I want indexed. Can anyone give me a clue how to achieve this? Are there any examples which use either xpath or xquery?
kind regards,
akhettar
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }
            #{
            #  inputColumn : "data:item"
            #  outputField : "_attachment_body"
            #  type : "byte[]"
            #  source : value
            #}
          ]
        }
      }
      { addCurrentTime { field : ts } }
      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} }
      #{
      #  extractAvroPaths {
      #    paths : {
      #      data : /user_name
      #    }
      #  }
      #}
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
Created 11-05-2014 10:42 AM
The xquery command expects a byte[] rather than a string as input, and that input must be in the "_attachment_body" field rather than the "data" field. Try changing the extractHBaseCells command to use type : "byte[]" and outputField : "_attachment_body".
Also, you need to change your xquery command to wrap its XML output in yet another XML element (e.g. "record").
For example, in order to generate a morphline record with a "myFoo" field that contains "foo", as well as a "myBar" field that contains "bar", your xquery command should be formulated such that it outputs an XML fragment like this:
<record>
  <myFoo>foo</myFoo>
  <myBar>bar</myBar>
</record>
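A minimal sketch of how both changes might look for the employees example from this thread. The "name" output field is an assumption for illustration only, and it would need to be declared in schema.xml; everything else is taken from the configs posted in the question.

```
{
  extractHBaseCells {
    mappings : [
      {
        inputColumn : "data:*"
        # xquery reads its input from this field, as raw bytes
        outputField : "_attachment_body"
        type : "byte[]"
        source : value
      }
    ]
  }
}
{
  xquery {
    fragments : [
      {
        fragmentPath : "/"
        queryString : """
          (: each child element of the outer "record" wrapper becomes a record field :)
          for $employee in /employees/employee
          return
          <record>
            <name>{ $employee/name/text() }</name>
          </record>
        """
      }
    ]
  }
}
```

With the input from the question, this should yield a morphline record whose "name" field holds the employee name, which the logTrace command at the end of the morphline can confirm.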
Wolfgang.
Created 11-10-2014 07:10 AM
The "if" command, the "equals" command, and indeed all morphline commands know nothing about HBase columns or HBase qualifiers, with the exception of the extractHBaseCells command. Use extractHBaseCells to extract whatever HBase columns you want into whatever morphline record fields you want, then use "if", "equals" or similar to act on the morphline record fields (not on HBase columns or qualifiers directly).
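As a rough sketch of that pattern: the column "data:type", the record field "doc_type" and the value "xml" below are hypothetical names chosen for illustration, not something taken from this thread.

```
{
  extractHBaseCells {
    mappings : [
      {
        inputColumn : "data:type"   # hypothetical HBase column
        outputField : "doc_type"    # hypothetical morphline record field
        type : string
        source : value
      }
    ]
  }
}
{
  if {
    conditions : [
      # the test runs against the record field, not the HBase column
      { equals { doc_type : [xml] } }
    ]
    then : [
      { logDebug { format : "keeping record: {}", args : ["@{}"] } }
    ]
    else : [
      { dropRecord {} }
    ]
  }
}
```

The point is only the ordering: by the time "if" runs, the column value has already been copied into an ordinary record field.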
Created 11-04-2014 12:49 AM
Created 11-04-2014 04:59 AM
Many thanks, Wolfgang, for your input. As for the logs, I am not running Cloudera Manager; the VM requires at least 8 GB of RAM and it's freezing my machine. I was hoping to be pointed to a log file in the file system that I could tail?
Regards,
Ayache
Created 11-04-2014 12:33 PM
Created 11-04-2014 02:02 PM
Hi
Thanks for the hint. /var/log/solr/solr.out is for the Solr server, but I was interested in the hbase-indexer server. Here is the command, for the benefit of all: tail -f /var/log/hbase-solr/hbase-solr-hbase.out
Created 11-05-2014 01:39 AM
Hi
I got the above morphline conf working... the 'ts' field wasn't declared in the schema.xml. I have experimented with some other commands; however, the xquery command isn't working properly.
Here is the morphline conf:
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }
            #{
            #  inputColumn : "data:item"
            #  outputField : "_attachment_body"
            #  type : "byte[]"
            #  source : value
            #}
          ]
        }
      }
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              externalVariables : {
                myVariable : "hello world"
              }
              queryString : """
                declare variable $myVariable as xs:string external;
                (: Example xquery :)
                let $name := /employees/employee/name/text()
                return
                <ts> { $name } </ts>
              """
            }
          ]
        }
      }
      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} }
      #{
      #  extractAvroPaths {
      #    paths : {
      #      data : /user_name
      #    }
      #  }
      #}
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
morphline-hbase-mapper.xml
<?xml version="1.0"?>
<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
  <!-- The relative or absolute path on the local file system to the morphline configuration file. -->
  <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->
  <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>
  <!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
  <!-- <param name="morphlineId" value="morphline1"/> -->
</indexer>
The input to hbase is: put 'record', 'row12', 'data', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'
All I can see in the log is the following:
14/11/05 07:29:22 WARN morphline.LocalMorphlineResultToSolrMapper: Morphline /etc/hbase-solr/conf/morphlines.conf@null failed to process record: {data=[<employees><employee><name>ayache</name><age>29</age></employee></employees>]}
Any idea as to why it's failing to index the above input?
Many thanks
Ayache
Created 11-05-2014 10:00 AM
Hi
I've debugged this further but unfortunately got nowhere. What version of kite-sdk is the QuickStart VM running, if I may ask?
I've moved away from using xquery and started to experiment with xslt instead, running the example shown in the morphline reference guide. See my morphline sample below:
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]
    commands : [
      {
        xslt {
          fragments : [
            {
              fragmentPath : "/"
              parameters : {
                myVariable : "hello world"
              }
              queryString : """
                <!-- Example XSLT identity transformation -->
                <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
                  <xsl:template match="@*|node()">
                    <xsl:copy>
                      <xsl:apply-templates select="@*|node()"/>
                    </xsl:copy>
                  </xsl:template>
                </xsl:stylesheet>
              """
            }
          ]
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
If I push the following into HBase: put 'record', 'row41', 'data', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'
I would expect an index entry to be created with the same XML that was ingested into HBase. I can see in the logs that the import is done - see below. But when I search in the Solr web console I can't see any entry.
Can anyone help me at least to run one of the xquery or xslt examples?
Thanks
Ayache
Created 11-06-2014 03:08 AM
Many thanks, Wolfgang, for the suggestions; they really did the trick. For future reference, are these mentioned anywhere in the reference guide?
Created 11-06-2014 03:39 AM
It’s mentioned in the ref guide for the next upcoming kite version per https://github.com/kite-sdk/kite/blob/master/kite-morphlines/src/site/confluence/morphlinesReference...