
HBase/Solr indexing feature: Lily HBase NRT Indexer Service

Explorer

Hi

 

I am a complete newbie to Cloudera Search. I went through the example described here: hbase indexer. All seems to be good; the data field is being indexed successfully in Apache Solr. However, when I add a simple step such as adding a timestamp field (see below), the indexer stops working. Any idea why this is happening? An indication of where I can find the indexer's logs would be very much appreciated.

 

My ultimate goal is to store an XML document in the 'data' field and apply XPath to index only the elements whose values I want indexed. Can anyone give me a clue how to achieve this? Are there any examples that use either XPath or XQuery?

 

kind regards,

 

akhettar

 

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }

            #{
            #  inputColumn : "data:item"
            #  outputField : "_attachment_body"
            #  type : "byte[]"
            #  source : value
            #}
          ]
        }
      }
      { addCurrentTime { field : ts } }

      # for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} }
      #{
      #  extractAvroPaths {
      #    paths : {
      #      data : /user_name
      #    }
      #  }
      #}

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

 

 

2 ACCEPTED SOLUTIONS

Super Collaborator

The xquery command expects a byte[] rather than a string as input, and that input must be in the "_attachment_body" field rather than the "data" field. Try changing the extractHBaseCells command to use type : "byte[]" and outputField : "_attachment_body".

 

Also you need to change your xquery command to wrap your XML output into yet another XML element (e.g. “record”).

 

For example, in order to generate a morphline record with a "myFoo" field that contains "foo", as well as a "myBar" field that contains "bar", your xquery command should be formulated such that it outputs an XML fragment like this:

<record>
  <myFoo>foo</myFoo>
  <myBar>bar</myBar>
</record>
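
A minimal sketch of how the two suggestions above might fit together, assuming the XML is stored in the "data" column family as in the original post. The "myFoo" and "myBar" fields come from the example above; the paths /doc/foo and /doc/bar are hypothetical placeholders for whatever elements the stored XML actually contains.

```
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              # pass the raw cell bytes to xquery via _attachment_body
              inputColumn : "data:*"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : """
                (: /doc/foo and /doc/bar are hypothetical paths :)
                <record>
                  <myFoo>{/doc/foo/text()}</myFoo>
                  <myBar>{/doc/bar/text()}</myBar>
                </record>
              """
            }
          ]
        }
      }

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
```

Each child element of the wrapping element becomes a field of the output record, so the element names should match the field names defined in the Solr schema.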

 

Wolfgang.

Super Collaborator

The "if" command, the "equals" command, and indeed all morphline commands know nothing about HBase columns or HBase qualifiers, except for the extractHBaseCells command. Use extractHBaseCells to extract whatever HBase columns you want into whatever morphline record fields you want, then use "if", "equals" or similar to act on the morphline record fields (not on HBase columns or qualifiers directly).


17 REPLIES

Explorer

Many thanks for your help.

 

Ayache

Explorer

Hi

 

Just one last question regarding the above implementation. How do I make the xquery step trigger only for a certain column family name? Here is the scenario:

 

put 'record', 'row1', 'data', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'

 

The above input into hbase will be handled by xquery step.

 

put 'record', 'row2', 'context', 'business'  ===> for this input I don't want to call the xquery step; the following mapping, which simply extracts the cell value, will suffice:

 

{
  inputColumn : "context"
  outputField : "context"
  type : string
  source : value
}

 

I thought about declaring two morphlines: one handling the fields that require the xquery step, and the other doing just a one-to-one mapping.

 

I tried something like this:

 

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }
      {
        xquery {
          fragments : [
            {
              fragmentPath : "/"

              queryString : """
                (: All namespace declarations go here :)
                declare namespace inps = "http://inps.co.uk/";

                (: Extracting all the fields that need indexing :)
                let $name := /employees/employee/name/text()

                (: Returning the list of the fields that need to be indexed. These fields are defined in the Solr schema.xml file. :)
                return
                <fieldsToIndex>
                  <name>{$name}</name>
                </fieldsToIndex>
              """
            }
          ]
        }
      }

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
  {
    id : morphline2
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "context"
              outputField : "context"
              type : string
              source : value
            }
          ]
        }
      }

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

 

It looks like the second morphline is ignored. First, is this a sensible solution? If so, how do I make the second morphline known?

 

kind regards,

 

Ayache

 

Super Collaborator

You can express it all in a single morphline. Consider using if-then-else command or the tryRules command or similar in order to check which case applies and execute whatever corresponding logic is appropriate for that case. You can have multiple extractHBaseCells commands in a single morphline, e.g. one in each branch of the tryRules command.
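
A rough sketch of what the tryRules variant could look like, reusing the mappings and the query from the question. It rests on an assumption that should be verified against the version in use: that the xquery command fails when no "_attachment_body" was extracted, so that tryRules backtracks to the second rule. It also assumes each row carries either a 'data' cell or a 'context' cell, as in the scenario above.

```
{
  tryRules {
    catchExceptions : false
    throwExceptionOnMissingRule : false
    rules : [
      # Rule 1: rows with an XML payload in the 'data' column family
      {
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "data"
                  outputField : "_attachment_body"
                  type : "byte[]"
                  source : value
                }
              ]
            }
          }
          {
            # assumption: fails if no _attachment_body was extracted,
            # which makes tryRules move on to Rule 2
            xquery {
              fragments : [
                {
                  fragmentPath : "/"
                  queryString : """
                    let $name := /employees/employee/name/text()
                    return
                    <fieldsToIndex>
                      <name>{$name}</name>
                    </fieldsToIndex>
                  """
                }
              ]
            }
          }
        ]
      }

      # Rule 2: rows that only need a one-to-one mapping
      {
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "context"
                  outputField : "context"
                  type : string
                  source : value
                }
              ]
            }
          }
        ]
      }
    ]
  }
}
```

This block would sit inside the commands list of a single morphline, in place of the two separate morphline definitions.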

 

Wolfgang.

Explorer

Thanks again, Wolfgang, for your prompt response; this is very helpful. I've looked at the if/else and tryRules commands. Both use the 'contains' command for conditions. However, 'contains' only matches the value of a field, whereas my use case is about asserting the presence of the field regardless of its value. So if the field is 'context' I want to apply the xquery step; if not, just a one-to-one mapping from the HBase cell. Is there a command to check for the presence of a field?

 

Thanks

Super Collaborator

Try equals { id : [] } for example as shown here: http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#if

 

In a morphline record there is no difference between a field with zero values and a field that doesn't exist. 
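
As an illustration of that point, a presence check can be sketched by inverting the empty-list comparison with the "not" command inside the conditions of an "if". The field name "context" is taken from the question; the log messages are made up for this example.

```
{
  if {
    conditions : [
      # true when the record field "context" holds at least one value
      { not { equals { context : [] } } }
    ]
    then : [
      { logInfo { format : "field 'context' is present: {}", args : ["@{context}"] } }
    ]
    else : [
      { logInfo { format : "field 'context' is absent or empty" } }
    ]
  }
}
```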

Explorer

Hi

 

I tried equals earlier, but I am afraid it's not working for me. Here is a snippet from the morphline:

 

if {
  conditions : [
    { equals { p : [] } }
  ]

  then : [
    # handling headers
    { logInfo { format : "processing headers..." } }
  ]

  else : [
    # handling payload
    { logInfo { format : "processing payload..." } }
  ]
}

 

So with the following entry into HBase:

 

put 'payload', 'row1', 'p', '<employees><employee><name>ayache</name><age>29</age></employee></employees>'

 

the record still goes into the 'handling headers' clause.

 

My use case mandates that the field 'p' is qualified with the qualifier 'in', so I'm not sure how to reference the qualifier in the 'equals' command. I've tried equals { "p:in" : [] }. That is really the next step, though: it's not matching even without the qualifier. Any idea what I have missed?

 

Regards,

 

Ayache

Super Collaborator

The "if" command, the "equals" command, and indeed all morphline commands know nothing about HBase columns or HBase qualifiers, except for the extractHBaseCells command. Use extractHBaseCells to extract whatever HBase columns you want into whatever morphline record fields you want, then use "if", "equals" or similar to act on the morphline record fields (not on HBase columns or qualifiers directly).
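
A sketch of that ordering for the 'p:in' case from the question: the HBase column is first mapped to a record field, and the condition then tests the record field rather than the column. Mapping to "_attachment_body" follows the earlier advice for feeding xquery; the morphline id is a placeholder.

```
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              # column family 'p', qualifier 'in'
              inputColumn : "p:in"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }
      {
        if {
          conditions : [
            # tests the record field, not the HBase column
            { equals { _attachment_body : [] } }
          ]
          then : [
            # no payload cell was extracted
            { logInfo { format : "processing headers..." } }
          ]
          else : [
            # a payload cell was extracted; the xquery command would go here
            { logInfo { format : "processing payload..." } }
          ]
        }
      }
    ]
  }
]
```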

Explorer

Got you, all working perfectly now. Thank you so much for your help, I now understand more about how morphline / hbase-indexer work.