Champion Alumni
Posts: 161
Registered: ‎02-11-2014
Accepted Solution

Extract hbase cell command reference guide

Hi,

I am trying to find more information on the extractHBaseCells command. However, I am unable to find it in the morphline reference guide. Can someone please let me know where I can find the documentation for it? The following is the reference guide I am looking at.

 

http://cloudera.github.io/cdk/docs/0.9.1/cdk-morphlines/morphlinesReferenceGuide.html

 

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Extract hbase cell command reference guide

See the sections "Creating a Morphline Configuration File" and "Understanding the extractHBaseCells morphline command" at http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Gu...

Wolfgang.

Champion Alumni
Posts: 161
Registered: ‎02-11-2014

Re: Extract hbase cell command reference guide

Hi,

I understood that part. But let us say I extract an XML document from the HBase cell with the following elements (name, city, country) and I want to index it into Solr. My Solr schema also has the fields name, city and country. Now I need to parse the XML, get these fields and index them into Solr.

 

 

extractHBaseCells {
  mappings : [
    {
      inputColumn : "messages:name"
      outputField : "name"
      type : String
      source : value
    }
    {
      inputColumn : "messages:city"
      outputField : "city"
      type : String
      source : value
    }
    {
      inputColumn : "messages:country"
      outputField : "country"
      type : String
      source : value
    }
  ]
}

 

This would have been possible if I were able to retrieve the data from HBase in this format. But what extractHBaseCells gives me is an XML document. I am looking for a way to parse it using XQuery and then assign the values to the Solr fields.

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Extract hbase cell command reference guide

You can just specify an extractHBaseCells command followed by an xquery command in the same morphline config file. Each command pipes into the subsequent command, and you can specify as many commands as you like. The links I mentioned contain a (commented out) example for extractHBaseCells followed by readAvroContainer. Just uncomment that and replace readAvroContainer with xquery.
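A rough sketch of such a chain is shown here. It assumes the cell holds an XML document with a hypothetical root element `person` and the child elements name, city and country. The morphline id, the importCommands patterns (which depend on the morphlines version in use; older CDK releases used a `com.cloudera` package prefix) and the XQuery itself are illustrative assumptions, not taken from the linked guide. The `messages` column family comes from this thread.

```
morphlines : [
  {
    id : morphline1
    # Assumption: package patterns for a Kite-based release
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        # Put the raw cell bytes into _attachment_body, where xquery reads its input
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "messages:*"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }
      {
        # Emit one record per <person> element; the child elements of
        # <record> are meant to become the fields name, city and country
        xquery {
          fragments : [
            {
              fragmentPath : "/"
              queryString : """
                for $p in /person
                return
                <record>
                  <name>{$p/name/text()}</name>
                  <city>{$p/city/text()}</city>
                  <country>{$p/country/text()}</country>
                </record>
              """
            }
          ]
        }
      }
    ]
  }
]
```

If the output field names match the field names in schema.xml, no further renaming step should be needed before indexing.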

Wolfgang.

Champion Alumni
Posts: 161
Registered: ‎02-11-2014

Re: Extract hbase cell command reference guide

Thank you. If I am not mistaken, I can also put my own Java code to parse the SOAP message here. I am a Java guy, so that seems easier for me.

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Extract hbase cell command reference guide

Yes, you can write a custom morphline command in Java [1] and add the corresponding custom jar to the classpath, e.g. via the HBASE_INDEXER_CLASSPATH environment variable in the menu "Service-Wide/Advanced/Safety Valve" in Cloudera Manager (for Near Real Time Indexing) or via the --libjars CLI option on HBaseMapReduceIndexerTool (for Batch Indexing).

Alternatively, you can also write a mini script in Java and paste it into the body of the "java" morphline command [2].

[1] Section "Implementing your own Custom Command" at http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html

[2] http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/java
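For the second option, a minimal sketch of a "java" command that parses the XML with the JDK's DOM parser might look like the following. It assumes that a preceding extractHBaseCells command has placed the cell bytes into the `_attachment_body` field as a byte[], and that the XML contains the elements name, city and country. The element names and the error handling are illustrative assumptions; `record`, `child` and `logger` are the variables the java command makes available to the script.

```
{
  java {
    imports : """
      import java.io.ByteArrayInputStream;
      import javax.xml.parsers.DocumentBuilderFactory;
      import org.w3c.dom.Document;
      import org.w3c.dom.NodeList;
    """
    code : """
      try {
        // Assumption: the cell value was mapped to _attachment_body as byte[]
        byte[] xml = (byte[]) record.getFirstValue("_attachment_body");
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml));

        // Copy the text of the first matching element into a record field
        // of the same name
        for (String field : new String[] {"name", "city", "country"}) {
          NodeList nodes = doc.getElementsByTagName(field);
          if (nodes.getLength() > 0) {
            record.put(field, nodes.item(0).getTextContent());
          }
        }
      } catch (Exception e) {
        logger.warn("Cannot parse XML from HBase cell", e);
        return false; // signal failure for this record
      }

      // Drop the raw bytes so only the extracted fields are passed on
      record.removeAll("_attachment_body");
      return child.process(record);
    """
  }
}
```

For anything larger than a few lines, a custom command packaged in a jar as in [1] is probably easier to test and maintain than an inline script.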

Champion Alumni
Posts: 161
Registered: ‎02-11-2014

Re: Extract hbase cell command reference guide

Hi,

I was able to parse the XML stored in HBase and put all the values into the record object. How do I set these on the required Solr fields now?

My conf file currently looks like this:

{extract hbase}

{java # all values extracted and set on the record object}

Now how can I set these extracted values on the Solr fields?

 

 

Thanks,

Nishanth

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Extract hbase cell command reference guide

A next step is to configure Solr, in particular schema.xml and solrconfig.xml. For an example see http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Gu...

Champion Alumni
Posts: 161
Registered: ‎02-11-2014

Re: Extract hbase cell command reference guide

Thanks a lot. I have set up SolrCloud and was able to index some sample data (extracting the message and putting it into one Solr field), just to check that my configuration is correct and working.

However, when I try to extract the data and assign it to Solr schema elements, it does not work. My extractHBaseCells command looks like this. Do I need to have an "_attachment_body" field or an "_attachment_mimetype" field defined in my schema?

 

 

extractHBaseCells {
  mappings : [
    {
      inputColumn : "messages:*"
      outputField : "_attachment_body"
      # quoted, because [ and ] are not allowed in unquoted HOCON strings
      type : "byte[]"
      source : value
    }
  ]
}
