Extract hbase cell command reference guide

Champion Alumni

Hi,

I am trying to get more information on the extractHBaseCells command. However, I am unable to find it in the morphline reference guide. Can someone please let me know where I can find the documentation on this? The following is the reference guide I am looking at.

 

http://cloudera.github.io/cdk/docs/0.9.1/cdk-morphlines/morphlinesReferenceGuide.html

 

Super Collaborator
See Sections "Creating a Morphline Configuration File" and "Understanding the extractHBaseCells morphline command" at http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Gu...

Wolfgang.

Champion Alumni

Hi,

I understood that part. But let us say I extract an XML document from the HBase cell with the elements name, city, and country, and I want to index it into Solr. My Solr schema also has the fields name, city, and country. Now I need to parse the XML, get these fields, and index them into Solr.

 

 

extractHBaseCells {
  mappings : [
    {
      inputColumn : "messages:name"
      outputField : "name"
      type : String
      source : value
    }
    {
      inputColumn : "messages:city"
      outputField : "city"
      type : String
      source : value
    }
    {
      inputColumn : "messages:country"
      outputField : "country"
      type : String
      source : value
    }
  ]
}

 

This would have been possible if I were able to retrieve the data from HBase in this format. But what extractHBaseCells gives me is an XML document. I am looking for a way to parse it using xquery and then assign the values to Solr fields.

 

 

 

 

Super Collaborator
You can just specify an extractHBaseCells command followed by an xquery command in the same morphline config file. Each command pipes into the subsequent command, and you can specify as many commands as you like. The links I mentioned contain a (commented out) example for extractHBaseCells followed by readAvroContainer. just uncomment that and replace readAvroContainer with xquery.

Wolfgang.
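
As a rough illustration of the chaining described above, the commands section of the morphline could look something like the sketch below. It is not taken from the linked documentation. The column name "messages:xml" and the root element <message> are hypothetical placeholders, and the element names name, city, and country come from the question. The sketch assumes that xquery reads its input from the _attachment_body field and that the child elements of each returned node become record fields, which is how the reference guide's xquery example appears to work.

```
commands : [
  {
    # Copy the raw cell bytes into _attachment_body for the next command.
    # "messages:xml" is a hypothetical column; use your own family:qualifier.
    extractHBaseCells {
      mappings : [
        {
          inputColumn : "messages:xml"
          outputField : "_attachment_body"
          type : "byte[]"
          source : value
        }
      ]
    }
  }
  {
    # Parse the XML and emit one record per <message> element (assumed root).
    xquery {
      fragments : [
        {
          fragmentPath : "/"
          queryString : """
            for $m in /message
            return
            <record>
              <name>{string($m/name)}</name>
              <city>{string($m/city)}</city>
              <country>{string($m/country)}</country>
            </record>
          """
        }
      ]
    }
  }
  # Log the resulting record so the field names can be checked.
  { logDebug { format : "output record: {}", args : ["@{}"] } }
]
```

The output field names are chosen to match the Solr schema field names from the question, so no further renaming step should be needed.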

Champion Alumni

Thank you. If I am not wrong, I can also put my Java code to parse the SOAP message here. I am a Java guy, so that seems easier for me.

Super Collaborator
Yes, you can write a custom morphline command in Java [1] and add the corresponding custom jar to the classpath, e.g. via the HBASE_INDEXER_CLASSPATH environment variable in the menu "Service-Wide/Advanced/Safety Valve" in Cloudera Manager (for Near Real Time Indexing) or via the --libjars CLI option on HBaseMapReduceIndexerTool (for Batch Indexing).

Alternatively, you can also write a mini script in Java and paste it into the body of the "java" morphline command [2].

[1] Section "Implementing your own Custom Command" at http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html

[2] http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/java
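
For illustration only, a "java" command along these lines might parse the cell content and copy values onto the record. This is a sketch under assumptions rather than a tested configuration: it assumes a preceding extractHBaseCells mapping placed the raw XML bytes in _attachment_body, and the element names name, city, and country are taken from the question. The variables record, child, and logger are the ones the java command makes available to the script.

```
{
  java {
    imports : "import java.util.*;"
    code : """
      // Assumption: an earlier extractHBaseCells mapping stored the XML as byte[] here.
      Object raw = record.getFirstValue("_attachment_body");
      if (!(raw instanceof byte[])) {
        return false; // nothing to parse, so fail this record
      }
      try {
        org.w3c.dom.Document doc = javax.xml.parsers.DocumentBuilderFactory
            .newInstance()
            .newDocumentBuilder()
            .parse(new java.io.ByteArrayInputStream((byte[]) raw));
        // Copy the first occurrence of each element into a field of the same name.
        for (String tag : new String[] {"name", "city", "country"}) {
          org.w3c.dom.NodeList nodes = doc.getElementsByTagName(tag);
          if (nodes.getLength() > 0) {
            record.put(tag, nodes.item(0).getTextContent());
          }
        }
      } catch (Exception e) {
        logger.warn("Cannot parse XML payload", e);
        return false;
      }
      // Drop the raw bytes so only the extracted fields travel downstream.
      record.removeAll("_attachment_body");
      // Hand the record to the next command in the chain.
      return child.process(record);
    """
  }
}
```

The script puts each value under a field name that matches the Solr schema, on the assumption that the indexer maps record fields to Solr fields by name.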

Champion Alumni

Hi,

I was able to parse the XML stored in HBase and put all the values into the record object. How do I set these on the required Solr fields now?

 

My conf file right now looks like this:

 

{ extractHBaseCells ... }

{ java ... }   # all values are extracted and set on the record object

 

Now how can I set these extracted values on the Solr fields?

 

 

Thanks,

Nishanth

Super Collaborator
A next step is to configure Solr, in particular schema.xml and solrconfig.xml. For an example see http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Gu...

Champion Alumni

Thanks a lot. I have created the SolrCloud setup and was able to index some sample data (extracting the message and putting it into one Solr field) just to check that my configuration is correct and works.

 

However, when I try to extract data and assign it to Solr schema elements, it does not work. My extractHBaseCells command looks like this. Do I need to have an "_attachment_body" field or an "_attachment_mimetype" field defined in my schema?

 

 

extractHBaseCells {
  mappings : [
    {
      inputColumn : "messages:*"
      outputField : "_attachment_body"
      type : byte[]
      source : value
    }
  ]
}
}