Configure the Lily HBase Indexer to index certain parts of a cell value following a custom rule

New Contributor

Hi,

I'm wondering whether we can configure the Lily HBase Indexer to index only certain parts of a cell value, following a custom rule.

We have a table that contains only one column, and that column stores a well-defined JSON document (as byte[]). We are now trying to use a secondary indexing tool (we are considering the Lily HBase Indexer) to index the content of certain fields within that JSON. For example, the field 'data.text' contains the free text of an article, and another field, 'data.feedback', contains the free text of the feedback. We want to configure the Lily HBase Indexer so that when a new record (JSON document) is added to the table, the index entries for 'data.text' (string) and 'data.feedback' (string) are automatically created or updated and sent to Solr for search.

Does this require a custom field type (I mean a new Java class)?
If yes, can anyone suggest a guideline or an example of how to create one?
Or can someone show me how to achieve this (indexing certain parts of a cell value) with built-in functions and classes?

I'm using Cloudera Manager Community Edition and CDH 5.4.7

Thanks!

Expert Contributor

You can plug a morphline into hbase-indexer to do some mini ETL on the fly during indexing from HBase into Solr. See the docs:

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_hbase_batch_...

and

http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/search_etl_morphlin...
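
As a rough sketch of what such a morphline could look like for the JSON case in the question: the config below reads the cell as raw bytes, parses it as JSON, and copies two JSON paths into output fields. The column name `cf:json`, the morphline id, and the output field names `text` and `feedback` are assumptions for illustration. It also assumes that 'data.text' means a `text` key nested under a `data` object, and that the Solr schema defines matching fields.

```
morphlines : [
  {
    id : jsonCellMorphline                      # hypothetical id
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      {
        # Hand the raw cell bytes to the next command as the attachment body.
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "cf:json"           # assumed family:qualifier
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }

      # Parse the attachment body into a JSON tree.
      { readJson {} }

      {
        # Copy selected JSON paths into record fields (names are assumptions).
        extractJsonPaths {
          flatten : true
          paths : {
            text : "/data/text"
            feedback : "/data/feedback"
          }
        }
      }
    ]
  }
]
```

Depending on the setup, fields that are not in the Solr schema may need to be dropped at the end of the command chain (for example with `removeFields` or `sanitizeUnknownSolrFields`) before the document is sent to Solr.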

New Contributor

Thank you very much!

I have one additional question. From the pages you provided, I realized that I should implement my own morphline command to handle the JSON document (it requires wildcard support during processing). Is there any way to deploy this new command class to CDH without disrupting the HBase service?

Expert Contributor
Custom morphline commands are deployed by adding the jar with the custom code to the hbase-indexer Java classpath. The morphline runs inside the hbase-indexer processes, which are separate from the HBase processes, so it has no impact on the stability of the HBase service.
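
Once the jar is on the classpath, the morphline config still has to be told where to look for the command. A minimal sketch, assuming a hypothetical package `com.example.morphline` and a hypothetical command name `extractJsonWildcard` registered by the custom command builder class (neither comes from this thread):

```
morphlines : [
  {
    id : jsonCellMorphline                      # hypothetical id
    # Add the package of the custom command next to the stock ones.
    importCommands : ["org.kitesdk.**", "com.ngdata.**", "com.example.morphline.**"]

    commands : [
      # ... extractHBaseCells and readJson as usual ...

      # Hypothetical custom command; its name and parameters are whatever
      # the custom builder class defines.
      { extractJsonWildcard { } }
    ]
  }
]
```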