Created on 11-23-2015 10:16 AM - edited 09-16-2022 02:50 AM
Hi,
I'm wondering whether we can configure the Lily HBase Indexer to index only certain parts of a cell value, following a custom rule.
We have a table with a single column, which stores a well-defined JSON document as a byte[]. We are now trying to use a secondary indexing tool (we are considering the Lily HBase Indexer) to index the content of certain fields within that JSON. For example, the field 'data.text' contains the free text of an article, and the field 'data.feedback' contains the free text of feedback. We want to configure the Lily HBase Indexer so that when a new record (JSON document) is added to the table, the index for 'data.text' (string) and 'data.feedback' (string) is automatically created or updated and sent to Solr for search.
Does that require a custom field type (I mean a new Java class)?
If yes, can anyone suggest a guideline or an example of how to create one?
Or can someone show me how to achieve this (indexing certain parts of a cell value) with the built-in functions and classes?
I'm using Cloudera Manager Community Edition and CDH 5.4.7
Thanks!
Created 11-23-2015 12:11 PM
You can plug a morphline into hbase-indexer to do some mini ETL on the fly during indexing from HBase into Solr. See the docs:
and
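As a rough illustration of the morphline approach, here is a minimal sketch of a morphline file that might handle the JSON case from the question. It is not taken from the linked docs. The column name `cf:json`, the collection name `my_collection`, and the output field names `text` and `feedback` are placeholders, and it assumes that `data` is a JSON object with `text` and `feedback` keys and that the Solr schema defines matching fields.

```
# Sketch only: collection name and ZooKeeper host are placeholders.
SOLR_LOCATOR : {
  collection : my_collection
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    id : jsonCellMorphline
    importCommands : ["org.kitesdk.**", "com.ngdata.**"]

    commands : [
      # Pass the raw cell bytes through unchanged into the field that readJson consumes.
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "cf:json"          # hypothetical family:qualifier
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }

      # Parse the bytes as a JSON document.
      { readJson {} }

      # Copy only the two fields of interest into the output record.
      {
        extractJsonPaths {
          flatten : true
          paths : {
            text : /data/text
            feedback : /data/feedback
          }
        }
      }

      # Drop record fields that the Solr schema does not know about.
      {
        sanitizeUnknownSolrFields {
          solrLocator : ${SOLR_LOCATOR}
        }
      }
    ]
  }
]
```

If the built-in commands cover the JSON structure, this avoids writing a new Java class. The indexer definition would then point at this file through the morphline result-to-Solr mapper described in the docs.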
Created 11-23-2015 10:52 PM
Thank you very much!
I have one additional question. From the pages you provided, I realized that I had better implement my own morphline command to handle the JSON file (which requires wildcard support during processing). However, I'm wondering whether there is a way to deploy this new command class to CDH without interrupting the HBase service.
Created 11-23-2015 11:00 PM