Created on 10-14-2013 07:46 PM - edited 09-16-2022 01:48 AM
Hi
I tried setting up the HBase indexer following the guidelines, but didn't see any indexed documents.
I'm using CM 4.7 and Search 1.0.0.
These are the steps I followed:
Added the Indexer Service to the Cluster and the Morphlines File is:
SOLR_LOCATOR : {
  # Name of solr collection
  collection : hbase-collection1

  # ZooKeeper ensemble
  zkHost : "$ZK_HOST"
}

morphlines : [
  {
    id : morphline
    importCommands : ["com.cloudera.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
The ks_indexer service started successfully.
Then I enabled replication on the HBase column families.
Created a corresponding SolrCloud collection with the following commands:
$ solrctl instancedir --generate $HOME/hbase-collection1
$ solrctl instancedir --create hbase-collection1 $HOME/hbase-collection1
$ solrctl collection --create hbase-collection1
Created an HBase Indexer configuration:
$ cat $HOME/morphline-hbase-mapper.xml
<?xml version="1.0"?>
<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

   <!-- The relative or absolute path on the local file system to the morphline configuration file. -->
   <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager
   <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>

   <!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
   <!-- <param name="morphlineId" value="morphline1"/> -->

</indexer>
Created a Morphline configuration file:
$ cat /etc/hbase-solr/conf/morphlines.conf
morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.cdk.morphline.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }

            #{
            #  inputColumn : "data:item"
            #  outputField : "attachment_body"
            #  type : "byte[]"
            #  source : value
            #}
          ]
        }
      }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} }
      #{
      #  extractAvroPaths {
      #    paths : {
      #      data : /user_name
      #    }
      #  }
      #}

      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
Registered an HBase Indexer configuration with the HBase Indexer Service
hbase-indexer add-indexer \
  --name myIndexer \
  --indexer-conf $HOME/morphline-hbase-mapper.xml \
  --connection-param solr.zk=localhost:2181/solr \
  --connection-param solr.collection=hbase-collection1 \
  --zookeeper localhost:2181
#hbase-indexer list-indexers
myindex
+ Lifecycle state: ACTIVE
+ Incremental indexing state: SUBSCRIBE_AND_CONSUME
+ Batch indexing state: INACTIVE
+ SEP subscription ID: null
+ SEP subscription timestamp: 2013-10-14T19:00:36.262+08:00
+ Connection type: solr
+ Connection params:
+ solr.collection = hbase-collection1
+ solr.zk = localhost:2181/solr
+ Indexer config:
574 bytes, use -dump to see content
+ Batch index config:
(none)
+ Default batch index config:
(none)
+ Processes
+ 0 running processes
+ 0 failed processes
When I query Solr, I can't find the records I put into HBase. What did I miss?
BTW, before trying Search 1.0.0, I tried hbase-indexer from NGDATA's GitHub with the indexdemo-usr example, and that worked.
Best Regards,
Created 10-15-2013 12:51 AM
FWIW, there are some funny colon quote chars in the morphline config you posted. Probably just copy n'paste weirdness, but maybe something to double check.
Also enable TRACE logging and check the corresponding log files:
log4j.logger.com.cloudera.cdk.morphline=TRACE
log4j.logger.com.ngdata=TRACE
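To check for the odd quote and colon characters mentioned above, a small script along these lines might help. It is only a sketch: the set of "suspect" characters is my own guess at what copy and paste from a rich-text page tends to introduce, and `find_suspect_chars` is a hypothetical helper, not part of Cloudera Search or the morphline libraries.

```python
import sys

# Characters that often sneak in via rich-text copy and paste.
# This list is an assumption; extend it as needed.
SUSPECT_CHARS = {
    "\u201c": "left double quote",
    "\u201d": "right double quote",
    "\u2018": "left single quote",
    "\u2019": "right single quote",
    "\uff1a": "fullwidth colon",
    "\u00a0": "non-breaking space",
}


def find_suspect_chars(text):
    """Return (line_number, column, description) for each suspect character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT_CHARS:
                hits.append((line_no, col, SUSPECT_CHARS[ch]))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_quotes.py /etc/hbase-solr/conf/morphlines.conf
    path = sys.argv[1]
    with open(path, encoding="utf-8") as f:
        for line_no, col, desc in find_suspect_chars(f.read()):
            print(f"{path}:{line_no}:{col}: {desc}")
```

If it prints nothing, the file contains none of the listed characters and the odd characters were most likely introduced when the config was pasted into the post.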
Created 10-15-2013 11:27 AM
Hi @ooyama
It looks like your "HBase Indexer configuration" is wrong.
One of the XML comments in morphline-hbase-mapper.xml is never closed, so it comments out the text that follows it, including the <param> that sets the path to the morphline file.
Fix: change this line
<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager
so that the comment is closed:
<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->
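A commented-out parameter like this can be caught before registering the indexer. The sketch below uses Python's standard `xml.etree.ElementTree`, which drops comments when parsing, so only live `<param>` elements remain. The helper names `active_params` and `has_morphline_file` are hypothetical, and this is not how hbase-indexer itself validates the file.

```python
import xml.etree.ElementTree as ET


def active_params(indexer_xml):
    """Return {name: value} for <param> elements that are not commented out.

    Returns None if the XML is not well-formed.
    """
    try:
        root = ET.fromstring(indexer_xml)
    except ET.ParseError:
        return None
    # ElementTree discards comments by default, so only live elements remain.
    return {p.get("name"): p.get("value") for p in root.iter("param")}


def has_morphline_file(indexer_xml):
    """True only if a live morphlineFile param is present."""
    params = active_params(indexer_xml)
    return params is not None and "morphlineFile" in params


if __name__ == "__main__":
    # Simplified stand-in for morphline-hbase-mapper.xml with the param
    # swallowed by a comment.
    commented = (
        '<indexer table="record">'
        '<!-- <param name="morphlineFile" value="morphlines.conf"/> -->'
        "</indexer>"
    )
    print(has_morphline_file(commented))
```

A result of `False` means the mapper would not see a `morphlineFile` setting in that file, which matches the symptom in this thread.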
Created 10-16-2013 02:15 AM
Thank you very much! It's running now!
One more question: how can I index rows that were already in HBase before I ran hbase-indexer add-indexer? I know Lily has a command for this:
lily-update-index -n nameOfYourIndex --build-state BUILD_REQUESTED
What is the equivalent in Cloudera Search?
Created 10-16-2013 09:09 AM