New Contributor
Posts: 5
Registered: ‎08-20-2013
Accepted Solution

Tried Search 1.0.0 and Morphline HBase Indexer with no success


Hi

    I tried setting up the HBase indexer following the guidelines, but no documents were indexed.

    I'm using CM 4.7 and Search 1.0.0, and followed the steps below:

 

I added the Indexer Service to the cluster. The Morphlines file is:

 

SOLR_LOCATOR : {
  # Name of solr collection
  collection : hbase-collection1
  
  # ZooKeeper ensemble
  zkHost : "$ZK_HOST" 
}


morphlines : [
  {
    id : morphline
    importCommands : ["com.cloudera.**", "com.ngdata.**"]

    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:*"
              outputField : "data"
              type : string
              source : value
            }
          ]
        }
      }

      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]

 

The ks_indexer service starts successfully.

 

Then I enabled replication on the HBase column families.

 

I created a corresponding SolrCloud collection with the following commands:

$ solrctl instancedir --generate $HOME/hbase-collection1
$ solrctl instancedir --create hbase-collection1 $HOME/hbase-collection1
$ solrctl collection --create hbase-collection1

 

I created an HBase Indexer configuration:

$ cat $HOME/morphline-hbase-mapper.xml

<?xml version="1.0"?>
<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

   <!-- The relative or absolute path on the local file system to the morphline configuration file. -->
   <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager 
   <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>

   <!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
   <!-- <param name="morphlineId" value="morphline1"/> -->

</indexer>

 

I created a Morphline configuration file:

 

$ cat /etc/hbase-solr/conf/morphlines.conf

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.cdk.morphline.**", "com.ngdata.**"]

    commands : [                    
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data&colon;*"
              outputField : "data" 
              type : string 
              source : value
            }

            #{
            #  inputColumn : "data&colon;item"
            #  outputField : "attachment_body" 
            #  type : "byte[]" 
            #  source : value
            #}
          ]
        }
      }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} } 
      #{ 
      #  extractAvroPaths {
      #    paths : { 
      #      data &colon; /user_name      
      #    }
      #  }
      #}

      { logTrace { format : "output record: {}", args : ["@{}"] } }    
    ]
  }
]

 

I registered the HBase Indexer configuration with the HBase Indexer Service:

 

hbase-indexer add-indexer \
--name myIndexer \
--indexer-conf $HOME/morphline-hbase-mapper.xml \
--connection-param solr.zk=localhost:2181/solr \
--connection-param solr.collection=hbase-collection1 \
--zookeeper localhost:2181

 

#hbase-indexer list-indexers

 

myindex
  + Lifecycle state: ACTIVE
  + Incremental indexing state: SUBSCRIBE_AND_CONSUME
  + Batch indexing state: INACTIVE
  + SEP subscription ID: null
  + SEP subscription timestamp: 2013-10-14T19:00:36.262+08:00
  + Connection type: solr
  + Connection params:
    + solr.collection = hbase-collection1
    + solr.zk = localhost:2181/solr
  + Indexer config:
      574 bytes, use -dump to see content
  + Batch index config:
      (none)
  + Default batch index config:
      (none)
  + Processes
    + 0 running processes
    + 0 failed processes

 

 

When I query Solr, I can't find the records I put into HBase. What did I miss?

BTW, before trying Search 1.0.0, I tried hbase-indexer from NGDATA's GitHub with the indexdemo-user example, and it worked.

 

Best Regards,

 

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Tried Search 1.0.0 and Morphline HBase Indexer with no success

FWIW, there are some funny colon quote chars in the morphline config you posted. Probably just copy n'paste weirdness, but maybe something to double check.
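
One way to double check for this kind of copy-and-paste residue is to scan the config text for HTML character references such as `&colon;`. The following is a minimal sketch in Python, not part of Cloudera Search; the function name `find_entity_residue` and the sample text are made up for illustration.

```python
import html
import re

# Matches named or numeric HTML character references, e.g. "&colon;" or "&#58;".
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_entity_residue(config_text):
    """Return (line_number, stripped_line) pairs for lines containing HTML entities."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if ENTITY.search(line):
            hits.append((number, line.strip()))
    return hits


# Illustrative sample, modelled on the mapping in the posted morphlines.conf.
sample = '''inputColumn : "data&colon;*"
outputField : "data"
'''

for number, line in find_entity_residue(sample):
    # html.unescape shows what the line was presumably meant to say.
    print(f"line {number}: {line}  ->  {html.unescape(line)}")
```

If the file on disk really contains `&colon;` rather than `:`, the `inputColumn` expression would not match the `data` column family, so it is worth running a check like this against the actual file rather than the forum paste.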

 

Also enable TRACE logging and check the corresponding log files:

 

log4j.logger.com.cloudera.cdk.morphline=TRACE
log4j.logger.com.ngdata=TRACE

 

Cloudera Employee
Posts: 5
Registered: ‎09-24-2013

Re: Tried Search 1.0.0 and Morphline HBase Indexer with no success

Hi @ooyama

 

   Looks like your "HBase Indexer configuration" is wrong.

 

   In morphline-hbase-mapper.xml, the XML comment that begins with 'Use relative path "morphlines.conf"' is never closed, so it comments out the lines that follow it, including the param that sets the path to the morphline file.

 

   Fix: close the comment. Change this line:

<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager

   to:

<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->

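
A quick way to catch this kind of mistake before registering the indexer is to parse the XML and confirm that the `morphlineFile` param is still an active element. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `check_indexer_config` and the shortened sample strings are assumptions for illustration, and the Python parser may be stricter than whatever the indexer itself uses. A strict parser can reject the broken file outright, because `--` is not allowed inside an XML comment; either way the check reports a problem.

```python
import xml.etree.ElementTree as ET


def check_indexer_config(xml_text):
    """Return a list of problems found in an indexer XML definition (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    # Only elements that survive parsing count; commented-out params are invisible.
    params = {p.get("name"): p.get("value") for p in root.iter("param")}
    if "morphlineFile" not in params:
        return ['no active <param name="morphlineFile"> (commented out?)']
    return []


# Shortened copy of the posted config, with the unclosed comment.
BROKEN = '''<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
   <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager
   <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>
   <!-- The optional morphlineId identifies a morphline -->
</indexer>'''

# Same text with the first comment closed.
FIXED = BROKEN.replace("Cloudera Manager\n", "Cloudera Manager -->\n")

print(check_indexer_config(BROKEN))  # non-empty list of problems
print(check_indexer_config(FIXED))   # empty list
```
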
New Contributor
Posts: 5
Registered: ‎08-20-2013

Re: Tried Search 1.0.0 and Morphline HBase Indexer with no success

Thank you very much! It's running now!

 

One more question: how can I index rows that were already in HBase before I ran hbase-indexer add-indexer? I know Lily has a command for this:

lily-update-index -n nameOfYourIndex --build-state BUILD_REQUESTED

What is the equivalent in Cloudera Search?

Cloudera Employee
Posts: 146
Registered: ‎08-21-2013

Re: Tried Search 1.0.0 and Morphline HBase Indexer with no success

We are working on such a feature; it should be available soon. Meanwhile, you can work around it by touching all cells without significantly modifying them, e.g. by updating the timestamp.
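
To illustrate the idea of "touching" cells, here is a minimal sketch in Python. It does not talk to HBase: `InMemoryTable` is a hypothetical stand-in with made-up `put` and `scan` methods, used only to show the pattern of writing each cell back with its existing value and a fresh timestamp. Against a real cluster the same loop would go through whichever HBase client you use, and each rewrite would produce a new edit for the replication-based indexer to pick up.

```python
import time


class InMemoryTable:
    """Hypothetical stand-in for an HBase table: row -> column -> (value, timestamp)."""

    def __init__(self):
        self.rows = {}
        self.writes = 0  # counts puts, i.e. the edits a replication listener would see

    def put(self, row, column, value, timestamp=None):
        ts = timestamp if timestamp is not None else int(time.time() * 1000)
        self.rows.setdefault(row, {})[column] = (value, ts)
        self.writes += 1

    def scan(self):
        for row, columns in self.rows.items():
            yield row, dict(columns)


def touch_all_cells(table):
    """Rewrite every cell with its current value and a fresh timestamp."""
    now = int(time.time() * 1000)
    # Materialize the scan first so we do not mutate the table while iterating it.
    for row, columns in list(table.scan()):
        for column, (value, _old_timestamp) in columns.items():
            table.put(row, column, value, timestamp=now)


table = InMemoryTable()
table.put("row1", "data:name", b"alice", timestamp=1)
table.put("row2", "data:name", b"bob", timestamp=2)
touch_all_cells(table)
print(table.rows)
```

The values are left untouched; only the timestamps change, which matches the "without significantly modifying them" part of the workaround.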

Wolfgang.
