Tried Search 1.0.0 and Morphline HBase Indexer with no success

Explorer

Hi,

I tried setting up the HBase indexer following the guidelines, but no documents were indexed.

I am using CM 4.7 and Search 1.0.0, and followed the steps below.

 

I added the Indexer Service to the cluster. The Morphlines file is:

 

SOLR_LOCATOR : {
  # Name of solr collection
  collection : hbase-collection1
  
  # ZooKeeper ensemble
  zkHost : "$ZK_HOST" 
}


morphlines : [
{
id : morphline
importCommands : ["com.cloudera.**", "com.ngdata.**"]

commands : [                    
  {
    extractHBaseCells {
      mappings : [
        {
          inputColumn : "data:*"
          outputField : "data" 
          type : string 
          source : value
        }
      ]
    }
  }


  { logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

 

With this file, ks_indexer starts successfully.

 

Then I enabled replication on the HBase column families.

 

I created a corresponding SolrCloud collection with the following commands:

$ solrctl instancedir --generate $HOME/hbase-collection1
$ solrctl instancedir --create hbase-collection1 $HOME/hbase-collection1
$ solrctl collection --create hbase-collection1

 

I created an HBase Indexer configuration:

$ cat $HOME/morphline-hbase-mapper.xml

<?xml version="1.0"?>
<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">

   <!-- The relative or absolute path on the local file system to the morphline configuration file. -->
   <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager 
   <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>

   <!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
   <!-- <param name="morphlineId" value="morphline1"/> -->

</indexer>

 

I created a Morphline configuration file:

 

$ cat /etc/hbase-solr/conf/morphlines.conf

morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.cdk.morphline.**", "com.ngdata.**"]

    commands : [                    
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data&colon;*"
              outputField : "data" 
              type : string 
              source : value
            }

            #{
            #  inputColumn : "data&colon;item"
            #  outputField : "attachment_body" 
            #  type : "byte[]" 
            #  source : value
            #}
          ]
        }
      }

      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      #{ readAvroContainer {} } 
      #{ 
      #  extractAvroPaths {
      #    paths : { 
      #      data &colon; /user_name      
      #    }
      #  }
      #}

      { logTrace { format : "output record: {}", args : ["@{}"] } }    
    ]
  }
]

 

I registered the HBase Indexer configuration with the HBase Indexer Service:

 

hbase-indexer add-indexer \
--name myIndexer \
--indexer-conf $HOME/morphline-hbase-mapper.xml \
--connection-param solr.zk=localhost:2181/solr \
--connection-param solr.collection=hbase-collection1 \
--zookeeper localhost:2181

 

#hbase-indexer list-indexers

 

myindex
  + Lifecycle state: ACTIVE
  + Incremental indexing state: SUBSCRIBE_AND_CONSUME
  + Batch indexing state: INACTIVE
  + SEP subscription ID: null
  + SEP subscription timestamp: 2013-10-14T19:00:36.262+08:00
  + Connection type: solr
  + Connection params:
    + solr.collection = hbase-collection1
    + solr.zk = localhost:2181/solr
  + Indexer config:
      574 bytes, use -dump to see content
  + Batch index config:
      (none)
  + Default batch index config:
      (none)
  + Processes
    + 0 running processes
    + 0 failed processes

 

 

When I query Solr, I can't find the records I put into HBase. What did I miss?

BTW, before trying Search 1.0.0, I tried hbase-indexer from NGDATA's GitHub with the indexdemo-user example, and that worked.

 

Best Regards,

 

4 REPLIES

Super Collaborator

FWIW, there are some funny colon characters in the morphline config you posted. It is probably just copy-and-paste weirdness, but it may be worth double-checking.

 

Also enable TRACE logging and check the corresponding log files:

 

log4j.logger.com.cloudera.cdk.morphline=TRACE
log4j.logger.com.ngdata=TRACE
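
Alongside the logs, it can help to confirm whether any documents reached the collection at all. The following is a minimal sketch, not part of the original reply: it asks Solr for a match-all query with zero rows and reads the numFound field of the JSON response. The base URL http://localhost:8983/solr is an assumption (the default Solr port); the collection name is the one from the question, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_count_url(solr_base, collection):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{collection}/select?{params}"


def parse_num_found(response_text):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(solr_base, collection):
    """Query a live Solr server; needs network access to solr_base."""
    with urlopen(build_count_url(solr_base, collection)) as resp:
        return parse_num_found(resp.read().decode("utf-8"))


# Example call (assumed host and port, adjust for your cluster):
#   count_documents("http://localhost:8983/solr", "hbase-collection1")
```

If the count stays at 0 after new puts, the problem is upstream of Solr (indexer config, morphline, or replication) rather than in the query itself.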

 

Cloudera Employee

Hi @ooyama

 

It looks like your HBase Indexer configuration is wrong.

In morphline-hbase-mapper.xml, the XML comment that begins "Use relative path" is never closed, so everything after it up to the next "-->" is commented out, including the morphlineFile param that holds the path to the morphline file.

Fix: close the comment. Change

<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager

to

<!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->
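
A quick way to catch this kind of mistake before registering the indexer is to parse the file and check that the parameter is actually live XML. This is only a sketch using Python's standard library, not something hbase-indexer provides; the function name find_indexer_param is hypothetical. It returns None when the param is missing, commented out, or when the file does not parse.

```python
import xml.etree.ElementTree as ET


def find_indexer_param(xml_text, name):
    """Return the value of <param name=...> if it is live XML, else None.

    None means the parameter is missing, commented out, or the file
    could not be parsed at all.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    for param in root.iter("param"):
        if param.get("name") == name:
            return param.get("value")
    return None


# Example call against the file from the question:
#   with open("morphline-hbase-mapper.xml") as f:
#       print(find_indexer_param(f.read(), "morphlineFile"))
```

With the unclosed comment, the call yields None; once the comment is closed, it yields the configured path.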

Explorer

Thank you very much! It's running now!

One more question: how can I index rows that were already in HBase before I ran hbase-indexer add-indexer? I know Lily has a command for this:

lily-update-index -n nameOfYourIndex --build-state BUILD_REQUESTED

What should I do with Cloudera Search?

Super Collaborator
We are working on such a feature; it should be available soon. In the meantime, you can work around it by touching all cells without significantly modifying them, e.g. by updating the timestamp.

Wolfgang.
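
One way to read the "touch all cells" workaround is to scan the table and write each cell back with its current value, which gives it a fresh timestamp and produces a new edit for the indexer to pick up. The sketch below is an illustration under assumptions, not Wolfgang's code: touch_all_cells is a hypothetical helper that works with any object exposing scan() and put() in the shape that happybase's Table uses, and InMemoryTable is a stand-in so the loop can be tried without a cluster.

```python
def touch_all_cells(table):
    """Rewrite every cell with its current value so it gets a new timestamp.

    `table` is any object with scan() yielding (row_key, {column: value})
    and put(row_key, {column: value}); happybase.Table has this shape
    (assumption: a Thrift gateway is available if you use it).
    Returns the number of rows rewritten.
    """
    rows = 0
    for row_key, cells in table.scan():
        if cells:
            table.put(row_key, dict(cells))
            rows += 1
    return rows


class InMemoryTable:
    """Tiny stand-in for an HBase table, used only to demonstrate the loop."""

    def __init__(self):
        self.rows = {}      # row_key -> {column: value}
        self.versions = {}  # (row_key, column) -> number of writes

    def put(self, row_key, cells):
        self.rows.setdefault(row_key, {}).update(cells)
        for column in cells:
            key = (row_key, column)
            self.versions[key] = self.versions.get(key, 0) + 1

    def scan(self):
        # Snapshot first so that puts during iteration are safe.
        return [(k, dict(v)) for k, v in sorted(self.rows.items())]
```

On a large table this rewrites everything, so it is worth trying on a small row range first and restricting the scan to the column families the indexer actually maps (data in this thread).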