
HBase indexing to Solr with HDP Search in HDP 2.3

  • Background:

The HBase Indexer provides the ability to stream events from HBase to Solr for near-real-time search. It is included with HDP Search as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.

Steps

  • Download and start the HDP 2.3 sandbox VM, which comes with Lucidworks HDP Search installed under /opt/lucidworks-hdpsearch, and run the command below to ensure that no log files owned by root remain:
    chown -R solr:solr /opt/lucidworks-hdpsearch/solr
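
To confirm that the chown left nothing behind, a small check can be scripted. This is a sketch, not part of HDP Search: check_owner_clean is a hypothetical helper that simply wraps find's -user test.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of HDP Search): list anything under DIR that is
# still owned by USER (default: root). Prints the offending paths and returns 1
# if any exist, so it can gate later steps in a script.
check_owner_clean() {
  local dir="$1" user="${2:-root}" found
  found=$(find "$dir" -user "$user" -print)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
    return 1
  fi
  return 0
}
```

For example, running check_owner_clean /opt/lucidworks-hdpsearch/solr root after the chown should print nothing and return 0.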
  • If you are running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the commands below to install HDP Search and set up the solr user's directory in HDFS:
    yum install -y lucidworks-hdpsearch
    sudo -u hdfs hadoop fs -mkdir /user/solr
    sudo -u hdfs hadoop fs -chown solr /user/solr
  • Point the HBase Indexer to ZooKeeper by configuring hbase-indexer-site.xml:
    vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml
    
    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>hbaseindexer.zookeeper.connectstring</name>
        <value>sandbox.hortonworks.com:2181</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>sandbox.hortonworks.com</value>
      </property>
    </configuration>
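
If you set up several hosts, the same file can be generated instead of edited by hand. This is a minimal sketch that assumes a single ZooKeeper host (port 2181 by default). write_indexer_site is a hypothetical helper, and the output path is whatever you pass in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an hbase-indexer-site.xml that points both the
# indexer (hbaseindexer.zookeeper.connectstring) and its HBase client
# (hbase.zookeeper.quorum) at one ZooKeeper host. Single-host quorum assumed.
write_indexer_site() {
  local out="$1" zk_host="$2" zk_port="${3:-2181}"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>${zk_host}:${zk_port}</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>${zk_host}</value>
  </property>
</configuration>
EOF
}
```

Running write_indexer_site /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml sandbox.hortonworks.com 2181 writes the same settings as the file above.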
    
    • In Ambari > HBase > Configs > Custom hbase-site, add the properties below, but do not restart HBase yet:
    hbase.replication=true
    replication.source.ratio=1.0
    replication.source.nb.capacity=1000
    replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource
    
    • Copy the HBase Indexer's hbase-sep libraries to HBase's lib directory ($HBASE_HOME/lib):
    cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/
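
The following is a slightly more defensive version of this copy. It fails if the glob matches nothing, for example when HDP Search is installed under a different path. copy_sep_jars is a hypothetical helper. The replication source class configured above runs inside the RegionServers, so on a multi-node cluster the jars presumably also need to be in the HBase lib directory on every RegionServer host (for example under /usr/hdp/current/hbase-regionserver, assuming the standard HDP layout), not only on the master.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the hbase-sep* jars from SRC_DIR into DEST_DIR.
# Fails loudly if no jar matches, instead of letting cp complain about a
# literal "hbase-sep*" path.
copy_sep_jars() {
  local src_dir="$1" dest_dir="$2"
  local jars=("$src_dir"/hbase-sep*)
  # Without nullglob, an unmatched glob stays literal and does not exist.
  if [ ! -e "${jars[0]}" ]; then
    echo "no hbase-sep jars found in $src_dir" >&2
    return 1
  fi
  cp "${jars[@]}" "$dest_dir"/
}
```

On the sandbox, this is equivalent to the cp above: copy_sep_jars /opt/lucidworks-hdpsearch/hbase-indexer/lib /usr/hdp/current/hbase-master/lib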
    
    • Restart HBase.
    • Copy hbase-site.xml to the HBase Indexer's conf directory:
    cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
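
The indexer reads the copied hbase-site.xml, so it can be worth checking that the properties added in Ambari actually reached it after the restart. This is a rough sketch: check_replication_props is a hypothetical helper that greps for each property name instead of parsing the XML. It confirms only that each property is present, not its value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that the four replication properties from the
# Ambari step appear in the given hbase-site.xml. Text-based (grep -F on the
# <name> element), so it checks presence only, not values.
check_replication_props() {
  local site="$1" missing=0 prop
  for prop in hbase.replication \
              replication.source.ratio \
              replication.source.nb.capacity \
              replication.replicationsource.implementation; do
    if ! grep -qF "<name>${prop}</name>" "$site"; then
      echo "missing: ${prop}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

check_replication_props /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-site.xml returns 0 when all four names are present and lists any missing ones on stderr.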
    
    • Start Solr in SolrCloud mode, pointing it to ZooKeeper:
    cd /opt/lucidworks-hdpsearch/solr
    bin/solr start -c -z sandbox.hortonworks.com:2181
    
    • Create a collection with 2 shards (-s) and a replication factor of 2 (-rf):
      bin/solr create -c hbaseCollection \
         -d data_driven_schema_configs \
         -n myCollConfigs \
         -s 2 \
         -rf 2 
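
With -s 2 and -rf 2, SolrCloud creates 2 × 2 = 4 cores for the collection. On the single-node sandbox, all four are hosted by the same Solr instance. The sketch below lists the core names you can expect. It assumes the <collection>_shard<N>_replica<M> naming that also appears in the comments (hbaseCollection_shard1_replica1). expected_cores is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the core names expected for a collection with
# SHARDS shards and replication factor RF, assuming SolrCloud's
# <collection>_shard<N>_replica<M> naming convention.
expected_cores() {
  local coll="$1" shards="$2" rf="$3" s r
  for ((s = 1; s <= shards; s++)); do
    for ((r = 1; r <= rf; r++)); do
      echo "${coll}_shard${s}_replica${r}"
    done
  done
}
```

expected_cores hbaseCollection 2 2 lists the four core names, which you can compare with what the Solr Admin UI shows.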
    
    • Start the HBase Indexer server:
    cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
    ./hbase-indexer server
    
    • In a second terminal, create the table to be indexed in HBase. Open the HBase shell (hbase shell) and run the commands below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the table's column family must be set to 1:
    create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
    !quit
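
If you would rather not type into the shell interactively, you can pipe the same statements into hbase shell. This is a minimal sketch: create_table_cmds is a hypothetical helper that only prints the HBase shell commands (the create above, plus a describe to confirm REPLICATION_SCOPE), so you can review them before piping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print HBase shell commands that create TABLE with one
# column family FAMILY replicated (REPLICATION_SCOPE => '1'), then describe it.
create_table_cmds() {
  local table="$1" family="$2"
  printf "create '%s', { NAME => '%s', REPLICATION_SCOPE => '1' }\n" "$table" "$family"
  printf "describe '%s'\n" "$table"
}
```

Usage: create_table_cmds indexdemo-user info | hbase shell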
    
    • Now create the indexer definition that will index the indexdemo-user table as its contents are updated:
    vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml
    
    <?xml version="1.0"?>
    <indexer table="indexdemo-user">
      <field name="firstname_s" value="info:firstname"/>
      <field name="lastname_s" value="info:lastname"/>
      <field name="age_i" value="info:age" type="int"/>
    </indexer>
    
    • The file above defines three fields to index. For each one it specifies the HBase column to read (value), how to interpret it (type), and the Solr field to store it in (name).
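
The field suffixes tie this file to the data_driven_schema_configs configset used when creating the collection. Its schema should define *_s as a string dynamic field and *_i as an int dynamic field, so no schema edits are needed. For more tables, you can generate the definition. indexer_xml is a hypothetical helper; each mapping argument has the form solr_field=family:qualifier, optionally followed by =type. Given the three mappings above, it reproduces the file shown.

```bash
#!/usr/bin/env bash
# Hypothetical generator for an hbase-indexer definition. Each mapping
# argument is "solr_field=family:qualifier" or "solr_field=family:qualifier=type".
indexer_xml() {
  local table="$1" m field col type
  shift
  echo '<?xml version="1.0"?>'
  printf '<indexer table="%s">\n' "$table"
  for m in "$@"; do
    # Split the mapping on '='; type stays empty when it is omitted.
    IFS='=' read -r field col type <<< "$m"
    if [ -n "$type" ]; then
      printf '  <field name="%s" value="%s" type="%s"/>\n' "$field" "$col" "$type"
    else
      printf '  <field name="%s" value="%s"/>\n' "$field" "$col"
    fi
  done
  echo '</indexer>'
}
```

Usage: indexer_xml indexdemo-user firstname_s=info:firstname lastname_s=info:lastname age_i=info:age=int > /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml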
    • Next, create an indexer based on this indexer XML file:
    /opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer \
       -n hbaseindexer \
       -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml \
       -cp solr.zk=sandbox.hortonworks.com:2181 \
       -cp solr.collection=hbaseCollection
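
As one of the comments below points out, add-indexer also needs -z when ZooKeeper is not running on the indexer's host. The hedged wrapper below always passes it. add_hbase_indexer and the HBASE_INDEXER_BIN override are hypothetical. The wrapper assumes that Solr and the indexer share the same ZooKeeper connect string, as they do on the sandbox.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around "hbase-indexer add-indexer". HBASE_INDEXER_BIN
# can be overridden (e.g. with echo) to print the command instead of running it.
add_hbase_indexer() {
  local name="$1" conf="$2" zk="$3" collection="$4"
  local bin="${HBASE_INDEXER_BIN:-/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer}"
  # -z: ZooKeeper used by the indexer itself; solr.zk / solr.collection tell it
  # where to send documents (same ZooKeeper assumed for both).
  "$bin" add-indexer -n "$name" -c "$conf" -z "$zk" \
    -cp "solr.zk=${zk}" -cp "solr.collection=${collection}"
}
```

Usage: add_hbase_indexer hbaseindexer /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml sandbox.hortonworks.com:2181 hbaseCollection. With HBASE_INDEXER_BIN=echo exported, it prints the command instead of running it.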
    
    • Check that it was created:
    /opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers
    
    • Check that the indexer server output (in the first terminal) shows the following:
    INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer
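
The step above assumes you are watching the server's console. If the server was started in the background instead, a script can wait for that line. This is a sketch under assumptions: wait_for_log is a hypothetical helper, and the log path in its comment (/tmp/hbase-indexer.log) is only an example of where you might redirect the server's output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll FILE until a line matching PATTERN appears, giving
# up after TIMEOUT seconds (default 60). For example, after starting the server
# with output redirected (path is an assumption):
#   nohup ./hbase-indexer server > /tmp/hbase-indexer.log 2>&1 &
wait_for_log() {
  local file="$1" pattern="$2" timeout="${3:-60}" waited=0
  until grep -q -- "$pattern" "$file" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for '$pattern' in $file" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage: wait_for_log /tmp/hbase-indexer.log "Started indexer for hbaseindexer" 120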
    
    • Log back into the HBase shell and add some data to the indexdemo-user table:
    hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
    hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
    
    • Commit the changes to Solr:
    curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true"
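
To confirm from a script that the put was indexed, query the collection and read numFound from the JSON response. This is a rough sketch: num_found is a hypothetical helper that uses sed instead of a JSON parser. It assumes the response contains a single "numFound" key, as a standard Solr select response does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a Solr JSON response on stdin and print the first
# numFound value. Text-based (sed), so it tolerates optional spaces after the
# colon but is not a general JSON parser.
num_found() {
  sed -n 's/.*"numFound": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: curl -s "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select?q=firstname_s:John&wt=json" | num_found should print 1 once the commit has gone through.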
    

    Now try updating the data you just added from the HBase shell and commit again. Then delete the row and commit once more:

    hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'
    
    curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true"
    
    hbase> deleteall 'indexdemo-user', 'row1'
    
    curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true"
    
    Comments

    How do I do an exact text search, so that the JSON output displays only the matching data from the HBase table?


    Answer: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...&fq=lastname_s:"Smith"

    This will display the exact matched records.
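
Raw quotes and spaces in a URL are easy to get wrong, so you can percent-encode the query parameters before sending them. This is a minimal sketch: urlencode is a hypothetical pure-bash helper that assumes ASCII input. The example queries the collection URL (not a single core) with the same exact-match filter as the answer above.

```bash
#!/usr/bin/env bash
# Minimal pure-bash percent-encoder: RFC 3986 unreserved characters are kept,
# everything else becomes %XX. Assumes ASCII input (multi-byte UTF-8
# characters would not be encoded byte by byte).
urlencode() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# Example: exact match on the lastname_s string field, with q and fq encoded.
base="http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select"
url="${base}?q=$(urlencode '*:*')&fq=$(urlencode 'lastname_s:"Smith"')&wt=json"
echo "$url"
```

This prints http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select?q=%2A%3A%2A&fq=lastname_s%3A%22Smith%22&wt=json, which you can pass to curl. Alternatively, curl can do the encoding itself with -G and --data-urlencode.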


    Wildcard search types:

    • Single character (?): matches exactly one character. The search string Sm?th would match both Smith and Smyth.
    • Multiple characters (*): matches zero or more sequential characters. Wildcards can also be used in the middle or at the beginning of a term: Smit*, Sm*th, or *mith.
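
You can get a feel for these patterns offline with bash's [[ ... == pattern ]] test, which gives ? and * the same meanings. Like a query on a string field such as lastname_s, it matches the whole value, case-sensitively. This is only an analogy: preview_wildcard is a hypothetical helper, and Solr is not involved. Analyzed text fields behave differently, because Solr applies wildcards to the individual indexed terms.

```bash
#!/usr/bin/env bash
# Analogy only: bash pattern matching treats ? as exactly one character and *
# as zero or more, as Solr wildcard queries do. Prints each VALUE that PATTERN
# matches in full, case-sensitively (like a string field such as lastname_s).
preview_wildcard() {
  local pattern="$1" v
  shift
  for v in "$@"; do
    # $pattern is deliberately unquoted so bash treats it as a pattern.
    if [[ $v == $pattern ]]; then
      echo "$v"
    fi
  done
}
```

For example, preview_wildcard 'Sm?th' Smith Smyth Smithers prints Smith and Smyth but not Smithers.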


    In the add-indexer call, you must add the -z parameter to specify the ZooKeeper location if ZooKeeper is not running on the host where the indexer runs.


    I installed lucidworks-hdpsearch via yum, but hbase-indexer won't start. I also tried the HDP 2.4 sandbox, and hbase-indexer won't start there either.


    Hi Ali, with HDP 2.5, can Ambari be used for this installation?

    I am confused between:

    https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_solr-search-installation/content/ch_hdp-...

    and

    https://doc.lucidworks.com/lucidworks-hdpsearch/2.5/Guide-Install-Ambari.html

    Do I have to do the first and then the second?


    On HDP 2.4, I also needed to copy jackson-core-asl-1.9.13.jar and jackson-mapper-asl-1.9.13.jar from /opt/lucidworks-hdpsearch/contrib/clustering/lib to /opt/lucidworks-hdpsearch/hbase-indexer/lib, and remove the 1.8.8 versions of these jars, before the indexer would start.


    Hi, I am using HDP-2.6.0.3 with HBase and Solr installed through Ambari. My HBase RegionServers crash as soon as I add the configurations mentioned above in Ambari > HBase > Configs > Custom hbase-site. I have proceeded without those configurations, but then the data is not indexed. Please advise. Thanks.

    Version history: revision 1 of 1
    Last update: 10-09-2015 01:37 AM