Community Articles

abajwa · ‎10-09-2015

HBase indexing to Solr with HDP Search in HDP 2.3

Background:

The HBase Indexer streams events from HBase to Solr for near-real-time searching. It is included with HDP Search as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are replicated asynchronously to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.

References:
- https://doc.lucidworks.com/lucidworks-hdpsearch/2.3/Guide-Jobs.html#_hbase-indexer
- https://github.com/NGDATA/hbase-indexer/wiki/Tutorial

Steps

Download and start the HDP 2.3 sandbox VM, which comes with Lucidworks HDP Search installed under /opt/lucidworks-hdpsearch, then run the command below so that no files under the Solr directory (log files in particular) remain owned by root:

chown -R solr:solr /opt/lucidworks-hdpsearch/solr

If running on an Ambari-installed HDP 2.3 cluster (instead of the sandbox), run the commands below to install HDP Search and set up the user directory in HDFS:

yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr

Point the HBase Indexer to ZooKeeper by configuring hbase-indexer-site.xml

vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>sandbox.hortonworks.com:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>sandbox.hortonworks.com</value>
  </property>
</configuration>

In Ambari > HBase > Configs > Custom hbase-site add the below properties, but do not restart HBase just yet:

hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource

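Copy the HBase SEP jars that ship with HDP Search to $HBASE_HOME/lib. On a multi-node cluster, copy them to the same directory on every HBase node, because the RegionServers load the replication source class configured above.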

cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/

Restart HBase

Copy hbase-site.xml to the hbase-indexer conf directory

cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
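Before starting the indexer, it can be worth confirming that the copied hbase-site.xml actually carries the replication properties added in Ambari. The following is a minimal sketch, not part of HDP Search: the function name check_sep_config is made up for this example, and it assumes each property appears in the file as <name>property</name> with no extra whitespace.

```shell
# Hypothetical helper (not part of HDP Search): report which of the
# replication properties from the Ambari step are missing from a copy
# of hbase-site.xml. Returns 0 only if all four are present.
check_sep_config() {
  sep_file="$1"
  sep_missing=0
  for sep_prop in \
      hbase.replication \
      replication.source.ratio \
      replication.source.nb.capacity \
      replication.replicationsource.implementation
  do
    # -F matches the property name literally, -q suppresses output.
    if ! grep -qF "<name>${sep_prop}</name>" "$sep_file"; then
      echo "missing: ${sep_prop}"
      sep_missing=1
    fi
  done
  return "$sep_missing"
}
```

On the sandbox it could be called as: check_sep_config /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-site.xml && echo "replication settings present"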

Start Solr in cloud mode (pointing to ZK)

cd /opt/lucidworks-hdpsearch/solr
bin/solr start -c -z sandbox.hortonworks.com:2181

Create collection

  bin/solr create -c hbaseCollection \
     -d data_driven_schema_configs \
     -n myCollConfigs \
     -s 2 \
     -rf 2

Start the HBase Indexer

cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
./hbase-indexer server

In a second terminal, create the table to be indexed in HBase. Open the hbase shell and run the commands below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the column family must be set to 1:

create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
quit

Now we'll create an indexer that will index the indexdemo-user table as its contents are updated.

vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml

<?xml version="1.0"?>
<indexer table="indexdemo-user">
  <field name="firstname_s" value="info:firstname"/>
  <field name="lastname_s" value="info:lastname"/>
  <field name="age_i" value="info:age" type="int"/>
</indexer>

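The file above maps three HBase cells (firstname, lastname, and age in the info column family) to Solr fields. The value attribute names the source cell as family:qualifier, the name attribute is the Solr field the value is stored in, and type="int" tells the indexer to interpret the age cell as an integer rather than as the default string.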

Next, create an indexer based on the created indexer xml file.

/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer \
   -n hbaseindexer \
   -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml \
   -cp solr.zk=sandbox.hortonworks.com:2181 \
   -cp solr.collection=hbaseCollection

Check that it was created

/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers

Check that the indexer server output (in the first terminal) shows the line below

INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer

Log back in to the hbase shell and try adding some data to the indexdemo-user table

hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'

Run a commit so that the new document becomes visible in Solr

curl 'http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true'

Open the Solr UI and notice that under statistics the "Num Docs" count has increased: http://sandbox.hortonworks.com:8983/solr/#/hbaseCollection_shard1_replica1

Run a query using the Solr REST API: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...
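The same query can be issued from the command line. The sketch below wraps curl in a small function so that the query string does not have to be percent-encoded by hand. The function name solr_select and the SOLR_BASE variable are made up for this example; the host and port are the sandbox values used in this article, and the request goes to the collection name rather than to a single core.

```shell
# Base URL of the Solr instance used in this article; override via env.
SOLR_BASE="${SOLR_BASE:-http://sandbox.hortonworks.com:8983/solr}"

# Hypothetical helper: solr_select COLLECTION QUERY
# -G sends the --data-urlencode pairs as URL query parameters (GET),
# so characters such as * and : are percent-encoded by curl itself.
# -sS hides the progress meter but still prints errors.
solr_select() {
  curl -sS -G "${SOLR_BASE}/$1/select" \
    --data-urlencode "q=$2" \
    --data-urlencode "wt=json"
}
```

For example, solr_select hbaseCollection '*:*' returns every document, and solr_select hbaseCollection 'firstname_s:John' returns only the row added above.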

Now try updating the data you've just added in the hbase shell, then commit

hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'

curl 'http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true'

Check the content in Solr: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...
Note that the document's firstname_s field now contains the string "Jim".

Finally, delete the row from HBase and commit

hbase> deleteall 'indexdemo-user', 'row1'

curl 'http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true'

Check the content in Solr and notice that the document has been removed: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...

You have successfully set up HBase indexing with HDP Search.

achittela · ‎10-28-2015

How do I do an exact text search, so that the JSON output displays only the matching data from the HBase table?

achittela · ‎11-02-2015

Answer: add a filter query to the request, for example http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...&fq=lastname_s:"Smith"

This displays only the records that match exactly.
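When the filter query is built in a script, the quotes around the value need care. The sketch below assembles the fq value for an exact match and escapes backslashes and double quotes inside the value. The function name solr_exact_fq is made up for this example; it only builds the string and does not contact Solr.

```shell
# Hypothetical helper: solr_exact_fq FIELD VALUE
# Prints FIELD:"VALUE" for use as a Solr fq parameter. Backslashes and
# double quotes in VALUE are escaped so they cannot end the phrase early.
solr_exact_fq() {
  fq_escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '%s:"%s"\n' "$1" "$fq_escaped"
}
```

The result can be passed to curl, which percent-encodes it: curl -G 'http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select' --data-urlencode 'q=*:*' --data-urlencode "fq=$(solr_exact_fq lastname_s Smith)" --data-urlencode 'wt=json'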

achittela · ‎11-03-2015

Wildcard Search Types

Single character (matches a single character): ?
The search string Sm?th would match both Smith and Smyth.

Multiple characters (matches zero or more sequential characters): *
Wildcard characters can be used at the end, in the middle, or at the beginning of a term: Smit*, Sm*th, or *mith
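The meaning of the two wildcards can be tried out locally, because shell patterns give ? and * the same meaning. This is an illustration only: Solr evaluates wildcards against indexed terms on the server, while the hypothetical wild_match function below compares plain strings, and shell patterns additionally treat [ ] as special.

```shell
# Illustration only: wild_match PATTERN TERM
# Succeeds when TERM matches PATTERN, where ? stands for exactly one
# character and * for zero or more characters.
wild_match() {
  # $1 is left unquoted on purpose so that case treats ? and * as
  # wildcards instead of literal characters.
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}
```

For example, wild_match 'Sm?th' Smith and wild_match 'Sm*th' Smith both succeed, while wild_match 'Sm?th' Smth fails because ? requires exactly one character.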

achittela · ‎11-03-2015

URL for Single Character Search

http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde... : Smit?

URL for Multiple Character Search

http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde... : Sm*

SteveSwartzl · ‎04-22-2016

In the add-indexer call, it is necessary to add the -z parameter to specify a Zookeeper location if Zookeeper is not running on the host where the indexer is running.
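As a sketch of that suggestion, the wrapper below repeats the add-indexer call from the article and adds -z. The function name add_indexer_remote_zk and the HBASE_INDEXER_BIN variable are made up for this example, and it assumes that Solr and the indexer use the same ZooKeeper address, as they do on the sandbox.

```shell
# Path from the article; override to point at another install.
HBASE_INDEXER_BIN="${HBASE_INDEXER_BIN:-/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer}"

# Hypothetical wrapper: add_indexer_remote_zk ZK_HOST_PORT
# Same add-indexer call as in the article, plus -z so that the
# command-line tool itself connects to ZK_HOST_PORT rather than to a
# ZooKeeper on the local host.
add_indexer_remote_zk() {
  "$HBASE_INDEXER_BIN" add-indexer \
    -n hbaseindexer \
    -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml \
    -cp "solr.zk=$1" \
    -cp solr.collection=hbaseCollection \
    -z "$1"
}
```

For example: add_indexer_remote_zk sandbox.hortonworks.com:2181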

zangyongzhen · ‎07-18-2016

I installed lucidworks-hdpsearch via yum, but hbase-indexer fails to start. I also use the 2.4 sandbox, and hbase-indexer fails to start there too.

ashok_padmanabh · ‎10-21-2016

Hi Ali - With HDP 2.5, Ambari can be used for this installation, right?

I am confused between:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_solr-search-installation/content/ch_hdp-...

and

https://doc.lucidworks.com/lucidworks-hdpsearch/2.5/Guide-Install-Ambari.html

Do I have to do the first and then the second?

john_huckle · ‎02-09-2017

On HDP 2.4 I also needed to copy jackson-core-asl-1.9.13.jar and jackson-mapper-asl-1.9.13.jar from /opt/lucidworks-hdpsearch/contrib/clustering/lib to /opt/lucidworks-hdpsearch/hbase-indexer/lib and remove the 1.8.8 version of these jars before the indexer would start.
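A cautious way to script that swap is sketched below. The function name swap_jackson is made up for this example, and the 1.8.8 file names (jackson-core-asl-1.8.8.jar and jackson-mapper-asl-1.8.8.jar) are an assumption based on the usual jar naming. Unlike the comment above, the sketch moves the old jars to a backup directory instead of deleting them, so the change can be undone.

```shell
# Hypothetical helper: swap_jackson SRC_DIR LIB_DIR BACKUP_DIR
# Copies the 1.9.13 jackson jars from SRC_DIR into LIB_DIR and moves
# any 1.8.8 copies found in LIB_DIR into BACKUP_DIR.
swap_jackson() {
  jk_src="$1"; jk_lib="$2"; jk_bak="$3"
  mkdir -p "$jk_bak" || return 1
  for jk_name in jackson-core-asl jackson-mapper-asl; do
    cp "${jk_src}/${jk_name}-1.9.13.jar" "${jk_lib}/" || return 1
    # Assumed file name of the older jar; skipped if it is not there.
    if [ -e "${jk_lib}/${jk_name}-1.8.8.jar" ]; then
      mv "${jk_lib}/${jk_name}-1.8.8.jar" "${jk_bak}/" || return 1
    fi
  done
}
```

With the paths from the comment it could be called as: swap_jackson /opt/lucidworks-hdpsearch/contrib/clustering/lib /opt/lucidworks-hdpsearch/hbase-indexer/lib /tmp/jackson-1.8.8-backup (the backup location is only an example).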

ram_manohar2708 · ‎07-28-2017

Hi, I am using HDP-2.6.0.3 and installed HBase and Solr using Ambari. My HBase region servers crash the moment I add the configurations mentioned under Ambari > HBase > Configs > Custom hbase-site. I proceeded without the configurations, but then the data is not indexed. Please advise. Thanks.

Cloudera Community
