Created on 10-09-2015 01:37 AM
HBase indexing to Solr with HDP Search in HDP 2.3
- Background:
The HBase Indexer provides the ability to stream events from HBase to Solr for near-real-time searching. The HBase Indexer is included with HDPSearch as an additional service. The indexer works by acting as an HBase replication sink. As updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.
- References:
Steps
- Download and start the HDP 2.3 sandbox VM, which comes with LW HDP Search installed (under /opt/lucidworks-hdpsearch), and run the command below to ensure that no log files owned by root remain:
chown -R solr:solr /opt/lucidworks-hdpsearch/solr
yum install -y lucidworks-hdpsearch
sudo -u hdfs hadoop fs -mkdir /user/solr
sudo -u hdfs hadoop fs -chown solr /user/solr
vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>sandbox.hortonworks.com:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>sandbox.hortonworks.com</value>
  </property>
</configuration>
- In Ambari > HBase > Configs > Custom hbase-site add the below properties, but do not restart HBase just yet:
hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource
- Copy Solr's HBase-related libs to $HBASE_HOME/lib
cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/
- Restart HBase
- Copy hbase-site.xml to hbase-indexer's conf dir
cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
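A mistyped or missing replication property can be hard to spot later, so it may help to confirm that the copied file contains all four of them. The following is a minimal sketch, assuming Ambari has written the custom properties into hbase-site.xml after the restart; `check_replication_props` is a hypothetical helper name, not part of HDP Search.

```shell
# Hypothetical helper: report which of the four replication properties
# from the Ambari step are present in the given hbase-site.xml file.
# Returns 0 if all are present, 1 if any is missing.
check_replication_props() {
  site_file="$1"
  missing=0
  for prop in hbase.replication \
              replication.source.ratio \
              replication.source.nb.capacity \
              replication.replicationsource.implementation; do
    # -F treats the dots in the property name literally
    if grep -qF "<name>${prop}</name>" "$site_file"; then
      echo "ok       ${prop}"
    else
      echo "MISSING  ${prop}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the sandbox (path taken from the copy step above):
#   check_replication_props /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-site.xml
```

The check only looks at property names, not values, so it will not catch a wrong value such as hbase.replication=false.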
- Start Solr in cloud mode (pointing to ZK)
cd /opt/lucidworks-hdpsearch/solr
bin/solr start -c -z sandbox.hortonworks.com:2181
- Create collection
bin/solr create -c hbaseCollection \
  -d data_driven_schema_configs \
  -n myCollConfigs \
  -s 2 \
  -rf 2
- Start Hbase indexer
cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
./hbase-indexer server
- In a second terminal, create the table to be indexed in HBase. Open
hbase shell
and run the commands below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the table's column family must be set to 1:
create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
!quit
- Now we'll create an indexer definition file that describes how to index the indexdemo-user table as its contents are updated.
vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml
<?xml version="1.0"?>
<indexer table="indexdemo-user">
  <field name="firstname_s" value="info:firstname"/>
  <field name="lastname_s" value="info:lastname"/>
  <field name="age_i" value="info:age" type="int"/>
</indexer>
- The above file defines the three HBase columns to index (info:firstname, info:lastname and info:age), how to interpret them (info:age is read as an int), and the Solr fields they will be stored in (firstname_s, lastname_s and age_i).
- Next, create an indexer based on the created indexer xml file.
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer -n hbaseindexer -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml -cp solr.zk=sandbox.hortonworks.com:2181 -cp solr.collection=hbaseCollection
- Check it got created
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers
- Check that the indexer server output shows the line below:
INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer
- Log back in to the
hbase shell
and try adding some data to the indexdemo-user table:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
- Run commit
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
- Open Solr UI and notice under statistics the "Num Docs" has increased: http://sandbox.hortonworks.com:8983/solr/#/hbaseCollection_shard1_replica1
- Run query using Solr REST API: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...
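The same check can be done from the command line instead of the browser. The following is a rough sketch, assuming the select handler is queried through the collection name (as the commit URL above does) rather than through a single core; `num_found` is a hypothetical helper that pulls the document count out of Solr's JSON response.

```shell
# Assumed values: host, port and collection name as used in this article.
SOLR_BASE="http://sandbox.hortonworks.com:8983/solr"
COLLECTION="hbaseCollection"

# Hypothetical helper: read a Solr JSON response on stdin and print the
# value of "numFound" (the number of matching documents).
num_found() {
  sed -n 's/.*"numFound":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on the sandbox (not run here):
#   curl -s -G "$SOLR_BASE/$COLLECTION/select" \
#        --data-urlencode 'q=*:*' --data-urlencode 'wt=json' | num_found
```

Using `curl -G` with `--data-urlencode` avoids having to hand-encode characters such as `*` and `:` in the query string.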
- Now try updating the data you've just added in the hbase shell, then commit:
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
- Check the content in Solr: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...
- Note that the document's firstname_s field now contains the string "Jim".
- Finally, delete the row from HBase and commit
hbase> deleteall 'indexdemo-user', 'row1'
curl http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true
- Check the content in Solr and notice that the document has been removed: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...
- You have successfully set up HBase indexing with HDP Search
Created on 10-28-2015 06:36 PM
How do I do an exact text search, so that the JSON output displays only the matching data from the HBase table?
Created on 11-02-2015 03:38 PM
Answer: http://sandbox.hortonworks.com:8983/solr/hbaseCollection_shard1_replica1/select?q=*%3A*&wt=json&inde...&fq=lastname_s:"Smith"
This displays only the records that match exactly.
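As a hedged sketch of the same idea from the command line: the filter query is just `field:"value"`, and letting curl URL-encode it avoids problems with the quotes and the colon. `exact_fq` is a hypothetical helper name, and querying through the collection name (hbaseCollection) rather than a single core is an assumption.

```shell
# Hypothetical helper: build a Solr filter query that matches a string
# field exactly, e.g. lastname_s:"Smith". Backslashes and double quotes
# inside the value are escaped so the quoted phrase stays intact.
exact_fq() {
  field="$1"
  value=$(printf '%s' "$2" | sed 's/[\\"]/\\&/g')
  printf '%s:"%s"\n' "$field" "$value"
}

# Usage on the sandbox (not run here):
#   curl -s -G "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/select" \
#        --data-urlencode 'q=*:*' --data-urlencode 'wt=json' \
#        --data-urlencode "fq=$(exact_fq lastname_s Smith)"
```

This relies on lastname_s being indexed as a plain string field, so the whole value must match, including case.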
Created on 11-03-2015 04:03 PM
Wildcard Search Types
Single character (matches a single character): ?
The search string Sm?th would match Smith (and also, for example, Smyth).
Multiple characters (matches zero or more sequential characters): *
You can use the wildcard character at the end, in the middle, or at the beginning of a term:
Smit* or Sm*th or *mith
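For these simple patterns, ? and * behave much like shell glob characters, so a quick local preview of which values a pattern would match is possible without touching Solr. The sketch below is only an analogy built on the shell's `case` matching, not how Solr evaluates the query; `matches` and the sample names are made up for illustration.

```shell
# Hypothetical helper: succeed if VALUE matches the glob PATTERN.
# usage: matches PATTERN VALUE
matches() {
  case "$2" in
    $1) return 0 ;;   # unquoted so ? and * act as wildcards
    *)  return 1 ;;
  esac
}

# Try each wildcard pattern against a few sample last names.
for p in 'Sm?th' 'Smit*' 'Sm*th' '*mith'; do
  for v in Smith Smyth Smooth; do
    if matches "$p" "$v"; then
      echo "$p matches $v"
    fi
  done
done
```

In an actual query the pattern goes into the q or fq parameter, for example lastname_s:Sm?th.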
Created on 11-03-2015 04:07 PM
URL for Single Character Search
URL for Multiple Character Search
Created on 04-22-2016 04:40 PM
In the add-indexer call, it is necessary to add the -z parameter to specify a Zookeeper location if Zookeeper is not running on the host where the indexer is running.
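As an illustration, the add-indexer command from the article with -z appended might look like the sketch below. The ZooKeeper address is an assumption (the sandbox host used throughout the article); on a real cluster it would be the ZooKeeper connect string the indexer itself uses. The snippet only prints the command so it can be reviewed before being run.

```shell
# Assumption: the indexer's ZooKeeper is the sandbox host from the article.
ZK="sandbox.hortonworks.com:2181"
INDEXER_HOME="/opt/lucidworks-hdpsearch/hbase-indexer"

# Print the add-indexer command with the -z parameter appended.
add_indexer_cmd() {
  printf '%s\n' "$INDEXER_HOME/bin/hbase-indexer add-indexer -n hbaseindexer -c $INDEXER_HOME/indexdemo-indexer.xml -cp solr.zk=$ZK -cp solr.collection=hbaseCollection -z $ZK"
}

add_indexer_cmd
```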
Created on 07-18-2016 07:29 AM
I installed lucidworks-hdpsearch via yum, but hbase-indexer won't start. I also tried the 2.4 sandbox, and hbase-indexer won't start there either.
Created on 10-21-2016 03:34 PM
Hi Ali - With HDP 2.5, Ambari can be used for this installation, right?
I am confused between :
and
https://doc.lucidworks.com/lucidworks-hdpsearch/2.5/Guide-Install-Ambari.html
Do I have to do the first and then the second?
Created on 02-09-2017 03:40 PM
On HDP 2.4 I also needed to copy jackson-core-asl-1.9.13.jar and jackson-mapper-asl-1.9.13.jar from /opt/lucidworks-hdpsearch/contrib/clustering/lib to /opt/lucidworks-hdpsearch/hbase-indexer/lib, and remove the 1.8.8 versions of these jars, before the indexer would start.
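A minimal sketch of that jar swap as a shell function, assuming the directory layout and jar versions quoted in the comment above; `swap_jackson` is a hypothetical name. It deletes the 1.8.8 jars, so backing up the lib directory first may be prudent.

```shell
# Hypothetical helper: copy the Jackson 1.9.13 jars from FROM_DIR into
# TO_DIR and remove the 1.8.8 versions from TO_DIR.
# usage: swap_jackson FROM_DIR TO_DIR
swap_jackson() {
  from_dir="$1"
  to_dir="$2"
  for name in jackson-core-asl jackson-mapper-asl; do
    # Stop before deleting anything if the newer jar is not available
    cp "$from_dir/$name-1.9.13.jar" "$to_dir/" || return 1
    rm -f "$to_dir/$name-1.8.8.jar"
  done
}

# Usage with the paths from the comment above (not run here):
#   swap_jackson /opt/lucidworks-hdpsearch/contrib/clustering/lib \
#                /opt/lucidworks-hdpsearch/hbase-indexer/lib
```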
Created on 07-28-2017 01:11 PM
Hi, I am using HDP-2.6.0.3 and installed HBase and Solr using Ambari. My HBase region servers crash the moment I add the configurations mentioned under Ambari > HBase > Configs > Custom hbase-site. I have proceeded without those configurations, but then the data is not indexed. Please advise. Thanks.