Created on 09-08-2017 05:29 AM - edited 08-17-2019 11:22 AM
Introduction
The objective of this article is to describe how to use the “DataImport” tool inside Apache Solr to index an Oracle DB table.
Assumptions & Design
In this exercise, I’ve used two virtual machines:
Please note: the Hortonworks HDP Sandbox comes with an out-of-the-box Solr service that can be easily provisioned or enabled through the Ambari UI. It can be used for this exercise instead of installing Solr on a standalone node.
Oracle side
- Create a dummy table with the following structure:
[1]
- Insert some sample data into the created table.
On Solr Node
#yum install java-1.8.0-openjdk.x86_64
#java -version
#wget http://apache.org/dist/lucene/solr/6.6.0/solr-6.6.0.tgz
#tar xzf solr-6.6.0.tgz solr-6.6.0/bin/install_solr_service.sh --strip-components=2
#sudo bash ./install_solr_service.sh solr-6.6.0.tgz
#sudo service solr restart
Started Solr server on port 8983 (pid=[….]). Happy searching!
#scp ojdbc6.jar root@[Solr-IP-address]:/opt/solr/dist/
Create a new collection by running the “solr create -c” command from the path “/opt/solr/bin”, as follows:
[2]
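As a rough sketch, the create step could be wrapped as shown below. It assumes the collection is named Oracle_table (the name used later in this article), that Solr was installed by install_solr_service.sh under /opt/solr, and that the service runs as the solr user. The helper names create_cmd and create_collection are hypothetical.

```shell
# Assumed defaults: the service install path and the collection name used in this article.
SOLR_BIN="${SOLR_BIN:-/opt/solr/bin}"
COLLECTION="${COLLECTION:-Oracle_table}"

# Hypothetical helper: print the create command so it can be reviewed first.
create_cmd() {
  printf '%s/solr create -c %s' "$SOLR_BIN" "$COLLECTION"
}

# Hypothetical helper: run the command as the "solr" service user,
# which owns the data directory when Solr is installed as a service.
create_collection() {
  sudo su - solr -c "$(create_cmd)"
}

# Usage (on the Solr node):
#   create_collection
```

Running the command as the solr user keeps the new core's files owned by the account the service runs under.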
From the Solr web UI (URL: http://[Solr-IP-Address]:8983/solr/#/), make sure that the new collection appears:
[3]
On the right side, press the “Add Field” button to add each schema field, and make sure not to delete any of the main “Fields”.
[4][5]
[6][7]
After creating the schema fields, they should appear in the “Fields” list:
[8]
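The same fields can also be added without the UI through Solr's Schema API. The sketch below assumes Solr is reachable at localhost:8983, that the collection is Oracle_table, and that the schema defines the field types "string" and "date" (type names differ between Solr versions, so check the “Fields” page first). The helper names add_field_json and add_field are hypothetical.

```shell
# Assumed defaults: adjust host, port and collection to your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-Oracle_table}"

# Hypothetical helper: build the JSON body of one Schema API "add-field" command.
add_field_json() {
  printf '{"add-field":{"name":"%s","type":"%s","stored":true,"indexed":true}}' "$1" "$2"
}

# Hypothetical helper: POST one field definition to the collection's schema endpoint.
add_field() {
  curl -sS -X POST -H 'Content-type:application/json' \
       --data-binary "$(add_field_json "$1" "$2")" \
       "$SOLR_URL/$COLLECTION/schema"
}

# Usage (requires a running Solr); "id" already exists in the default schema:
#   add_field first_name string
#   add_field last_name  string
#   add_field dob        date   # assumed type name
```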
Create the “data-config.xml” file under “/var/solr/data/Oracle_table/conf/”. Make sure that the column/field mapping between the Oracle DB table and Solr’s schema fields is configured properly.
<dataConfig>
  <dataSource name="jdbc" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@//[DB-IP-Address]:[DB-Port]/[DBInstanceName]" user="myDBuser" password="myDBpass"/>
  <document>
    <entity name="solr_test" query="select * from solr_test">
      <field column="EMP_ID" name="id" />
      <field column="FIRST_NAME" name="first_name" />
      <field column="LAST_NAME" name="last_name" />
      <field column="DOB" name="dob" />
    </entity>
  </document>
</dataConfig>
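Then register the DataImport request handler and load the required JAR files (including the Oracle JDBC driver copied earlier) by adding the following to “solrconfig.xml” in the same “conf” directory:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
<lib dir="/opt/solr-6.6.0/dist/" regex=".*\.jar" />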
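Configuration changes are generally only picked up after the core is reloaded or Solr is restarted. A minimal sketch using the CoreAdmin RELOAD action follows. It assumes Solr runs in standalone mode at localhost:8983 and that the core is named Oracle_table; the helper names reload_url and reload_core are hypothetical.

```shell
# Assumed defaults: adjust host, port and core name to your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-Oracle_table}"

# Hypothetical helper: build the CoreAdmin URL that reloads the core.
reload_url() {
  printf '%s/admin/cores?action=RELOAD&core=%s' "$SOLR_URL" "$COLLECTION"
}

# Hypothetical helper: ask Solr to re-read the core's configuration files.
reload_core() {
  curl -sS "$(reload_url)"
}

# Usage (requires a running Solr):
#   reload_core
# Alternative: sudo service solr restart
```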
From the Solr web UI, make sure that the “DataImport” page under the created collection “Oracle_table” looks as follows, without errors or warnings:
[9]
Press the “Execute” button, then wait a while or press the “Refresh Status” button until a green notification panel appears, such as the following:
[10]
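The import can also be started and monitored over HTTP, since the “/dataimport” handler accepts a command parameter. The sketch below assumes Solr at localhost:8983 and the collection Oracle_table; the helper names dih_url, run_full_import and import_status are hypothetical.

```shell
# Assumed defaults: adjust host, port and collection to your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-Oracle_table}"

# Hypothetical helper: build the DataImport handler URL for a given command.
dih_url() {
  printf '%s/%s/dataimport?command=%s' "$SOLR_URL" "$COLLECTION" "$1"
}

# Hypothetical helper: start a full import, clearing the index first
# and committing when the import finishes.
run_full_import() {
  curl -sS "$(dih_url full-import)&clean=true&commit=true"
}

# Hypothetical helper: report the progress and outcome of the import.
import_status() {
  curl -sS "$(dih_url status)"
}

# Usage (requires a running Solr):
#   run_full_import
#   import_status
```

The full-import call returns immediately, so the status command may need to be polled until the import is reported as finished.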
Results
Solr Side
From the left panel in Solr, select “Query”, then press the “Execute Query” button and make sure that results are returned (on the right side), as follows:
[11]
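The same check can be made from the command line against the collection's “/select” handler. This is a minimal sketch, assuming Solr at localhost:8983 and the collection Oracle_table; the helper names select_url and run_query are hypothetical.

```shell
# Assumed defaults: adjust host, port and collection to your environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
COLLECTION="${COLLECTION:-Oracle_table}"

# Hypothetical helper: build the URL of the collection's search handler.
select_url() {
  printf '%s/%s/select' "$SOLR_URL" "$COLLECTION"
}

# Hypothetical helper: run a query ($1) and return up to $2 rows (default 10) as JSON.
# -G sends the data as URL query parameters; --data-urlencode escapes the query text.
run_query() {
  curl -sS -G "$(select_url)" \
       --data-urlencode "q=$1" \
       --data "rows=${2:-10}" \
       --data "wt=json"
}

# Usage (requires a running Solr); match all indexed documents:
#   run_query '*:*' 5
```

A numFound value greater than zero in the response indicates that rows from the Oracle table were indexed.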
Future Work
Future work will extend the standalone Solr node into a small cluster to provide core replication and high availability.
References
http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html
https://cwiki.apache.org/confluence/display/solr/Running+Solr