Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (1)
avatar
Contributor

Introduction

The objective of this article is to describe how use “DataImport” tool inside Apache Solr to index an Oracle DB table.

Assumptions & Design

In this exercise, I’ve used two virtual machines:

  • 2-Linux Centos VM – 3GB RAM (Solr node)

Please note that: Hortonworks HDP Sandbox comes with out-of-the-box Solr service that can be easily provisioned or enabled and used as well for this exercise through Ambari UI, instead of installing Solr service on a standalone node.

Oracle side

- Create a dummy table with the following structure:

[1]

38551-1.png

- Insert some sample data into the created table.

On Solr Node

#yum install java-1.8.0-openjdk.x86_64

#java -version

#wget http://apache.org/dist/lucene/solr/6.6.0/solr-6.6.0.tgz

#tar xzf solr-6.6.0.tgz solr-6.6.0/bin/install_solr_service.sh --strip-components=2

#sudo bash ./install_solr_service.sh solr-6.6.0.tgz

#sudo service solr restart
  • you should see something like following

Started Solr server on port 8983 (pid=[….]). Happy searching!

  • from Oracle machine, copy the ojdbc to solr server

#scp ojdbc6.jar root@[Solr-IP-address]:/opt/solr/dist/

Create a new collection by invoking the “solr create –c” command from the path “/opt/solr/bin” as following:

[2]

38552-2.png

From Solr portal (URL: http://[Solr-IP-Address]:8983/solr/#/), make sure that the new collection is appeared

[3]

38553-3.png

  • from the left panel of Solr home page, and after selecting the “Oracle_table” core, select “Schema”, add the schema for the new table created in Oracle DB.

on the right side, press “Add Field” button and make sure not to delete one of the main “Fields”.

[4][5]

[6][7]

38554-4.png

38555-5.png

38556-6.png

38557-7.png

after creating the schema fields, they should appear in the “Fields” list.

[8]

38558-8.png

Create the “data-config.xml” file under “/var/solr/data/Oracle_table/conf/”. make sure of the column/field mapping between the Oracle DB table and Solr’s Schema fields are properly configured properly.

<dataConfig>

<dataSource name="jdbc" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@//[DB-IP-Address]:[DB-Port]/[DBInstanceName]" user="myDBuser" password="myDBpass"/>

<entity name="solr_test" query="select * from solr_test">

<field column="EMP_ID" name="id" />

<field column="FIRST_NAME" name="first_name" />

<field column="LAST_NAME" name="last_name" />

<field column="DOB" name="dob" />

</entity>

</document>

</dataConfig>

  • Add the following DataImport handler in “solrconfig.xml” file.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">

<lst name="defaults">

<str name="config">data-config.xml</str>

</lst>

</requestHandler>

  • Add the following <lib/> element in solrconfig.xml

<lib dir="/opt/solr-6.6.0/dist/" regex=".*\.jar" />

From the Solr web UI, make sure that the “DataImport” under the created collection “Oracle_table” is as following without errors or warnings:

[9]

38559-9.png

press “Execute” button, and wait for a while or press “Refresh Status” button till a green notification panel is appeared, such as following:

[10]

38560-10.png

Results

Solr Side

from the left panel in Solr, select “Query”, and make sure that you’ll get results (on the right side) after pressing on “Execute Query” button, as following:

[11]

38561-11.png

Future Work

The future work will be extending Solr standalone node to be within a small cluster for maintaining the cores’ replication and high availability.

References

http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html

https://cwiki.apache.org/confluence/display/solr/Running+Solr

5,549 Views
0 Kudos
Version history
Last update:
‎08-17-2019 11:22 AM
Updated by:
Contributors