Community Articles

msabri · ‎09-08-2017

Introduction

The objective of this article is to describe how use “DataImport” tool inside Apache Solr to index an Oracle DB table.

Assumptions & Design

In this exercise, I’ve used two virtual machines:

1-Oracle Database App Development VM – 2GB RAM

2-Linux Centos VM – 3GB RAM (Solr node)

Please note that: Hortonworks HDP Sandbox comes with out-of-the-box Solr service that can be easily provisioned or enabled and used as well for this exercise through Ambari UI, instead of installing Solr service on a standalone node.

Oracle side

- Create a dummy table with the following structure:

[1]

- Insert some sample data into the created table.

On Solr Node

#yum install java-1.8.0-openjdk.x86_64

#java -version

#wget http://apache.org/dist/lucene/solr/6.6.0/solr-6.6.0.tgz

#tar xzf solr-6.6.0.tgz solr-6.6.0/bin/install_solr_service.sh --strip-components=2

#sudo bash ./install_solr_service.sh solr-6.6.0.tgz

#sudo service solr restart

you should see something like following

Started Solr server on port 8983 (pid=[….]). Happy searching!

from Oracle machine, copy the ojdbc to solr server

#scp ojdbc6.jar root@[Solr-IP-address]:/opt/solr/dist/

Create a new collection by invoking the “solr create –c” command from the path “/opt/solr/bin” as following:

[2]

From Solr portal (URL: http://[Solr-IP-Address]:8983/solr/#/), make sure that the new collection is appeared

[3]

from the left panel of Solr home page, and after selecting the “Oracle_table” core, select “Schema”, add the schema for the new table created in Oracle DB.

on the right side, press “Add Field” button and make sure not to delete one of the main “Fields”.

[4][5]

[6][7]

after creating the schema fields, they should appear in the “Fields” list.

[8]

Create the “data-config.xml” file under “/var/solr/data/Oracle_table/conf/”. make sure of the column/field mapping between the Oracle DB table and Solr’s Schema fields are properly configured properly.

<dataConfig>

<dataSource name="jdbc" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@//[DB-IP-Address]:[DB-Port]/[DBInstanceName]" user="myDBuser" password="myDBpass"/>

<entity name="solr_test" query="select * from solr_test">

<field column="EMP_ID" name="id" />

<field column="FIRST_NAME" name="first_name" />

<field column="LAST_NAME" name="last_name" />

<field column="DOB" name="dob" />

</entity>

</document>

</dataConfig>

Add the following DataImport handler in “solrconfig.xml” file.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">

<lst name="defaults">

<str name="config">data-config.xml</str>

</lst>

</requestHandler>

Add the following <lib/> element in solrconfig.xml

<lib dir="/opt/solr-6.6.0/dist/" regex=".*\.jar" />

From the Solr web UI, make sure that the “DataImport” under the created collection “Oracle_table” is as following without errors or warnings:

[9]

press “Execute” button, and wait for a while or press “Refresh Status” button till a green notification panel is appeared, such as following:

[10]

Results

Solr Side

from the left panel in Solr, select “Query”, and make sure that you’ll get results (on the right side) after pressing on “Execute Query” button, as following:

[11]

Future Work

The future work will be extending Solr standalone node to be within a small cluster for maintaining the cores’ replication and high availability.

References

http://www.oracle.com/technetwork/database/enterprise-edition/databaseappdev-vm-161299.html

https://cwiki.apache.org/confluence/display/solr/Running+Solr

Cloudera Community

Community Articles

Indexing Oracle tables into Apache Solr

Apache Solr

Connecting Solr to Spark - Apache Zeppelin Noteboo...

Unable to delete solr indexes

Creating a squid collection using Solr as Indexing...

Solr indexing

How to ingest data from Oracle into Azure ADLS Gen...

Phoenix Index Lifecycle

Converting CSV Files to Apache Hive Tables with Ap...

Sqoop from Oracle to Hive table

Solr Best Practices

Replace Null with white space for not-null column ...