New Contributor
Posts: 6
Registered: ‎05-05-2015

SolrCloud - problem importing data from MySQL database

I am trying to import data that currently sits in a MySQL database into Cloudera Search (SolrCloud), but I just can't get it to work.

 

Here's my setup:

 

4 node cluster CDH 5.4.4

Solr installed and running fine - checked with the enron-email sample data

 

Step 1.

Downloaded solrconfig files using the command:

solrctl instancedir --generate $HOME/solr_configs2

 

Step 2.

Modified solrconfig.xml to include the following lines:

<lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar" />

 


<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>

 

Step 3:

Created a file data-config.xml with the following contents:

<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://<server_ip>/<dbname>"
batchSize="-1"
zeroDateTimeBehavior="convertToNull"
user="<db_user_name>"
password="<db_password>"/>

<document name="article">
<entity name="id"
query="SELECT ....   SQL query here ...." >

 

<field column="id" name="id" />
<field column="article_id" name="article_id" />
....

 


</entity>
</document>
</dataConfig>

 

 

Step 4

Created schema.xml to conform to the schema being returned by the MySQL query

 

 

Step 5

Uploaded config to SolrCloud using the command:

solrctl instancedir --create impactprint $HOME/solr_configs2

 

Step 6 

Tried creating the collection using the command:

solrctl collection --create name_of_collection -s 2 -c name_of_collection

 

 

The above generates the following error :

Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Mon, 11 Jan 2016 10:57:04 GMT

<?xml version="1.0" encoding="UTF-8"?>

<response>

<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3565</int>
</lst>
<lst name="failure">
<str>
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'impactprint_shard1_replica1': Unable to create core [impactprint_shard1_replica1] Caused by: org.apache.solr.handler.dataimport.DataImportHandler</str>
<str>
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'impactprint_shard2_replica1': Unable to create core [impactprint_shard2_replica1] Caused by: org.apache.solr.handler.dataimport.DataImportHandler</str>
</lst>

</response>
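
A note for anyone reading this output: the HTTP status is 200 and the responseHeader status is 0, yet the "failure" list is populated, so the per-core errors are what matter. Below is a minimal sketch for pulling just those messages out of a saved response. The function name solr_failures is made up for illustration, and it assumes the response is laid out line by line as shown above.

```shell
#!/bin/sh
# Print each message found inside <lst name="failure"> of a Solr
# Collections API XML response read from stdin.
# Assumes the line-oriented layout shown in the post above.
solr_failures() {
  awk '
    /<lst name="failure">/   { in_failure = 1; next }
    in_failure && /<\/lst>/  { in_failure = 0 }
    in_failure {
      gsub(/<\/?str[^>]*>/, "")          # drop the <str> and </str> wrappers
      if ($0 !~ /^[ \t]*$/) print        # skip lines that are now empty
    }
  '
}
```

Usage would be something like `solr_failures < response.xml`, where response.xml is a hypothetical file holding the XML part of the output.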

 

 

Please help!

 

Cloudera Employee
Posts: 30
Registered: ‎09-17-2013

Re: SolrCloud - problem importing data from MySQL database

Can you look in your server logs for the full error message?

 

In any case, I'd guess the cause is that the DataImportHandler is missing: the packages/parcels don't include it because we don't support it. It has some serious limitations, such as not functioning in secure environments and not scaling with the size of the cluster (it scales with the number of Solr nodes rather than, e.g., the number of MR nodes, as the MapReduceIndexerTool (MRIT) does).
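
One quick way to test that guess is to look for the DataImportHandler jar that the <lib> lines in solrconfig.xml expect. This is only a sketch: the helper names are invented here, and the Solr install directory you pass in (for a parcel install probably something under /opt/cloudera/parcels/CDH/lib/solr, but that is an assumption) depends on how CDH was installed.

```shell
#!/bin/sh
# List any DataImportHandler jars under the given Solr directory.
# The jar name pattern matches the regex used in the <lib> directive.
find_dih_jars() {
  find "$1" -type f -name 'solr-dataimporthandler-*.jar' 2>/dev/null
}

# Succeed (exit status 0) only if at least one such jar exists.
has_dih() {
  [ -n "$(find_dih_jars "$1")" ]
}
```

For example, `has_dih /path/to/solr && echo present || echo missing`. If it reports missing, the core cannot load org.apache.solr.handler.dataimport.DataImportHandler, which would match the error above.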

 

I'm out of my depth on the best way to solve this using Cloudera Search -- maybe use Sqoop to dump to Hadoop then use the MapReduceIndexerTool?
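
A rough sketch of what that Sqoop-then-MRIT route could look like, offered as a starting point rather than a tested recipe. Everything passed in as an argument or environment variable (DB_USER, MRIT_JAR, the HDFS paths, the ZooKeeper address) is a placeholder, and morphline.conf is a hypothetical morphline file you would have to write yourself to map the dumped columns to Solr fields. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless RUN=1, in which case execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Args: JDBC URL, table name, HDFS dump directory, collection, ZooKeeper host.
mysql_to_solr() {
  db_url=$1; table=$2; hdfs_dir=$3; collection=$4; zk=$5
  # Path to the search-mr job jar differs per install, so it must be supplied.
  jar=${MRIT_JAR:?set MRIT_JAR to the search-mr job jar on your install}

  # 1. Dump the MySQL table to comma-delimited text files in HDFS.
  run sqoop import --connect "$db_url" --username "${DB_USER:-db_user}" -P \
    --table "$table" --target-dir "$hdfs_dir" --fields-terminated-by ','

  # 2. Build the index with MapReduce and merge it into the live collection.
  run hadoop jar "$jar" org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file morphline.conf --output-dir "${hdfs_dir}_index" \
    --zk-host "$zk" --collection "$collection" --go-live "$hdfs_dir"
}
```

Calling `mysql_to_solr 'jdbc:mysql://<server_ip>/<dbname>' article /user/me/article impactprint zkhost:2181/solr` prints the two commands so they can be reviewed before running anything against the cluster.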

New Contributor
Posts: 6
Registered: ‎05-05-2015

Re: SolrCloud - problem importing data from MySQL database

Thanks for your reply.

 

Will the MapReduceIndexerTool handle CSV files?

Can you point me to a quick tutorial or example for the MapReduceIndexerTool?

 

Many thanks!