New Contributor
Posts: 2
Registered: ‎06-23-2016

How to index Binary Data (PDF) with Cloudera Search (Solr)

Hi all,

 

I am trying to upload PDFs into Solr. For this, I need to use the "ExtractingRequestHandler" in solrconfig.xml.

This is explained here: https://wiki.apache.org/solr/ExtractingRequestHandler
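
For context, a handler registration along the lines of that wiki page might look like the sketch below. The <lib> directory is a hypothetical placeholder, not an actual Cloudera path; where the Solr Cell and Tika jars live depends on the installation. The field names under "defaults" are examples only and assume matching fields exist in schema.xml.

```xml
<!-- Sketch for solrconfig.xml; not a verified Cloudera Search configuration. -->

<!-- ASSUMPTION: placeholder directory. Point this at wherever the
     Solr Cell (extraction) and Tika jars are installed on your nodes. -->
<lib dir="/path/to/solr/contrib/extraction/lib" regex=".*\.jar" />

<!-- Registers Solr Cell under /update/extract, using the fully
     qualified class name of the handler. -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map the Tika "Last-Modified" metadata field to a schema field. -->
    <str name="fmap.Last-Modified">last_modified</str>
    <!-- Prefix fields that are not in the schema so that a matching
         dynamic field (assumed to be defined as ignored_*) absorbs them. -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

If the jars that provide this class are not on the core's classpath, the class cannot be loaded when the core is created, so checking the <lib> paths is a reasonable first step.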

When I create the collection, I get this error:

 

Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Thu, 23 Jun 2016 13:11:17 GMT

<?xml version="1.0" encoding="UTF-8"?>

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1639</int>
  </lst>
  <lst name="failure">
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'mob_shard1_replica1': Unable to create core [mob_shard1_replica1] Caused by: solr.ExtractingRequestHandler</str>
  </lst>
</response>

 

So how do I set up this class correctly within the Cloudera distribution?

 

Thanks in advance for any help!

Contributor
Posts: 56
Registered: ‎02-09-2015

Re: How to index Binary Data (PDF) with Cloudera Search (Solr)

Have you tried Solr Tika? It will save you the pain of creating a special handler.