How to index Binary Data (PDF) with Cloudera Search (Solr)

New Contributor

Hi all,

 

I am trying to upload PDFs into Solr. For this purpose I need to configure the "ExtractingRequestHandler" in solrconfig.xml.

This is explained here: https://wiki.apache.org/solr/ExtractingRequestHandler

When I create the collection, I get this error:

 

Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Thu, 23 Jun 2016 13:11:17 GMT

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1639</int>
  </lst>
  <lst name="failure">
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'mob_shard1_replica1': Unable to create core [mob_shard1_replica1] Caused by: solr.ExtractingRequestHandler</str>
  </lst>
</response>

 

So, how do I set up this class correctly in the Cloudera distribution?

 

Thanks in advance for any help!

1 REPLY

Re: How to index Binary Data (PDF) with Cloudera Search (Solr)

Expert Contributor
Have you tried Solr Tika? It will save you the pain of creating a special handler.
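
For reference, the handler registration discussed in the question might look roughly like the solrconfig.xml sketch below. It is only a sketch under stated assumptions: the `lib` paths are placeholders taken from the layout of the stock Solr 4.x example config and will almost certainly differ in a Cloudera installation, and the field names under `defaults` are illustrative. One possible cause of the error above (not confirmed in this thread) is that the short name `solr.ExtractingRequestHandler` does not resolve, because the class lives in the `org.apache.solr.handler.extraction` package and its jars have to be on the core's classpath.

```xml
<config>
  <!-- other solrconfig.xml settings omitted -->

  <!-- ASSUMPTION: placeholder paths from the stock Solr 4.x example layout.
       Point these at wherever the Solr Cell (extraction) jars and their
       Tika dependencies live in your installation. -->
  <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

  <!-- Fully qualified class name, so no "solr." shorthand lookup is needed. -->
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <!-- Lowercase the metadata field names produced by Tika. -->
      <str name="lowernames">true</str>
      <!-- Illustrative: prefix fields not in the schema so they can be
           matched by an "ignored_*" dynamic field, if the schema has one. -->
      <str name="uprefix">ignored_</str>
      <!-- Illustrative: map extracted body text to a schema field named "text". -->
      <str name="fmap.content">text</str>
    </lst>
  </requestHandler>
</config>
```

With a handler like this in place, a PDF would typically be posted to the collection's `/update/extract` endpoint together with a `literal.id` parameter that supplies the document's unique key.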