How to index Binary Data (PDF) with Cloudera Search (Solr)

New Contributor

Hi all,


I am trying to upload PDFs into Solr. For this, I am supposed to use the "ExtractingRequestHandler" in solrconfig.xml.

This is explained here:

When I create the collection, I get this error:


Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Thu, 23 Jun 2016 13:11:17 GMT

<?xml version="1.0" encoding="UTF-8"?>


<lst name="responseHeader">
<int name="status">
<int name="QTime">
<lst name="failure">
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'mob_shard1_replica1': Unable to create core [mob_shard1_replica1] Caused by: solr.ExtractingRequestHandler</str>



So how do I set up this class correctly within the Cloudera distribution?


Thanks in advance for any help!


Re: How to index Binary Data (PDF) with Cloudera Search (Solr)

Expert Contributor
Have you tried Solr Tika? It would save you the pain of creating a special handler.
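
For illustration, here is a minimal sketch of what a Tika-backed upload could look like from the client side. It assumes the core already exposes Solr's `/update/extract` endpoint (Solr Cell, which uses Tika on the server) and that the extraction jars are loadable by the core; this is not confirmed for the setup in this thread. The base URL, the collection name `mob` (guessed from the core name `mob_shard1_replica1` in the error), the document id, and the file path are all hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url, collection, doc_id, commit=True):
    """Build the URL for Solr's /update/extract endpoint (Solr Cell)."""
    # literal.<field> sets a fixed field value on the indexed document;
    # here it supplies the unique key, assumed to be named "id".
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{base_url.rstrip('/')}/{collection}/update/extract?{urlencode(params)}"


def index_pdf(base_url, collection, doc_id, pdf_path):
    """POST the raw PDF bytes; text extraction happens on the Solr side."""
    with open(pdf_path, "rb") as f:
        body = f.read()
    req = Request(
        build_extract_url(base_url, collection, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.status  # HTTP status code of the update request


# Hypothetical usage (requires a reachable Solr instance):
# index_pdf("http://localhost:8983/solr", "mob", "doc1", "report.pdf")
```

If the handler class cannot be loaded when the core is created, as in the error above, this request would fail too, so the sketch only covers the upload step and not the server-side configuration.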