New Contributor
Posts: 2
Registered: ‎06-23-2016

How to index Binary Data (PDF) with Cloudera Search (Solr)

Hi all,


I am trying to index PDFs in Solr. For this, I am supposed to use the "ExtractingRequestHandler" in solrconfig.xml.

This is explained here:

When I create the collection, I get this error:


Error: A call to SolrCloud WEB APIs failed: HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: application/xml;charset=UTF-8
Transfer-Encoding: chunked
Date: Thu, 23 Jun 2016 13:11:17 GMT

<?xml version="1.0" encoding="UTF-8"?>


<lst name="responseHeader">
<int name="status">
<int name="QTime">
<lst name="failure">
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'mob_shard1_replica1': Unable to create core [mob_shard1_replica1] Caused by: solr.ExtractingRequestHandler</str>



So how do I set up this class correctly in the Cloudera distribution?


Thanks in advance for any help!

Posts: 56
Registered: ‎02-09-2015

Re: How to index Binary Data (PDF) with Cloudera Search (Solr)

Have you tried Tika with Solr? It would save you the pain of creating a special handler.
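
One way to read this suggestion is to extract the text on the client side (for example with Apache Tika) and send it to Solr's regular /update handler as an ordinary JSON document, so that no extracting handler has to be registered in solrconfig.xml. Below is a minimal Python sketch of the indexing half only. It rests on several assumptions that are not from this thread: the base URL, the collection name "mob" (guessed from the core name in the error), the schema fields "id" and "text", and the helper names build_update_request and send are all hypothetical. The extraction step itself is not shown.

```python
import json
import urllib.request


def build_update_request(solr_base, collection, doc_id, text):
    """Build (but do not send) a POST request for Solr's JSON update handler.

    Assumes the schema has fields named "id" and "text"; adjust to your schema.
    """
    url = f"{solr_base}/{collection}/update?commit=true"
    # /update accepts a JSON array of documents when Content-Type is application/json.
    body = json.dumps([{"id": doc_id, "text": text}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Usage would be along the lines of send(build_update_request("http://localhost:8983/solr", "mob", "doc1", extracted_text)), where extracted_text is whatever your extractor produced from the PDF. On a secured (Kerberos) cluster, the plain urllib call would need to be replaced with an authenticated client.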