
Cloudera Search: index PDF documents



Hi,

I am running Cloudera CDH 6.2.1 on a proof-of-concept cluster (1 node).

I am trying to index a PDF document with Cloudera Search using the MapReduceIndexerTool. I created the schema and the morphline conf file.

Here is my morphline file:

SOLR_LOCATOR : {
  # Name of solr collection
  collection : collection1

  # ZooKeeper ensemble
  zkHost : "127.0.0.1:2181/solr"

  # The maximum number of documents to send to Solr per network batch (throughput knob)
  # batchSize : 100
}

morphlines : [
  {
    id : morphlinepdfs
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]

    commands : [
      { detectMimeType { includeDefaultMimeTypes : true } }

      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [id, title, author, content, content_type, subject, description, keywords, category, resourcename, url, last_modified, links]
          parsers : [ { parser : org.apache.tika.parser.pdf.PDFParser } ]
        }
      }

      { generateUUID { field : id } }

      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }

      { loadSolr: { solrLocator : ${solrLocator} } }
    ]
  }
]

 

When running the job, I get the following error:

1526 [main] INFO org.apache.solr.hadoop.MapReduceIndexerTool - Indexing 1 files using 1 real mappers into 1 reducers
Error: org.kitesdk.morphline.api.MorphlineCompilationException: No command builder registered for name: solrCell near: {
# morphlines_parse_pdf.conf: 52
"solrCell" : {
# morphlines_parse_pdf.conf: 62
"solrContentHandlerFactory" : "org.kitesdk.morphline.solrcell.TrimSolrContentHandlerFactory",
# morphlines_parse_pdf.conf: 65
"parsers" : [
# morphlines_parse_pdf.conf: 65
{
# morphlines_parse_pdf.conf: 65
"parser" : "org.apache.tika.parser.pdf.PDFParser"
}
],

 

Can anyone help?

Thanks
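
One check that may help narrow this down: the message "No command builder registered for name: solrCell" suggests that no class providing that command was found on the job's classpath under the packages listed in importCommands. The log above shows the solrCell classes living in the package org.kitesdk.morphline.solrcell, so a rough way to see whether any installed jar ships that package is to scan a directory of jars for it. The sketch below is only a diagnostic aid under that assumption; the function name jars_containing and the directory passed on the command line are hypothetical, and the right directory depends on how CDH is installed.

```python
import os
import sys
import zipfile


def jars_containing(root, package_path):
    """Return sorted paths of .jar files under root that hold entries in package_path."""
    prefix = package_path.strip("/") + "/"
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    # A jar is a zip archive; entry names use '/' separators.
                    if any(entry.startswith(prefix) for entry in jar.namelist()):
                        hits.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable files and broken symlinks.
                continue
    return sorted(hits)


if __name__ == "__main__":
    # Directory to scan is an assumption; pass the jar directory of your install.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for jar_path in jars_containing(root, "org/kitesdk/morphline/solrcell"):
        print(jar_path)
```

If the scan prints nothing for the directories the job actually uses, the jar that provides solrCell is probably not being shipped with the job. If it does print a jar, the next thing to look at would be whether that jar reaches the mapper classpath.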

 

 
