Not able to index filename using Cloudera Solr MapReduce (file content is stored and searchable)


1) I am using MapReduceIndexerTool to index various types of files (doc, pdf, xls, etc.):

${SOLR_HOME}/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool

 

2) My morphline chains commands including solrCell, with this field mapping:

fmap : { content : text, content-type : content_type }

 

3) In schema.xml I have the following entries:

<field name="resourcename" type="text_general" indexed="true" stored="true"/>

<copyField source="resourcename" dest="text"/>

 

4) If needed, I can provide the solrconfig.xml details.

 

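5) I inspect the resulting index with the following code:

    // rdir is the Lucene Directory of the generated index
    IndexReader reader = DirectoryReader.open(rdir);
    // document(0) returns only the stored fields of the first document
    Document doc = reader.document(0);
    System.out.println("Fields: " + doc.getFields());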

 

What I observed is that the filename (resourcename) has not been indexed.

I only see these fields:

content_type:application/pdf:

id:c385c455-5c1a-4284-937b-e88003fa3438#0:

author:Neogi, Anindya:

author_s:Neogi, Anindya:

last_modified:1461302622000:

_version_:1533216476522086400:

Accepted Solution
To get the full HDFS file path and the file name into the index, add the following fields to schema.xml, then create the collection, and then index with the MapReduce indexer. Nothing needs to be specified in the morphline config file.

   <field name="file_path" type="string" indexed="true" stored="true" />

   <field name="file_name" type="string" indexed="true" stored="true" />
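
To illustrate the kind of values these two fields are expected to hold, here is a minimal Java sketch. It is only an assumption based on the field names: the answer does not show sample values, the class `HdfsPathFields` is a hypothetical helper that is not part of Solr or the MapReduceIndexerTool, and the HDFS URI is made up.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Solr or MapReduceIndexerTool.
final class HdfsPathFields {

    // Path component of the URI, without scheme, host, or port.
    static String filePath(String hdfsUri) {
        return URI.create(hdfsUri).getPath();
    }

    // Last segment of the path, i.e. the bare file name.
    static String fileName(String hdfsUri) {
        String path = filePath(hdfsUri);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Made-up example location of an input file.
        String uri = "hdfs://namenode:8020/user/demo/docs/report.pdf";
        System.out.println("file_path=" + filePath(uri)); // file_path=/user/demo/docs/report.pdf
        System.out.println("file_name=" + fileName(uri)); // file_name=report.pdf
    }
}
```

Because both fields are declared as type string, a query such as file_name:report.pdf would match only the exact value, not individual words inside the name.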
