Member since: 05-02-2016
Posts: 4
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2114 | 05-04-2016 10:59 PM |
05-04-2016 10:59 PM
To get the full HDFS file path and file name into the index, add the following fields to schema.xml, then create the collection, and then index using the MapReduce indexer (MapReduceIndexerTool). Nothing needs to be specified in the morphline config file.
<field name="file_path" type="string" indexed="true" stored="true" />
<field name="file_name" type="string" indexed="true" stored="true" />
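Before recreating the collection, it can help to confirm that both fields really are declared in the schema you are about to upload. Below is a minimal sketch that only parses schema XML with the JDK's built-in XML parser; it does not talk to Solr or HDFS. The class name SchemaFieldCheck, the method declaredFields, and the trimmed-down sample schema are hypothetical and used for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the <field> names declared in a Solr schema.xml string.
class SchemaFieldCheck {

    /** Returns the "name" attribute of every <field> element, in document order. */
    public static Set<String> declaredFields(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
            // Matches only elements named exactly "field" (not copyField or dynamicField).
            NodeList fields = doc.getElementsByTagName("field");
            Set<String> names = new LinkedHashSet<>();
            for (int i = 0; i < fields.getLength(); i++) {
                names.add(((Element) fields.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse schema XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a trimmed-down schema; a real schema.xml has many more entries.
        String schemaXml =
                "<schema name=\"example\" version=\"1.5\">"
              + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
              + "<field name=\"file_path\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
              + "<field name=\"file_name\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
              + "</schema>";

        Set<String> names = declaredFields(schemaXml);
        System.out.println("file_path declared: " + names.contains("file_path"));
        System.out.println("file_name declared: " + names.contains("file_name"));
    }
}
```

To check a real schema, read the contents of schema.xml into a string and pass it to declaredFields instead of the inline sample.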
05-02-2016 09:25 AM
Not able to index the file name using the Cloudera Solr MapReduce indexer (file content is stored and searchable):
1) I am using MapReduceIndexerTool to index various types of files (doc, pdf, xls, etc.):
${SOLR_HOME}/contrib/mr/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool
2) I am using the morphline solrCell command in the command chain, with this field mapping:
fmap : { content : text, content-type : content_type }
3) In schema.xml I have the entries below:
<field name="resourcename" type="text_general" indexed="true" stored="true"/>
<copyField source="resourcename" dest="text"/>
4) If needed, I will provide the solrconfig.xml details.
5) I use the code below to inspect the index:
IndexReader reader = DirectoryReader.open(rdir);
Document doc = reader.document(0);
System.out.println("Fields: " + doc.getFields());
What I observed: the filename/resourcename has not been indexed. I only see these fields:
content_type:application/pdf
id:c385c455-5c1a-4284-937b-e88003fa3438#0
author:Neogi, Anindya
author_s:Neogi, Anindya
last_modified:1461302622000
_version_:1533216476522086400
Labels:
- Apache Hadoop
- Apache Solr